Thomson Reuters Labs
Foundational machine learning research
Our mission
Foundational Machine Learning Research is the dedicated core machine-learning research division of Thomson Reuters. We focus on research and development, with a particular emphasis on advanced algorithms and training techniques for large language models (LLMs). Using a unique combination of algorithms across the entire training spectrum, we build the world’s best legal language models.
We work collaboratively with TR Labs (TR’s applied research division), academic partners at world-leading research institutions, and domain experts with decades of experience. We experiment, prototype, test, and deliver ideas in pursuit of smarter and more valuable models, trained on an unprecedented wealth of data and powered by state-of-the-art technical infrastructure. Through our unique institutional experience, we have access to an unparalleled number of subject-matter experts involved in data collection, testing, and evaluation of trained models.
Research
Thomson Reuters Labs has a rich history of applied research, exploring emerging technology and applying it to concrete business problems.
Publications
Improving Legal Question Answering through Structured Knowledge Representation
Gupta, Ankita, and Frank Schilder. “Improving Legal Question Answering through Structured Knowledge Representation”. In Proceedings of the First Argument Mining and Empirical Legal Research Workshop (AMELR 2025). Chicago, United States, 2025.
Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings
Bean, Andrew M., Shengzhuang Chen, Nabeel Seedat, and Jonathan Richard Schwarz. “Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings”. arXiv preprint arXiv:2509.16093, 2025.
Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching
Seedat, Nabeel, and Mihaela van der Schaar. “Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching”. In International Conference on Machine Learning, 2025.
Towards Human-Guided, Data-Centric LLM Co-Pilots
Saveliev, Evgeny, Jiashuo Liu, Nabeel Seedat, Anders Boyd, and Mihaela van der Schaar. “Towards Human-Guided, Data-Centric LLM Co-Pilots”. In Journal of Data-Centric Machine Learning Research, 2025.
What’s the next Frontier for Data-Centric AI? Data Savvy Agents!
Seedat, Nabeel, Jiashuo Liu, and Mihaela van der Schaar. “What’s the next Frontier for Data-Centric AI? Data Savvy Agents!” In Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR, 2025.
ADMIRE-BayesOpt: Accelerated Data MIxture RE-Weighting for Language Models with Bayesian Optimization
Chen, Shengzhuang, Xu Ouyang, Michael Arthur Leopold Pearce, Thomas Hartvigsen, and Jonathan Richard Schwarz. “ADMIRE-BayesOpt: Accelerated Data MIxture RE-Weighting for Language Models with Bayesian Optimization”. arXiv preprint arXiv:2508.11551, 2025.
Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
Chen, Shengzhuang, Ying Wei, and Jonathan Richard Schwarz. “Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts”. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, 16703–17. Vienna, Austria: Association for Computational Linguistics, 2025.
Composable Interventions for Language Models
Kolbeinsson, Arinbjorn, Kyle O’Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, et al. “Composable Interventions for Language Models”. In International Conference on Learning Representations, 2024.
Color-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-Training
Brandfonbrener, David, Hanlin Zhang, Andreas Kirsch, Jonathan Richard Schwarz, and Sham Kakade. “Color-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-Training”. In Advances in Neural Information Processing Systems, 2024.
Online Adaptation of Language Models with a Memory of Amortized Contexts
Tack, Jihoon, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, and Jonathan Richard Schwarz. “Online Adaptation of Language Models with a Memory of Amortized Contexts”. In Advances in Neural Information Processing Systems, 2024.
Measuring the Groundedness of Legal Question-Answering Systems
Trautmann, Dietrich, Natalia Ostapuk, Quentin Grail, Adrian Alan Pol, Guglielmo Bonifazi, Shang Gao, and Martin Gajek. “Measuring the Groundedness of Legal Question-Answering Systems”. In Workshop on Natural Legal Language Processing at EMNLP, 2024.
Upcoming events
Check out our upcoming activity below.
December 2 – 7, 2025
Meet the Foundational ML Research Team at NeurIPS 2025
Date: December 2 – 7, 2025
Location: San Diego, California, USA
Event details: NeurIPS event information
Meet the team
Our talented scientists, engineers, designers, and developers come from diverse backgrounds, are creative problem-solvers, and have dedicated themselves to advancing knowledge work with AI and ML.
Nabeel Seedat
Stefan Winzeck
Fangyi Yu
Jodi Gardner
Jessie Shearer
Join our Foundational Machine Learning Research team
Foundational ML Research is expanding across EMEA and North America and hiring at all levels of experience. Be part of the movement and make an impact.