Publications
Improving Legal Question Answering through Structured Knowledge Representation
Gupta, Ankita, and Frank Schilder. “Improving Legal Question Answering through Structured Knowledge Representation”. In Proceedings of the First Argument Mining and Empirical Legal Research Workshop (AMELR 2025). Chicago, United States, 2025.
Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings
Bean, Andrew M., Shengzhuang Chen, Nabeel Seedat, and Jonathan Richard Schwarz. “Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings”. In arXiv Preprint arXiv:2509.16093, 2025.
Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching
Seedat, Nabeel, and Mihaela van der Schaar. “Bootstrapping Self-Improvement of Language Model Programs for Zero-Shot Schema Matching”. In International Conference on Machine Learning, 2025.
Towards Human-Guided, Data-Centric LLM Co-Pilots
Saveliev, Evgeny, Jiashuo Liu, Nabeel Seedat, Anders Boyd, and Mihaela van der Schaar. “Towards Human-Guided, Data-Centric LLM Co-Pilots”. In Journal of Data-Centric Machine Learning Research, 2025.
What’s the next Frontier for Data-Centric AI? Data Savvy Agents!
Seedat, Nabeel, Jiashuo Liu, and Mihaela van der Schaar. “What’s the next Frontier for Data-Centric AI? Data Savvy Agents!” In Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR, 2025.
ADMIRE-BayesOpt: Accelerated Data MIxture RE-Weighting for Language Models with Bayesian Optimization
Chen, Shengzhuang, Xu Ouyang, Michael Arthur Leopold Pearce, Thomas Hartvigsen, and Jonathan Richard Schwarz. “ADMIRE-BayesOpt: Accelerated Data MIxture RE-Weighting for Language Models with Bayesian Optimization”. In arXiv Preprint arXiv:2508.11551, 2025.
Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
Chen, Shengzhuang, Ying Wei, and Jonathan Richard Schwarz. “Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts”. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, 16703–17. Vienna, Austria: Association for Computational Linguistics, 2025.
Composable Interventions for Language Models
Kolbeinsson, Arinbjorn, Kyle O’Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, et al. “Composable Interventions for Language Models”. In International Conference on Learning Representations, 2024.
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-Training
Brandfonbrener, David, Hanlin Zhang, Andreas Kirsch, Jonathan Richard Schwarz, and Sham Kakade. “CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-Training”. In Advances in Neural Information Processing Systems, 2024.
Online Adaptation of Language Models with a Memory of Amortized Contexts
Tack, Jihoon, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, and Jonathan Richard Schwarz. “Online Adaptation of Language Models with a Memory of Amortized Contexts”. In Advances in Neural Information Processing Systems, 2024.
Measuring the Groundedness of Legal Question-Answering Systems
Trautmann, Dietrich, Natalia Ostapuk, Quentin Grail, Adrian Alan Pol, Guglielmo Bonifazi, Shang Gao, and Martin Gajek. “Measuring the Groundedness of Legal Question-Answering Systems”. In Workshop on Natural Legal Language Processing at EMNLP, 2024.