Natural language processing (NLP) and machine learning (ML) at Thomson Reuters
Language enables us to communicate, collaborate, negotiate, and socialize with each other. It also allows us to record our experiences, learn from others, share knowledge, and preserve and advance civilization. At Thomson Reuters, we operate in language-rich industries: laws, regulations, news, disputes, and business transactions are all captured in text. The volume of that text is growing exponentially, and the ability to process and act upon it is a competitive advantage for all our customers.
The ability to process massive amounts of text, to mine it for insights and information nuggets, to organize it, to connect it, to contrast it, to understand it, and to answer questions about it, is of utmost importance for our customers and for us. This is why the combination of NLP and natural language understanding (NLU) has been one of our core research areas for the last 20 years.
The objectives of our NLP research span our editorial processes as well as our customer-facing products. On the editorial front, the primary focus is on building tools for mining, enhancing, and organizing content. On the product front, offerings such as Westlaw and Practical Law include artificial intelligence (AI) components that enable our customers to extract or retrieve information at scale.
As many of our data sources are rich text collections, it should not come as a surprise that we solve many of our text-related problems via commonly used NLP techniques, such as named entity recognition and resolution, classification, and natural language generation.
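As a toy illustration of one such technique, the sketch below recognizes a single entity type (case names of the form "Smith v. Jones") with a hand-written pattern. The pattern, function name, and example sentence are all hypothetical; production named entity recognition relies on statistical or neural models rather than rules like this.

```python
import re

# Toy pattern for one entity type: case names such as "Smith v. Jones".
# Illustrative only; real NER systems are learned, not pattern-based.
CASE_NAME = re.compile(
    r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*) v\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)

def extract_case_names(text: str) -> list[str]:
    """Return each 'Plaintiff v. Defendant' span found in the text."""
    return [m.group(0) for m in CASE_NAME.finditer(text)]

print(extract_case_names("The court cited Marbury v. Madison and Roe v. Wade."))
# → ['Marbury v. Madison', 'Roe v. Wade']
```

Entity *resolution*, mentioned above, would then link each extracted span to a canonical record (for example, a docket or citation identifier), which is where most of the real difficulty lies.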
Recent breakthroughs in deep learning also enable us to utilize language models such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer 3 (GPT-3) (Custis et al. 2019; Shaghaghian et al. 2020; Song et al. 2022) to give products such as Westlaw Precision, HighQ Contract Analysis, and Litigation Analytics better question answering and text classification capabilities. High-quality content is ensured by our human-in-the-loop approach: machine-generated content is always tested and verified.
Tonya Custis, Frank Schilder, Thomas Vacek, Gayle McElvain, and Hector Martinez Alonso. Westlaw Edge AI Features Demo: KeyCite Overruling Risk, Litigation Analytics, and WestSearch Plus. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, ICAIL ’19, pages 256–257, Montreal (Québec), Canada, 2019. ACM.
Shohreh Shaghaghian, Luna Yue Feng, Borna Jafarpour, and Nicolai Pogrebnyakov. Customizing Contextualized Language Models for Legal Document Reviews. In 2020 IEEE International Conference on Big Data (Big Data), pages 2139–2148. IEEE, 2020.
Dezhao Song, Sally Gao, Baosheng He, and Frank Schilder. On the Effectiveness of Pre-trained Language Models for Legal Natural Language Processing: An Empirical Study. IEEE Access, 10:75835–75858, 2022.
Related research areas
We take a multidisciplinary approach to the challenges we face in AI adoption and in building trust in our solutions. We explore concepts such as interpretability, explainability, transparency, fairness, privacy and security, and societal impact, all central to our AI Principles and company purpose.
AI DevOps (ModelOps)
We are exploring methods and technologies related to the emerging domain of ModelOps. This field combines AI development and IT operations with the objective of shortening the "AI lifecycle", providing continuous delivery, and increasing the quality of what we deliver to our customers.
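One recurring ModelOps pattern is a quality gate: a retrained model is promoted to production only if its offline metrics clear agreed thresholds. The sketch below is a hypothetical illustration of that idea; the metric names and threshold values are invented for the example and do not describe our actual pipeline.

```python
# Hypothetical ModelOps quality gate: promote a candidate model only if
# every tracked offline metric meets or beats its threshold.
# Metric names and thresholds are illustrative.
THRESHOLDS = {"f1": 0.85, "precision": 0.90}

def passes_quality_gate(metrics: dict[str, float],
                        thresholds: dict[str, float] = THRESHOLDS) -> bool:
    """Return True only when all tracked metrics clear their thresholds."""
    return all(metrics.get(name, 0.0) >= bar for name, bar in thresholds.items())

candidate = {"f1": 0.88, "precision": 0.91}
print(passes_quality_gate(candidate))  # → True
```

In a continuous-delivery setup, a check like this would run automatically after each retraining job, before any deployment step.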
Information Retrieval and QA
Our customers need the right information, in the right context, and often under tight time constraints. We adopt a comprehensive approach to the information findability problem, using a combination of search technologies, recommendation systems, and navigation-based discovery.
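At the core of any such search stack is a ranking function. The sketch below shows the classic lexical baseline, TF-IDF weighting with cosine similarity, over a tiny invented corpus; real systems layer query understanding, learned ranking, and recommendations on top of a baseline like this.

```python
import math
from collections import Counter

# Tiny illustrative corpus; documents and query are invented for the example.
docs = [
    "breach of contract damages",
    "patent infringement damages",
    "merger agreement negotiation",
]

def tf_idf_vectors(texts):
    """Build a TF-IDF weight map per document, plus the shared IDF table."""
    tokenized = [t.split() for t in texts]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{term: freq * idf[term] for term, freq in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vectors, idf = tf_idf_vectors(docs)
query = Counter("contract damages".split())
qvec = {t: f * idf.get(t, 0.0) for t, f in query.items()}
best = max(range(len(docs)), key=lambda i: cosine(qvec, vectors[i]))
print(docs[best])  # → breach of contract damages
```

Because "contract" is rarer in the corpus than "damages", IDF weighting pulls the query toward the contract document rather than the patent one, which is exactly the behavior term weighting is meant to produce.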