Natural language processing (NLP) and machine learning (ML) at Thomson Reuters

Natural language processing focuses on designing algorithms to parse, analyze, mine, and ultimately understand and generate human language. We build our capabilities on the latest breakthroughs in deep learning (DL) and other machine learning techniques to support our customers’ work in information-heavy segments.

Language enables us to communicate, collaborate, negotiate, and socialize with each other. It allows us to record our experiences, learn from others, share knowledge, and preserve and advance civilization. At Thomson Reuters, we operate in language-rich industries: laws, regulations, news, disputes, and business transactions are all captured in text. The amount of text is growing exponentially; processing and acting upon it is a competitive advantage for all our customers.

The ability to process massive amounts of text, to mine it for insights and information nuggets, to organize it, to connect it, to contrast it, to understand it, and to answer questions about it, is of utmost importance for our customers and for us. This is why the combination of NLP and natural language understanding (NLU) has been one of our core research areas for the last 20 years.

The objectives of our NLP research span our editorial processes as well as our customer-facing products. On the editorial front, the primary focus is on building tools for mining, enhancing, and organizing content. On the customer-facing side, products such as Westlaw or Practical Law may include artificial intelligence (AI) components that enable our customers to extract or retrieve information at scale.

As many of our data sources are rich text collections, it should not come as a surprise that we solve many of our text-related problems via commonly used NLP techniques, such as named entity recognition and resolution, classification, and natural language generation.
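As a rough illustration of what named entity recognition and resolution means in practice, the sketch below finds known names in a sentence and maps each one to a canonical identifier. It is a toy dictionary lookup written for this page, not the approach used in any Thomson Reuters product; the alias table, the identifier scheme, and the function name `recognize_and_resolve` are all hypothetical.

```python
import re

# Hypothetical alias table: lower-cased surface form -> canonical entity ID.
ALIASES = {
    "thomson reuters": "ORG:thomson_reuters",
    "westlaw": "PRODUCT:westlaw",
    "practical law": "PRODUCT:practical_law",
}


def recognize_and_resolve(text):
    """Return (surface form, canonical ID, start offset) for each alias found."""
    # Try longer aliases first so multi-word names win over shorter ones.
    ordered = sorted(ALIASES, key=len, reverse=True)
    pattern = "|".join(re.escape(alias) for alias in ordered)
    hits = []
    for match in re.finditer(rf"\b(?:{pattern})\b", text, flags=re.IGNORECASE):
        surface = match.group(0)
        hits.append((surface, ALIASES[surface.lower()], match.start()))
    return hits


if __name__ == "__main__":
    sentence = "Westlaw and Practical Law are Thomson Reuters products."
    for hit in recognize_and_resolve(sentence):
        print(hit)
```

Recognition here is the pattern match and resolution is the table lookup. Systems built on learned models replace both steps, but the input and output have the same shape: text in, entity mentions linked to identifiers out.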

Recent breakthroughs in deep learning also enable us to use language models such as Bidirectional Encoder Representations from Transformers (BERT) or Generative Pre-trained Transformer 3 (GPT-3) (Custis et al. 2019; Shaghaghian et al. 2020; Song et al. 2022). These models improve the question answering and text classification capabilities of many of our products, such as Westlaw Precision, HighQ Contract Analysis, and Litigation Analytics. High-quality content is ensured by our human-in-the-loop approach, in which machine-generated content is always tested and verified.

Our work:

Tonya Custis, Frank Schilder, Thomas Vacek, Gayle McElvain, and Hector Martinez Alonso. Westlaw Edge AI Features Demo: KeyCite Overruling Risk, Litigation Analytics, and WestSearch Plus. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, ICAIL ’19, pages 256–257, Montreal (Québec), Canada, 2019. ACM.

Shohreh Shaghaghian, Luna Yue Feng, Borna Jafarpour, and Nicolai Pogrebnyakov. Customizing Contextualized Language Models for Legal Document Reviews. In 2020 IEEE International Conference on Big Data (Big Data), pages 2139–2148. IEEE, 2020.

Dezhao Song, Sally Gao, Baosheng He, and Frank Schilder. On the effectiveness of pre-trained language models for legal natural language processing: An empirical study. IEEE Access, 10:75835–75858, 2022.

Related research areas

Human-centric AI

We take a multidisciplinary approach to the challenges we face in AI adoption and in building trust in our solutions. We explore concepts such as interpretability, explainability, transparency, fairness, privacy and security, and societal impact, all of which are central to our AI Principles and company purpose.

AI DevOps (ModelOps)

We are exploring methods and technologies related to the emerging domain of ModelOps. This field combines AI development and IT operations with the aim of shortening the "AI Lifecycle", providing continuous delivery, and increasing the quality of what we deliver to our customers.

Information Retrieval and QA

Our customers need the right information, in the right context, and often under tight time constraints. We adopt a comprehensive approach to the information findability problem, using a combination of search technologies, recommendation systems, and navigation-based discovery.
