Artificial intelligence

Natural Language Processing (NLP) at Thomson Reuters

Natural Language Processing (NLP) focuses on designing algorithms to parse, analyze, mine, and ultimately understand and generate human language. NLP, with a focus on text data, is one of our core enabling technologies, given our customers’ work in information-heavy segments.

Language enables us to communicate, collaborate, negotiate, and socialize with each other. Language allows us to record our experiences, learn from others, share knowledge, and preserve and advance civilization. At Thomson Reuters, we operate in language (text) rich industries. Laws, regulations, news, disputes, and business transactions are all captured in text. The amount of text is growing exponentially, and processing and acting upon it is a competitive advantage for all of our customers.

The ability to process massive amounts of text, to mine it for insights and information nuggets, to organize it, to connect it, to contrast it, to understand it and to answer questions over it, is of utmost importance for our customers and for us. This is why Natural Language Processing and Understanding (NLP/U) has been one of our core research areas for the last 20 years.

The objectives of our NLP research span our editorial processes as well as our customer-facing products. On the editorial front, the primary focus is on building tools for mining, enhancing and organizing content.

As many of our data sources are rich text collections, it should not come as a surprise that we solve many of our text-related problems via commonly used NLP techniques, such as named entity recognition and resolution, classification, and natural language generation. Recent breakthroughs in Deep Learning (DL) also enable us to utilize Language Models such as BERT (McElvain et al. 2019, Custis et al. 2019, Shaghaghian et al. 2020) to enhance some of our products with better question answering or text classification capabilities (e.g., HighQ Contract Analysis, Legal Analytics).
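To give a flavor of what the named entity recognition step does, the toy sketch below tags organizations, monetary amounts, and dates in a sentence using hand-written regular expressions. This is purely illustrative: production systems learn such patterns from data (e.g., with BERT-based taggers), and the patterns, labels, and example sentence here are our own assumptions, not part of any Thomson Reuters product.

```python
import re

# A minimal, rule-based sketch of named entity recognition (NER).
# Real systems learn entity boundaries and types from annotated data;
# these regex patterns and labels are illustrative assumptions only.
PATTERNS = {
    "ORG": re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Inc|Corp|LLC|Ltd)\.?"),
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?(?: (?:million|billion))?"),
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|"
        r"August|September|October|November|December) \d{1,2}, \d{4}\b"
    ),
}

def extract_entities(text):
    """Return (label, matched span) pairs found by the patterns above."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group().strip()))
    return entities

sentence = ("Acme Holdings Inc. agreed to pay $2.5 million "
            "in a settlement announced on March 3, 2021.")
print(extract_entities(sentence))
```

Entity *resolution*, mentioned above, would then link a span like "Acme Holdings Inc." to a canonical record for that organization, a step this sketch does not attempt.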

Our Work:

Jafarpour, Borna, Dawn Sepehr, and Nicolai Pogrebnyakov. 2021. “Active Curriculum Learning.” In Proceedings of the First Workshop on Interactive Learning for Natural Language Processing, ACL 2021.

Schleith, Johannes, Nina Hristozova, Brian Chechmanek, Carolyn Bussey, and Leszek Michalak. 2021. “Noise over Fear of Missing Out.” In Mensch Und Computer 2021 - Workshopband, edited by Carolin Wienrich, Philipp Wintersberger, and Benjamin Weyers. Bonn: Gesellschaft für Informatik e.V.

Pogrebnyakov, Nicolai, and Shohreh Shaghaghian. 2021. “Predicting the Success of Domain Adaptation in Text Similarity.” In Proceedings of the 6th Workshop on Representation Learning for NLP, ACL 2021.