Natural Language Processing (NLP) at Thomson Reuters
Language enables us to communicate, collaborate, negotiate and socialize with each other. Language allows us to record our experiences, learn from others, share knowledge, and preserve and advance civilization. At Thomson Reuters, we operate in language (text)-rich industries. Laws, regulations, news, disputes and business transactions are all captured in text. The amount of text is growing exponentially, and processing and acting upon it is a competitive advantage for all of our customers.
The ability to process massive amounts of text, to mine it for insights and information nuggets, to organize it, to connect it, to contrast it, to understand it and to answer questions over it, is of utmost importance for our customers and for us. This is why Natural Language Processing and Understanding (NLP/U) has been one of our core research areas for the last 20 years.
The objectives of our NLP research span our editorial processes as well as our customer-facing products. On the editorial front, the primary focus is on building tools for mining, enhancing and organizing content.
As many of our data sources are rich text collections, it should not come as a surprise that we solve many of our text-related problems via commonly used NLP techniques, such as named entity recognition and resolution, classification, and natural language generation. Recent breakthroughs in Deep Learning (DL) also enable us to utilize language models such as BERT (McElvain et al. 2019, Custis et al. 2019, Shaghaghian et al. 2020) to give some of our products better question answering and text classification capabilities (e.g., HighQ Contract Analysis, Legal Analytics).
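To make one of these techniques concrete, the sketch below shows a toy gazetteer-based named entity recognizer: it scans text for known surface forms and tags each match with a type. This is purely illustrative, not how the products named above work; real systems use statistical or transformer-based models, and the gazetteer entries and type labels here are hypothetical.

```python
# Illustrative sketch only: a toy dictionary (gazetteer) based
# named entity recognizer. Entries and type labels are hypothetical;
# production NER uses learned models rather than exact lookup.

GAZETTEER = {
    "Thomson Reuters": "ORG",
    "BERT": "MODEL",
    "HighQ Contract Analysis": "PRODUCT",
}

def recognize_entities(text):
    """Return (surface form, entity type, character offset) for each hit."""
    hits = []
    for surface, etype in GAZETTEER.items():
        start = text.find(surface)
        while start != -1:
            hits.append((surface, etype, start))
            start = text.find(surface, start + 1)
    return sorted(hits, key=lambda h: h[2])

sentence = "Thomson Reuters uses BERT to enhance HighQ Contract Analysis."
print(recognize_entities(sentence))
```

A lookup-based recognizer like this handles only exact, pre-listed names; the appeal of model-based NER is generalizing to unseen entities and ambiguous contexts.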
Jafarpour, Borna, Dawn Sepehr, and Nicolai Pogrebnyakov. 2021. “Active Curriculum Learning.” In Proceedings of the First Workshop on Interactive Learning for Natural Language Processing, ACL 2021.
Schleith, Johannes, Nina Hristozova, Brian Chechmanek, Carolyn Bussey, and Leszek Michalak. 2021. “Noise over Fear of Missing Out.” In Mensch Und Computer 2021 - Workshopband, edited by Carolin Wienrich, Philipp Wintersberger, and Benjamin Weyers. Bonn: Gesellschaft für Informatik e.V.
Pogrebnyakov, Nicolai, and Shohreh Shaghaghian. 2021. “Predicting the Success of Domain Adaptation in Text Similarity.” In Proceedings of The 6th Workshop on Representation Learning for NLP, ACL 2021.
Related research areas
We take a multidisciplinary approach to the challenges of AI adoption and of building trust in our solutions. We explore concepts such as interpretability, explainability, transparency, fairness, privacy and security, and societal impact, all of which are central to our AI Principles and company purpose.
AI DevOps (ModelOps)
We are exploring methods and technologies related to the emerging domain of ModelOps. This field combines AI development and IT operations with the objective of shortening the "AI Lifecycle", providing continuous delivery, and increasing the quality of what we deliver to our customers.
Information Retrieval and QA
Our customers need the right information, in the right context, and often under tight time constraints. We adopt a comprehensive approach to the information findability problem, using a combination of search technologies, recommendation systems, and navigation-based discovery.
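As a small illustration of the search side of this work, the sketch below scores documents against a query with BM25, a standard term-weighting scheme used in search engines. The corpus, query, and parameter values are hypothetical, and this is a textbook formulation rather than a description of our production systems.

```python
import math

# Illustrative sketch only: ranking a toy corpus with BM25.
# k1 and b are the usual BM25 free parameters; the documents
# and query below are hypothetical examples.

def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            tf = doc.count(term)
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "court ruling on contract dispute",
    "quarterly earnings news report",
    "contract law and dispute resolution",
]
print(bm25_scores("contract dispute", docs))
```

Documents containing no query terms score zero; the length normalization controlled by `b` keeps long documents from dominating purely by repeating terms.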