Information Retrieval and QA at Thomson Reuters
Thomson Reuters is a text-heavy organization, so our information retrieval research focuses on natural language search, which combines techniques from Natural Language Processing (NLP) and Information Retrieval (IR).
Formally, Information Retrieval is the science (and engineering) of searching for information in content repositories at different levels of granularity (e.g., documents, passages, metadata) and across different content types (documents, social media, images, video, sound), both at rest and in motion (e.g., streaming data). Our information retrieval research is somewhat broader than this formal definition in that it also includes recommender systems and navigation-based discovery.
From a technology perspective, our scientists and engineers have significant expertise in classical NLP and IR methodologies as well as in more recent advances, including the use of deep learning and language models for IR and question-answering problems.
Our scientists and engineers are pioneers of IR. Within the legal domain, for example, we can proudly say that we have fundamentally transformed how legal research is done. Example products include ResultsPlus (a large-scale, content- and behavior-based recommender system with personalization); Medical Litigator (a vertical search engine covering the medical domain for lawyers); Westlaw Next and its patented WestSearch, which comprises 13 vertical search engines, each designed for a target content set; Westlaw Edge (which includes robust, open-ended question answering for the law); and Checkpoint Edge (a state-of-the-art search engine for the tax domain).
Information retrieval and search will continue to play an important role in what we do as a team and in how we satisfy our customers' varied and often complex information needs. In terms of direction, we see no distinction between finding and understanding, and we aim to develop experiences that accept more varied input (query, document, question, session interactions, etc.) while producing more focused output (an answer, a document, a dynamically generated report, etc.).
Wenhui Liao, Sriharsha Veeramachaneni. 2010. “Unsupervised Learning for Reranking-based Patent Retrieval”. In 3rd International Workshop on Patent Information Retrieval, at the 19th ACM Conference on Information and Knowledge Management (CIKM).
Howard R. Turtle. 1994. “Natural Language vs. Boolean Query Evaluation: A Comparison of Retrieval Performance”. In Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Pages 212-220. Dublin, Ireland: Special Issue of the SIGIR Forum.
Tonya Custis, Frank Schilder, Thomas Vacek, Gayle McElvain, Hector Martinez Alonso. 2019. “Westlaw Edge AI Features Demo: KeyCite Overruling Risk, Litigation Analytics, and WestSearch Plus”. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law. Pages 256-257. Montreal (Quebec), Canada.
Gayle McElvain, George Sanchez, Sean Matthews, Don Teo, Filippo Pompili, Tonya Custis. 2019. “WestSearch Plus: A Non-factoid Question-Answering System for the Legal Domain”. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. Pages 1361-1364. New York, NY, USA.
Dezhao Song, Frank Schilder, Charese Smiley, Chris Brew, Tom Zielund, Hiroko Bretz, Robert Martin, Chris Dale, John Duprey, Tim Miller, Johanna Harrison. 2015. “TR Discover: A Natural Language Interface for Querying and Analyzing Interlinked Datasets”. The Semantic Web - ISWC 2015, volume 9367, pages 21-37. Springer International Publishing.
Related research areas
We take a multidisciplinary approach to the challenges of AI adoption and of building trust in our solutions. We explore concepts such as interpretability, explainability, transparency, fairness, privacy and security, and societal impact, all of which are central to our AI Principles and company purpose.
AI DevOps (ModelOps)
We are exploring methods and technologies related to the emerging domain of ModelOps. This field combines AI development and IT operations with the objective of shortening the "AI Lifecycle", providing continuous delivery, and increasing the quality of what we deliver to our customers.
Natural Language Processing
Natural Language Processing (NLP) focuses on designing algorithms to parse, analyze, mine, and ultimately understand and generate human language. NLP, with a focus on text data, is one of our core enabling technologies, given our customers’ work in information-heavy segments.