Our customers operate in increasingly complex and information-rich environments. Finding the right information, in the right context and often under time constraints, is of utmost importance. This is why we adopt a comprehensive approach to the findability problem, using a combination of search technologies, recommender systems and navigation-based discovery.
5 Key Design Principles in Information Retrieval Research
To address most interactions, including edge cases, we follow these common principles. They are meant to elevate overall capabilities and deliver a satisfying user experience, while creating a flow of valuable information back into our IR solutions so that they can learn and improve.
Formally, information retrieval (IR) is the science (and engineering) of searching for information in content repositories at different levels of granularity (e.g., documents, passages, metadata) and across different types (documents, social media, images, video, sound), both at rest and in motion (e.g., streaming data). With this in mind, our information retrieval research is somewhat broader than the formal definition in that it also includes recommender systems and navigation-based discovery.
At Thomson Reuters, we are a text-heavy organization and as such our information retrieval research is biased toward Natural Language search, which combines techniques from Natural Language Processing (NLP) and Information Retrieval.
Information retrieval focuses on multiple granularities
This provides the most flexible method for delivering the most relevant and current information for any particular user need.
And across many media types
Whether it’s found in a particular chapter, video, audio segment or within a photo.
Our focus areas span a number of problems including:
- Query Understanding, which aims at deciphering the intent of the query (e.g., a question, finding a document or researching a topic) using a combination of natural language processing techniques and analysis of usage data, often relying on domain-specific metadata.
- Vertical Search, which involves developing search engines that incorporate domain-specific know-how, metadata and use cases into the ranking problem, often using learning-to-rank methodologies.
- Question Answering, which aims at developing robust question answering capabilities that are able to answer open-ended, domain-specific questions. This research includes both question answering over text and document collections as well as question answering over structured data such as databases and knowledge graphs.
- Auto-Suggest, which aims at developing tools for helping customers construct better queries and questions. Simpler forms rely on usage logs; more advanced forms incorporate user and session contexts as well as trend data (relevant for news use cases).
- Recommender Systems, which aim to push information to the user even though the user did not explicitly ask for such information. This could be in the form of pushing an analytical article that explains a topic when the system determines this to be the user’s information need, or it could be pushing related concepts and terms of art that the user might be interested in – given the context. Our research includes both content-based as well as behavior-based recommendations, with personalization.
- Dialog Systems, which aim to enable customers to engage in limited forms of dialog with our applications. In the context of search engines, this could mean allowing customers to ask ‘follow-up’ questions to refine or elaborate on their previous queries/questions.
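To make the simpler, usage-log form of auto-suggest mentioned above a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any production system: the function names `build_suggester` and `suggest`, the sample log and the frequency-based ranking are all assumptions made for this example.

```python
from collections import Counter


def build_suggester(query_log):
    """Count how often each normalized query appears in the usage log."""
    return Counter(q.strip().lower() for q in query_log if q.strip())


def suggest(counts, prefix, k=3):
    """Return up to k logged queries that start with prefix, most frequent first."""
    prefix = prefix.strip().lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically so ties are stable.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:k]]


# Hypothetical usage log, invented for this example.
log = [
    "statute of limitations",
    "statute of frauds",
    "Statute of limitations",
    "standing to sue",
    "statute of limitations",
]
counts = build_suggester(log)
print(suggest(counts, "stat"))
# prints ['statute of limitations', 'statute of frauds']
```

The more advanced forms described above would replace the plain frequency count with a score that also reflects user context, session context and trend data.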
From a technology perspective, our scientists and engineers have significant expertise in classical NLP and IR methodologies as well as more recent advances including using deep learning and language models for IR and question answering problems.
Since the release of WIN, our team of scientists and engineers has published numerous scientific papers, been granted a number of patents and developed dozens of capabilities and applications that have had a significant impact on Thomson Reuters and our customers. In fact, we can proudly proclaim that we fundamentally transformed how legal research is done. Example products include ResultsPlus (a large-scale, content- and behavior-based recommender system with personalization), Medical Litigator (a vertical search engine for the medical domain for lawyers), Westlaw Next and its patented WestSearch (which comprises 13 vertical search engines, each designed for a target content set), Westlaw Edge (which includes robust, open-ended question answering for the law) and Checkpoint Edge (a state-of-the-art search engine for the tax domain).
Information retrieval and search will continue to play an important role in what we do as a team and in how we satisfy our customers' varied and often complex information needs. Directionally speaking, there is no distinction between finding and understanding, and we aim to develop experiences that accept more varied input (query, document, question, session interactions, etc.) and produce more focused output (an answer, a document, a dynamically generated report, etc.).