Artificial intelligence

Text mining 

Most of our data is textual (e.g., case opinions, statutes, dockets, regulations, treaties, news), and many of the answers professionals are looking for are hidden in it. Utilizing powerful machine learning methods helps us uncover important information for our customers. Text mining techniques have been applied in several of our projects.

Text mining (or, more broadly, information extraction) encompasses the automatic extraction of valuable information from text. This ranges from named entity labeling (e.g., identifying companies in news) and text classification (e.g., sentiment analysis of movie reviews) to complex event extraction (e.g., identifying outcomes of lawsuits for every party involved). In our projects, we often face the challenge of identifying the information in textual data that lawyers or tax experts need.

For example, a lawyer appearing before a particular judge would like to know how this judge ruled on summary judgment motions or dismissals in the past. It would also be very valuable for them to see whether other law firms were successful, and how often their lawsuits ended in settlements, in jury verdicts in favor of their clients, or in dismissal.

The answers to these questions are hidden in short texts compiled by court clerks in legal documents called dockets (see Figure 1 for an example excerpt). A docket starts with the complaint and keeps track of every activity by the parties involved as well as the judge’s actions. A detailed manual analysis of a docket could provide valuable information about the past behavior of the involved parties and the respective judge, but reading through hundreds of dockets would be very time-consuming. Applying machine learning and NLP to all federal dockets allowed us to collect this information for almost 8 million past dockets and enables us to keep up with all newly closed dockets. The information is extracted automatically; extractions that the system flags as low confidence are then manually reviewed to ensure the high overall quality of the analytics shown in the product.
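The review step described above can be sketched as a simple confidence gate. This is a minimal illustration only: the `Extraction` record, the `route` function, the sample docket IDs, and the 0.80 threshold are all hypothetical, since the source does not say how confidence is computed or where the cutoff lies.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the source does not state the value actually used.
REVIEW_THRESHOLD = 0.80


@dataclass
class Extraction:
    docket_id: str
    label: str
    confidence: float  # model score in [0, 1]


def route(extractions, threshold=REVIEW_THRESHOLD):
    """Split extractions into auto-accepted items and items queued for manual review."""
    accepted, review = [], []
    for ex in extractions:
        (accepted if ex.confidence >= threshold else review).append(ex)
    return accepted, review


# Invented sample records, for illustration only.
sample = [
    Extraction("docket-001", "settlement", 0.97),
    Extraction("docket-002", "dismissal", 0.55),
    Extraction("docket-003", "summary judgment", 0.80),
]
accepted, review = route(sample)
```

Raising the threshold sends more extractions to human reviewers and lowers the share that is accepted automatically, so the cutoff trades review cost against quality.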

As of today, we have processed almost 8 million dockets containing 152 million docket entries, leading to 2.6 million outcomes. We extracted about 300,000 parties, 500,000 lawyers, 125,000 law firms, and 6,700 judges from 90 million state and federal dockets combined.

Chapter Two

Motion analysis

The first part of the Litigation Analytics system focuses on motion and order detection. This part provides the data ingested into the Litigation Analytics part of Westlaw Edge, which allows a user to ask a question like: how long does it take Judge John Tunheim to grant a motion for summary judgment? (See Fig. 2 for the answer chart.)

The overall system deploys a mix of high-precision and high-recall rules as well as some machine learning models to increase overall recall. First, motions and orders are tagged with high-precision rules. Then motions and orders are parsed in order to extract motion type, filer, order type, decision type, and judge names. Finally, the motions and orders are chained together based on the output of a machine learning algorithm.
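The three stages (tag, parse, chain) can be sketched in a few lines. This is an illustration under stated assumptions, not the production system: the regular expressions, the docket entry wording, and the function names `tag_and_parse` and `chain` are hypothetical. In particular, the chaining step here simply follows the entry number an order cites, standing in for the machine learning algorithm the text describes.

```python
import re

# Hypothetical high-precision rules; real docket language is far more varied.
MOTION_RE = re.compile(r"^MOTION (?:for|to) (?P<motion_type>.+?) by (?P<filer>.+?)\.?$")
ORDER_RE = re.compile(r"^ORDER (?P<decision>granting|denying) (?P<ref>\d+) Motion\b")
JUDGE_RE = re.compile(r"Signed by Judge (?P<judge>[A-Z]\w*(?: [A-Z]\w*)*)")


def tag_and_parse(entries):
    """Tag each (entry_number, text) pair as a motion or an order and parse its fields."""
    motions, orders = {}, {}
    for number, text in entries:
        m = MOTION_RE.match(text)
        if m:
            motions[number] = {"motion_type": m["motion_type"], "filer": m["filer"]}
            continue
        o = ORDER_RE.match(text)
        if o:
            j = JUDGE_RE.search(text)
            orders[number] = {
                "decision": o["decision"],
                "ref": int(o["ref"]),
                "judge": j["judge"] if j else None,
            }
    return motions, orders


def chain(motions, orders):
    """Link each order to the earlier motion it cites (a stand-in for the ML chaining step)."""
    links = []
    for order_no, order in sorted(orders.items()):
        if order["ref"] in motions and order["ref"] < order_no:
            links.append((order["ref"], order_no, order["decision"]))
    return links


# Invented docket excerpt, for illustration only.
docket = [
    (1, "COMPLAINT against Defendant Acme LLC filed by Plaintiff Jane Doe."),
    (12, "MOTION for Summary Judgment by Defendant Acme LLC."),
    (15, "MOTION to Compel Discovery by Plaintiff Jane Doe."),
    (20, "ORDER granting 12 Motion for Summary Judgment. Signed by Judge Richard Roe on 4/10/2018."),
]
motions, orders = tag_and_parse(docket)
links = chain(motions, orders)
```

Rules like these are precise but brittle: an order that does not cite an entry number is left unchained, which is the kind of gap a learned chaining model is meant to close.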

The output of the motion analysis components is then ingested into the Litigation Analytics application of Westlaw Edge, which presents the analytics by judge, lawyer, or law firm. Figure 2 shows how the analytics can be explored further by selecting different views. Users may be interested in specific motions, case types, or parties. The app allows the user to explore the entire set of motions extracted from the federal court docket set.
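Once motions and orders are chained, a question such as "how long does this judge take to grant a summary judgment motion?" reduces to an aggregation over the chained pairs. The sketch below assumes a hypothetical record layout (`judge`, `motion_type`, `filed`, `decided`, `decision`) and uses invented records; it is not the application's actual data model.

```python
from datetime import date
from statistics import median

# Invented chained motion/order records, for illustration only.
pairs = [
    {"judge": "Judge A", "motion_type": "summary judgment",
     "filed": date(2018, 1, 10), "decided": date(2018, 4, 10), "decision": "granted"},
    {"judge": "Judge A", "motion_type": "summary judgment",
     "filed": date(2018, 3, 1), "decided": date(2018, 6, 29), "decision": "granted"},
    {"judge": "Judge A", "motion_type": "summary judgment",
     "filed": date(2019, 1, 1), "decided": date(2019, 2, 15), "decision": "granted"},
    {"judge": "Judge A", "motion_type": "summary judgment",
     "filed": date(2018, 5, 1), "decided": date(2018, 5, 20), "decision": "denied"},
    {"judge": "Judge B", "motion_type": "dismiss",
     "filed": date(2018, 2, 1), "decided": date(2018, 3, 1), "decision": "granted"},
]


def median_days_to_decision(pairs, judge, motion_type, decision):
    """Median number of days between filing and ruling for the matching pairs."""
    durations = [
        (p["decided"] - p["filed"]).days
        for p in pairs
        if p["judge"] == judge
        and p["motion_type"] == motion_type
        and p["decision"] == decision
    ]
    return median(durations) if durations else None


# 90 days for the invented records above (durations of 90, 120, and 45 days).
days = median_days_to_decision(pairs, "Judge A", "summary judgment", "granted")
```

The same filter-then-aggregate pattern extends to the other views mentioned above, for example grouping by law firm or case type instead of by judge.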

Chapter Three

Party outcome detection

The essential steps of determining outcomes for each party in a lawsuit are bifurcated between early terminations and the final outcome of the docket. In our system, early termination outcomes are determined by a docket entry classifier applied to the docket entry where a party was most likely terminated. As early terminations tend to be procedurally straightforward (such as dismissal, default judgment, summary judgment, or settlement), the full scope of information needed to determine the outcome can usually be found in a single docket entry.
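Because a single docket entry usually carries all the needed information, early termination detection can be illustrated with a per-entry classifier. The sketch below uses a hypothetical table of cue phrases (`CUES`) in place of a trained model, and it assumes that the last entry naming the party with a recognizable outcome is the termination entry. Both are simplifications made for illustration.

```python
# Hypothetical cue phrases standing in for the features a trained classifier would learn.
CUES = {
    "dismissal": ("dismissed", "dismissal", "dismissing"),
    "default judgment": ("default judgment",),
    "summary judgment": ("summary judgment",),
    "settlement": ("settled", "settlement"),
}


def classify_entry(text):
    """Return the outcome label whose cues occur most often in the entry, or None."""
    lowered = text.lower()
    scores = {
        label: sum(lowered.count(cue) for cue in cues)
        for label, cues in CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def early_termination_outcome(entries, party):
    """Scan entries from newest to oldest and classify the first one naming the party."""
    for text in reversed(entries):
        if party.lower() in text.lower():
            label = classify_entry(text)
            if label is not None:
                return label
    return None


# Invented docket entries, for illustration only.
entries = [
    "Summons issued as to Defendant Gamma LLC.",
    "ORDER: claims against Defendant Beta Inc are dismissed with prejudice. Party Beta Inc terminated.",
    "NOTICE of Settlement as to Defendant Gamma LLC.",
]
beta_outcome = early_termination_outcome(entries, "Beta Inc")
gamma_outcome = early_termination_outcome(entries, "Gamma LLC")
```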

Determining the final outcome of a lawsuit requires combining multiple machine learning approaches, including a deep learning approach that captures information encoded across multiple docket entries. We used a hierarchical Recurrent Neural Network, in combination with other machine learning approaches, to determine the final outcome for each party. As Figure 1 shows, the interactions in a docket can be complex, and deriving the correct outcome requires taking the entire flow of events into account.
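The idea behind a hierarchical recurrent network can be shown with a small forward pass: one recurrent encoder turns the words of each docket entry into a vector, and a second one reads those entry vectors in order to summarize the whole docket. The sketch below is a toy built on assumptions. It uses plain Elman cells with random, untrained weights, whitespace tokenization, and a hypothetical label set, so its probabilities are meaningless; the source does not describe the architecture at this level of detail.

```python
import math
import random


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleRNN:
    """Elman cell: h_t = tanh(W_x x_t + W_h h_{t-1}); returns the last hidden state."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        self.w_x = make_matrix(hidden_size, input_size, rng)
        self.w_h = make_matrix(hidden_size, hidden_size, rng)

    def encode(self, sequence):
        h = [0.0] * self.hidden_size
        for x in sequence:
            a, b = matvec(self.w_x, x), matvec(self.w_h, h)
            h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        return h


class HierarchicalRNN:
    """Word-level RNN per entry, then an entry-level RNN over the docket (untrained)."""

    def __init__(self, vocab, labels, embed_size=8, hidden_size=6, seed=0):
        rng = random.Random(seed)
        self.labels = labels
        self.embed = {w: [rng.uniform(-0.5, 0.5) for _ in range(embed_size)] for w in vocab}
        self.unk = [0.0] * embed_size  # embedding for out-of-vocabulary words
        self.word_rnn = SimpleRNN(embed_size, hidden_size, rng)
        self.entry_rnn = SimpleRNN(hidden_size, hidden_size, rng)
        self.w_out = make_matrix(len(labels), hidden_size, rng)

    def predict_proba(self, docket):
        entry_vectors = [
            self.word_rnn.encode([self.embed.get(w, self.unk) for w in entry.lower().split()])
            for entry in docket
        ]
        docket_vector = self.entry_rnn.encode(entry_vectors)
        return dict(zip(self.labels, softmax(matvec(self.w_out, docket_vector))))


# Hypothetical label set and invented docket entries, for illustration only.
LABELS = ["dismissal", "settlement", "verdict for plaintiff", "verdict for defendant"]
docket = [
    "COMPLAINT filed by Plaintiff Jane Doe",
    "MOTION for Summary Judgment by Defendant Acme LLC",
    "ORDER denying 12 Motion for Summary Judgment",
    "JURY VERDICT in favor of Plaintiff Jane Doe",
]
vocab = sorted({w for entry in docket for w in entry.lower().split()})
model = HierarchicalRNN(vocab, LABELS)
probs = model.predict_proba(docket)
```

The two-level structure is the point of the example: because the entry-level encoder sees every entry in sequence, a later order can change how an earlier motion is interpreted. A trained system would learn the weights from labeled dockets and would typically use gated cells rather than plain Elman cells.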

Figure 3 shows a screenshot displaying the different outcomes for Judge John Tunheim. Mining this valuable information out of textual data such as dockets is an important differentiator of our products and shows how cutting-edge machine learning can have real-world impact.

Chapter Four


This is just one of multiple projects where we apply machine learning techniques for text mining applications resulting in higher-value analytics products. Connecting more and more data and identifying valuable answers in text will support more products for our professional customers in the future.

If you want to learn more about the underlying approaches, the system solutions and the machine learning techniques we have been using, check out our white papers and various publications on this topic:


Custis, T., Schilder, F., Vacek, T., McElvain, G., and Alonso, H. M. (2019). Westlaw Edge AI features demo: KeyCite Overruling Risk, Litigation Analytics, and WestSearch Plus. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, ICAIL ’19, pages 256–257, New York, NY, USA. ACM.

McElvain, G., Sanchez, G., Matthews, S., Teo, D., Pompili, F., and Custis, T. (2019). WestSearch Plus: A non-factoid question-answering system for the legal domain. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France.

Vacek, T., Song, D., Molina-Salgado, H., Teo, R., Cowling, C., and Schilder, F. (2019). Litigation analytics: Extracting and querying motions and orders from US federal courts. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 116–121, Minneapolis, Minnesota. Association for Computational Linguistics.

Vacek, T., Teo, R., Song, D., Nugent, T., Cowling, C., and Schilder, F. (2019). Litigation analytics: Case outcomes extracted from US federal court dockets. In Proceedings of the Natural Legal Language Processing Workshop 2019, pages 45–54, Minneapolis, Minnesota. Association for Computational Linguistics.

Vacek, T., Teo, R., Song, D., Nugent, T., Molina-Salgado, H., Cowling, C., and Schilder, F. (2018). Litigation Analytics: The Artificial Intelligence behind it. Thomson Reuters Center for AI and Cognitive Computing White Paper, Toronto, Canada.