What can data science tell us about the legal conversation surrounding data privacy?

Sally Gao  Data Scientist, Thomson Reuters Labs

· 5 minute read

A new Thomson Reuters Labs analysis reveals three key themes in the legal conversation around data privacy.

To investigate how data privacy is being addressed in the legal sphere, we analyzed over two thousand Practical Law documents published between 2013 and 2018 from Thomson Reuters Data Privacy Advisor. The documents span more than 100 legal jurisdictions across five continents, with the top three jurisdictions—the United States, the United Kingdom, and the European Union—representing 64 percent of the corpus. Thomson Reuters Labs partnered with the World Economic Forum to create this visualization for Our Shared Digital Future, a new report on shaping the future of our digital world.

Data visualization by Thomson Reuters Labs.

The graphic presents a collection of legal texts as a set of clusters statistically distinguished by vocabulary frequency. The clusters were found with HDBSCAN, a hierarchical, density-based clustering algorithm that discovers naturally occurring groups within a dataset without requiring the number of groups to be fixed in advance. After running the clustering algorithm, we manually examined each of the resulting nine clusters and gave it a title. The most frequent terms in each cluster are displayed under its title, along with a timeline showing the distribution of publication dates within that cluster.
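
The per-cluster summaries described above (most frequent terms and a publication-date histogram) can be sketched in a few lines of Python. This is a minimal illustration, not the Thomson Reuters Labs pipeline. It makes several assumptions:

- Cluster labels have already been produced by a clustering step such as HDBSCAN, which by convention assigns the label -1 to noise points that belong to no cluster.
- The toy documents, the regex tokenizer, and the `STOP_WORDS` set are hypothetical.
- The function `summarize_clusters` is a hypothetical helper.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "for", "on", "is", "by"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def summarize_clusters(docs, labels, years, top_n=3):
    """Return {label: (top_n most frequent terms, Counter of publication years)}.

    Documents labeled -1 (HDBSCAN's noise label) are skipped.
    """
    term_counts = defaultdict(Counter)
    year_counts = defaultdict(Counter)
    for text, label, year in zip(docs, labels, years):
        if label == -1:
            continue
        term_counts[label].update(tokenize(text))
        year_counts[label][year] += 1
    return {
        label: ([term for term, _ in term_counts[label].most_common(top_n)],
                year_counts[label])
        for label in term_counts
    }


# Hypothetical toy corpus with precomputed cluster labels and publication years.
docs = [
    "GDPR data protection rules for data controllers",
    "Data protection and GDPR compliance",
    "HIPAA health data privacy rule",
    "Protecting health data under HIPAA",
    "Unrelated note",
]
labels = [0, 0, 1, 1, -1]
years = [2017, 2018, 2014, 2016, 2015]

summary = summarize_clusters(docs, labels, years)
for label, (terms, year_hist) in sorted(summary.items()):
    print(label, terms, dict(sorted(year_hist.items())))
```

Raw term counts are the simplest way to pick representative terms. Weighting schemes such as TF-IDF are a common alternative when frequent but uninformative words dominate a cluster.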

The layout of the graphic also reveals some useful properties of the data. In general, spatially isolated clusters can be said to be topically self-contained, while clusters that appear close together tend to be related. Larger clusters contain more documents, illustrating the relative prominence of each topic.

Here are three key stories that emerge from our analysis:

1. GDPR’s big impact

Toward the bottom of the graphic, three clusters (“Data Protection and GDPR”, “European Data Legislation”, and “Data Protection Acts”) form a ring of closely related topics that represents a hotbed of activity. This reflects the close attention that European legislators have given to data protection since the early 2000s. That attention culminated in the General Data Protection Regulation (GDPR), the ambitious and sweeping privacy law adopted in 2016 whose impact is felt globally.

In May 2018, as GDPR took effect, the United Kingdom enacted its own implementing legislation, the Data Protection Act 2018 (DPA). The new act repealed the previous Data Protection Act, which had been in place since 1998. Other European countries passed similar laws, as reflected in the recent spike in activity in the “European Data Legislation” cluster.

The influence of GDPR cannot be overstated. The law’s reach extends far beyond the borders of the EU: even non-European organizations are subject to it if they collect personal data on individuals in the EU. GDPR is also serving as a model for legislators abroad, as with the California Consumer Privacy Act, which goes into effect in 2020. As regulatory activity under GDPR picks up speed, these topics will continue to receive attention in the legal domain.

2. Big data breaches, big headaches

Most of the documents in the largest cluster, “Corporate Cybersecurity”, were published in the last few years, a sign that data breaches are making legislators increasingly anxious. This cluster addresses a number of issues, including:

- legislation requiring corporations to disclose cybersecurity incidents in a timely manner;
- information security policy for financial entities such as banks;
- customer privacy and data security in the age of the Internet and the Internet of Things (IoT).

Hacks and data breaches have become a familiar facet of the digital age. Following GDPR’s lead, legal experts are examining the need for stringent standards governing how corporations safeguard sensitive information.

3. Safeguarding healthcare data continues to be important

Legacy topics such as “Protecting Health Data” have a longer history and tend to be more isolated, sharing less in common with the other clusters. In the United States, the Health Insurance Portability and Accountability Act (HIPAA), which governs the safeguarding of medical information, has been in place since 1996.

Although protecting health data is not a new idea, it is still relevant today. Statistics show that the majority of data breaches affect medical and healthcare organizations. Due to the particularly sensitive nature of health data, special attention needs to be paid to regulating how it can be collected and used.

Privacy in the digital realm is a crucial concern in our increasingly connected society. As legislation and practice progress, these conceptual clusters will doubtless keep evolving for as long as technology and policy continue to change.


Learn more

Explore the full report, Our Shared Digital Future, which addresses the need for shared goals and coordinated action to shape an inclusive, sustainable digital future.