10 / 12 / 2018


Breaking Links and Reducing Risk


You could almost hear a collective sigh of relief on May 25, 2018, when organisations crossed the GDPR finish line. The reality, however, is that the race is far from run. Optically satisfying the more obvious compliance requirements with a collection of hastily assembled processes, policies and protocols is leaving many organisations unknowingly exposed to regulatory enforcement and possible fines.

At Trūata, we talk to many data-rich companies that have an implicit understanding of the value of their data and of how the insights gleaned from it can guide more informed business decisions.

For some companies, GDPR was about ticking a regulatory box. For others, it presented a challenge: to find ways, after May 2018, to continue with business as usual while complying with the GDPR. Some of these businesses, for example, have opted to de-identify data themselves so that they can continue to conduct analytics, but on non-personal data. The intention of that latter group is spot-on: to still get value from their data while complying with data protection laws. The problem, however, lies in the execution, largely because most organisations have failed to grasp the extremely high threshold that has been set for data anonymization, and the inherent risks of falling short of that threshold.

For those organisations that are looking to continue to derive insights and to take data-driven decisions in a legally and ethically responsible manner, it may be time to rethink their approach.

Pseudonymization -v- Anonymization

With fines and enforcement actions for GDPR infringements beginning to emerge, it’s a good time for businesses to delve deeper into what the new data protection regulations really mean for the processing of personal data for the purposes of analytics.

Anonymizing data before analysing it is a good strategy, as it offers a way to maximise value from data while complying with the letter and spirit of the GDPR. But, as stated above, the difficulty is that the threshold for achieving anonymization is extremely high, and most organisations are simply not doing enough to meet it.

In particular, by trying to do everything in-house, whether by developing their own de-identification processes to tokenise direct and indirect identifiers or by using third-party tools to do so, organisations may think they have sufficiently anonymized their data and thus reduced their exposure to fines. Put simply, however, what they have probably achieved is, at best, pseudonymization rather than anonymization.
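To make the distinction concrete, here is a minimal sketch of in-house tokenisation, assuming a keyed hash (HMAC-SHA256) is used to replace an email address. The key, field names and records are all hypothetical and are not taken from any particular product. The sketch illustrates why the output is pseudonymized rather than anonymized: whoever holds the key and the source data can simply recompute the tokens.

```python
import hashlib
import hmac

# Assumption: a secret key held by the same organisation that holds the source data.
SECRET_KEY = b"hypothetical-in-house-key"


def tokenise(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed, deterministic token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical source records.
source = [
    {"email": "alice@example.com", "postcode": "D02", "spend": 120},
    {"email": "bob@example.com", "postcode": "D04", "spend": 75},
]

# "De-identified" copy: the email is tokenised, everything else is kept as-is.
deidentified = [
    {"token": tokenise(r["email"]), "postcode": r["postcode"], "spend": r["spend"]}
    for r in source
]


def relink(source_rows, deid_rows):
    """Re-identify de-identified rows by recomputing tokens from the source data."""
    lookup = {tokenise(r["email"]): r["email"] for r in source_rows}
    return {row["token"]: lookup.get(row["token"]) for row in deid_rows}


# Every token maps straight back to the original email address.
relinked = relink(source, deidentified)
```

Because the tokens are deterministic, the mapping survives for as long as the key and the source data do, which is the situation an organisation doing everything in-house is typically in.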

One of the main difficulties with this in-house approach is that if the same organisation holds both the original source data and a de-identified dataset, the risk of re-identifying individuals remains. The Article 29 Working Party Opinion 05/2014 on Anonymization Techniques states that “An effective anonymization solution prevents all parties from singling out an individual in a dataset, from linking two records within a dataset (or between two separate datasets) and from inferring any information in such dataset. Generally speaking, therefore removing directly identifying elements in itself is not enough to ensure that identification of the data subject is no longer possible”.
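As a rough illustration of the “singling out” risk named in the Opinion, the sketch below counts records whose combination of indirect identifiers is unique in a dataset. The field names and records are hypothetical, and this is only one simple check: passing it says nothing about the linking or inference risks, so it cannot by itself show that a dataset is anonymized.

```python
from collections import Counter


def singled_out(rows, quasi_identifiers):
    """Return rows whose quasi-identifier combination is unique in the dataset."""

    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in rows)
    return [r for r in rows if counts[key(r)] == 1]


# Hypothetical de-identified records: direct identifiers are already tokenised.
rows = [
    {"token": "a1", "birth_year": 1980, "postcode": "D02", "gender": "F"},
    {"token": "b2", "birth_year": 1980, "postcode": "D02", "gender": "F"},
    {"token": "c3", "birth_year": 1975, "postcode": "D04", "gender": "M"},
]

# The third record is the only one with its combination of attributes,
# so that individual can still be singled out despite the tokenised identifier.
unique_rows = singled_out(rows, ["birth_year", "postcode", "gender"])
```

Even with every direct identifier removed, a record that is unique on a handful of ordinary attributes can be isolated, which is why removing directly identifying elements is not enough.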

The risk of re-identification is not merely theoretical; it is also practical. For example, even if an organisation creates Chinese walls, builds silos or implements access controls around the de-identified dataset, in reality it is extremely difficult to prevent datasets from being inadvertently co-mingled or linked, or accessed by unauthorised staff.

Pseudonymization is a well-recognised and prudent practice for managing risk. Importantly, however, pseudonymized data is still legally considered personal data, so all the principles of the GDPR apply to it: limits on how long the data is retained, the obligation to use data only for the purpose for which it was originally collected, and the obligation, in certain circumstances, to delete someone’s data if they ask to be forgotten. Anonymized data, on the other hand, is not considered personal data, and the GDPR does not apply to it.

The Holy Grail of Anonymization

What we do at Trūata is different. As a first step, we reduce the risk of re-identification to an insignificant level by separating the original source data from the de-identified dataset and then purposefully breaking the link between the two. Our data scientists and privacy experts then, taking into account the data utility needs, independently decide which anonymization techniques to apply. The customer has no input into these decisions, which prevents the customer from being deemed the controller for the purposes of anonymization.

This leaves our customers free to perform analytics, gather insights from non-personal data and ultimately pursue business strategies based on data-driven decisions. For example, they can carry out longitudinal analyses that reveal trends over time. And they can do this with the confidence that they are fully complying with data protection laws.

With the explosion of Big Data and the rise of multiple consumer touchpoints, the onus is on every organisation to become data-centric. The challenge is to do so in a responsible and ethical way that allows an organisation to continue to innovate and extract insights from data while staying GDPR compliant and respecting individuals’ privacy.