Why Data Sets Need To Be Examined With X-ray Vision

DR. MAURICE COYLE, CHIEF DATA SCIENTIST AT TRŪATA

The importance of understanding data privacy risks

Back in late 2019, before Covid-19 became a global issue, data had already transformed the commercial landscape and established itself as a high-value business asset. However, its associated social, ethical and commercial risks were quickly identified, and a raft of data protection legislation, such as GDPR and CCPA, was established in response.

Since then, of course, we have been living through a pandemic, and that has changed everything. In particular, it has driven many more people online, to shop and conduct numerous transactions that they no longer wish to conduct in person. This is great news for data analysts and organizations that have the capability to apply data-driven insights to their behaviors and strategies because there is now much more data readily available. For those organizations that know what to do with it and how to handle it – both legally and ethically – that data is gold dust.

But progress may not be straightforward, even for businesses with good data-handling practices in place. As I have noted, data brings huge value but also risk and responsibility. Now that the volumes of data are so large, and continually growing, the greatest risk lies with those who cannot process it in a way that meets with public and legal approval.

X-raying data to understand hidden privacy risks

With more data available for analysis, it is imperative that data owners and users have a deep understanding of the privacy risks that lie within the data they are working with. In many ways, this can be compared to a radiologist studying an x-ray of the human body to pinpoint areas that need particular attention and would otherwise go unseen. Hidden privacy risks lie in data values which on their own do not identify an individual, but when combined have the potential to do so. This risk of re-identification is not obvious and can easily be missed unless suitable privacy-enhancing technologies are applied. The ability to automate risk assessment and apply it at scale is opening new opportunities for companies that want to implement forward-thinking data strategies that use data in a responsible and ethical way.

Data? Whose data?

While most organizations appreciate the value of personal data, it seems the public’s understanding is rapidly catching up. People are increasingly aware of just how valuable their data is, and they are beginning to view it as a commodity that they own. After high-profile data breaches reported in the media, the public are also more conscious of the potential for breaches and of their rights around data that is collected about them. Because of this, organizations of all sizes now have much to gain, but potentially more to lose, from the growing volume of data in circulation. The risk is both financial and reputational, and comes from various sources, not all of which are immediately obvious.

First, let us look at the obvious source. Much has been written on the subject of data protection penalties, so there is no need to do more than reiterate that breaching data protection regulations generally leads to serious reputational and financial penalties and should be avoided at all costs. And with the fundamentals of GDPR being rapidly emulated in data protection regulations worldwide, the likelihood of being penalized for the mishandling of data is increasing.

A second source of risk for organizations comes in the form of abandonment by consumers, many of whom will desert an organization that is not careful with their data, even if there is no explicit legal breach. This danger is exacerbated by the prevalence of social media: if your firm upsets people with its data processing, it can take mere seconds for that news to circulate globally. Such reputational damage is one of the less obvious risks, but gaining a reputation for lackadaisical data handling will severely undermine any brand.

It’s important that you don’t just promise your customers that you will care for their data. Prove it. Tell them how you will assure their privacy and security. Show them that you have more than a surface-level understanding of the privacy risks within your data and that you have looked deep within it to highlight hidden risks.

This brings us to the heart of the matter – privacy-enhancing technologies.

Privacy-enhancing technologies: the key to data processing

The term ‘privacy-enhancing technologies’, or PETs, refers to a wide range of approaches intended to ensure the correct handling of data in line with the privacy concerns and rights of the data subject.

Data de-identification provides a great example of how PETs are valuable. Most, if not all, sensitive data can ultimately be traced back to the source individual(s), and this is clearly a data protection risk. As such, many companies de-identify their data by removing direct identifiers such as name and address and think that this is sufficient, that the job is now done.

Only it isn’t.

There are many attributes within a data set that can identify an individual, either by themselves or combined with other data. Duplicate or related data can lurk in systems, and values such as age, regional location and gender can be combined to single out an individual; both present privacy risks. To address this, organizations need a solution that quantifies the potential risk of re-identification, so that the ability to identify an individual can be either reduced or eliminated. This will then give companies the confidence they need to safely use the data to better serve their customers and unlock commercial value, all while safeguarding consumer privacy.
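
To make the idea of quantifying re-identification risk concrete, here is a minimal sketch in Python. It is not a description of any particular product or of a complete risk assessment. It uses one common, simple measure as an assumption: records are grouped by their combination of age, regional location and gender, and each record is scored as 1 divided by the number of records sharing that combination. The toy records, the field names and the function name reidentification_risk are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy records; direct identifiers (name, address) are already removed.
records = [
    {"age": 34, "region": "Dublin", "gender": "F", "spend": 120},
    {"age": 34, "region": "Dublin", "gender": "F", "spend": 80},
    {"age": 34, "region": "Dublin", "gender": "M", "spend": 45},
    {"age": 71, "region": "Kerry", "gender": "M", "spend": 300},
    {"age": 52, "region": "Cork", "gender": "F", "spend": 60},
    {"age": 52, "region": "Cork", "gender": "F", "spend": 95},
]

# Assumption: these are the attributes that could be combined to single someone out.
QUASI_IDENTIFIERS = ("age", "region", "gender")


def reidentification_risk(rows, quasi_identifiers):
    """Score each row as 1 / (number of rows sharing its quasi-identifier values)."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    class_sizes = Counter(keys)  # how many rows fall into each combination
    return [1 / class_sizes[key] for key in keys]


risks = reidentification_risk(records, QUASI_IDENTIFIERS)

# A score of 1.0 means the row is unique on the chosen attributes.
unique_rows = [row for row, risk in zip(records, risks) if risk == 1.0]

print("highest risk:", max(risks))
print("records unique on quasi-identifiers:", len(unique_rows))
```

In this toy data set, two of the six records are unique on the three attributes even though no name or address is present. Those are the records that call for targeted treatment, while the remaining four already share their combination with at least one other record.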

We all know the value of consumer data today, so why would a data owner take a broad-brush approach, either by declining to interrogate a data set because they are uncertain of the risks involved or by removing so much data from it that it is rendered useless for analytical purposes? Especially if they could identify and remove the risks with laser-like precision? An over-cautious approach is likely to remove safe, insightful data. A radiologist wouldn’t recommend putting a patient in a full body cast to treat a broken arm – doing so would immobilize the patient needlessly. In much the same way, organizations can’t afford to immobilize large, profit-generating data sets.

In summary, Covid-19 has accelerated the data age; it has swiftly increased awareness of privacy concerns in businesses and consumers alike. Proof of responsible data handling should now be at the top of every branding, IT, governance and operational strategy – but it must be backed up with demonstrable best practice and understanding of the issues involved. In other words, businesses can now view privacy-enhancing technologies like x-rays for their data, to highlight privacy risks that need to be addressed, which will ultimately empower them to do more with their data.

See the article in IT Pro Portal.