The Challenges of Privacy Engineering in a Big Data Economy

Despite having only come to the fore in recent years, the field of privacy engineering is rapidly evolving to meet the rising demands and complex challenges presented by the big data economy. Privacy engineering is not only becoming integral to data-driven businesses because of regulatory demands and hyper-automation strategies; it is also a discipline that leads to enhanced products, increased consumer trust and bottom-line growth.

What is privacy engineering?

Privacy engineering, while still an emerging field, is a technical discipline that uses engineering principles and processes to bake privacy into the core of system infrastructures, products and business operations to ensure that privacy requirements are properly met. Beyond a sophisticated technical understanding of implementation, privacy engineering also demands a deep knowledge of commercial and legal considerations to ensure that privacy requirements translate into technical requirements that satisfy business needs.

What challenges does big data pose to privacy engineering?

In today’s digital world, the volume of data being harnessed and stored is growing at such a rate that it is now beyond human capability to process that data effectively and protect privacy without the assistance of new technologies.

The 4 V’s of big data (volume, variety, veracity and velocity) create a number of privacy engineering challenges, and the sheer volume of data is just one aspect. Advancements in technology mean that data now comes from a variety of sources and in different forms – structured, semi-structured and unstructured. This volume and variety of data also lead to data veracity issues, whereby questions surrounding the accuracy and trustworthiness of data come to the fore. Finally, the velocity of data now impacts businesses – data ingestion, processing, storage and retrieval all have to be accounted for if insights are to be turned into actionable growth and innovation strategies while preserving privacy.

What privacy challenges are at play?

In addition to the 4 V’s of big data contributing to a myriad of privacy concerns, the changing dynamics of the digital world have also surfaced a number of other privacy challenges that are placing increasing pressure on businesses to adapt and evolve processes and strategies. And this makes privacy engineering all the more complex.

Increasing demands for protection, transparency and accountability have led to new privacy laws being introduced around the world. These laws – such as GDPR, LGPD, CCPA, CPA, CPRA, PIPL – create challenges for businesses and privacy engineers alike. From a privacy engineering perspective, solutions need to be mapped out to satisfy the highest global regulatory standards while simultaneously anticipating the emergence of future regulations and offering an agile framework that enables businesses to work with data at speed and at scale.

In addition to this, existing IT structures are not built on the principles of privacy-by-design, which means that privacy is – more often than not – being treated as a bolt-on to existing systems rather than plugged in as an integral component. If business and system infrastructures are not designed with privacy at the foundational core, they will be unable to adapt to future privacy and data regulations or provide the now-necessary, auditable trail of compliance. As such, privacy engineers are under pressure to establish technical frameworks where privacy controls can be implemented in any existing or new system.

On top of this, with commercial strategies now heavily reliant on data, privacy engineering is also under pressure to go beyond the preservation of privacy and enable more accurate and more targeted analytics to be delivered in a market that is building and innovating based on insights, hyper-personalization strategies and more.

The trillion-calculation complexities

Beyond the aforementioned data and privacy challenges, one key area of focus that makes privacy engineering all the more intricate is the complexity of calculations that need to be handled in order to preserve privacy while maximizing data utility.

One real-world example to gauge this complexity is geo-mapping. To conduct privacy-preserving geo-mapping, privacy engineers are tasked with mapping the GPS information of millions of people to millions of zip codes, while removing all sensitive information so that no personal details are ever exposed. For example, we may have 1.7 million zip codes, so in order to do this mapping, we have to search for each piece of GPS information amongst 1.7 million zip codes. With millions of GPS records, a naive search therefore runs to trillions of comparisons. This creates just one of the many trillion-calculation problems that need to be handled.
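The scale of the geo-mapping problem can be checked with simple arithmetic. The sketch below counts the comparisons in a naive search and then shows one common way to shrink the search space, a coarse grid index. It is an illustration only, not a description of any particular product: the figure of one million GPS points, the representation of each zip code as a single centroid, the cell size and all function names are assumptions made for the example.

```python
from collections import defaultdict
from math import floor, hypot

ZIP_CODES = 1_700_000    # figure quoted in the text
GPS_POINTS = 1_000_000   # assumption: a lower bound for "millions of people"


def brute_force_comparisons(points: int, zips: int) -> int:
    """Comparisons needed if every point is checked against every zip code."""
    return points * zips


CELL = 1.0  # hypothetical grid cell size, in degrees


def cell_of(lat: float, lon: float) -> tuple:
    """Grid cell that contains the given coordinate."""
    return (floor(lat / CELL), floor(lon / CELL))


def build_index(centroids: dict) -> dict:
    """Group zip codes by the grid cell of their (assumed) centroid."""
    index = defaultdict(list)
    for zip_code, (lat, lon) in centroids.items():
        index[cell_of(lat, lon)].append(zip_code)
    return index


def nearest_zip(lat: float, lon: float, centroids: dict, index: dict):
    """Closest centroid among the 3x3 block of cells around the point.

    Approximate by design: it uses planar distance in degrees and ignores
    zip codes outside the neighbouring cells. Returns None if no candidate
    is found nearby.
    """
    row, col = cell_of(lat, lon)
    candidates = [
        z
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        for z in index.get((row + dr, col + dc), [])
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda z: hypot(centroids[z][0] - lat, centroids[z][1] - lon),
    )


print(f"{brute_force_comparisons(GPS_POINTS, ZIP_CODES):,}")  # 1,700,000,000,000
```

With an index of this kind, each GPS point is compared only against the zip codes in nearby cells rather than against all 1.7 million, which is the general idea behind turning a trillion-calculation search into a tractable one.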

Another real-world example, which we solve through Trūata Calibrate, is a data combination problem when conducting risk assessments. For example, we may have a dataset with 100 columns containing 100 million rows of data. In this case, a risk assessment may need to be conducted on any combination of columns; it could be one, two, three or even 100 columns. This leads to trillions of trillions of calculations that potentially need to be performed, and those calculations are applied to up to 100 million rows of data.

To put this in context, for a 50-column dataset that would be considered “narrow” by today’s standards, there are more possible combinations of columns than there are stars in the Milky Way galaxy. It is this level of calculation complexity that privacy engineering teams now face when delivering solutions for data-centric businesses.
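The column-combination figures can be reproduced directly. A dataset with n columns has 2^n - 1 non-empty combinations of columns, and the short sketch below computes that count for the 50-column and 100-column cases mentioned above. The function names are illustrative, and the star count is an assumption: commonly cited estimates for the Milky Way range from roughly 100 billion to 400 billion stars, so the upper end is used here.

```python
from math import comb


def column_subsets(n_columns: int) -> int:
    """Number of non-empty column combinations: 2^n - 1."""
    return 2 ** n_columns - 1


def column_subsets_of_size(n_columns: int, k: int) -> int:
    """Number of ways to pick exactly k columns out of n."""
    return comb(n_columns, k)


# Assumption: upper end of commonly cited estimates for the Milky Way.
MILKY_WAY_STARS_UPPER = 400_000_000_000

print(f"{column_subsets(50):.3e}")    # 1.126e+15
print(f"{column_subsets(100):.3e}")   # 1.268e+30
print(column_subsets_of_size(100, 3))  # 161700
print(column_subsets(50) > MILKY_WAY_STARS_UPPER)  # True
```

Even the 50-column case exceeds the star estimate by several orders of magnitude, and every one of those combinations may then need to be evaluated against all rows in the dataset.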

The scale of the data calculations requires privacy operations to be scaled. Ultimately, if you cannot measure the privacy risk in datasets of such size, then you certainly cannot protect that data.

How can privacy engineering solve these big data challenges?

By following best-practice principles and design patterns, privacy engineering can ensure scalable results in a cost-effective way through the creation of highly efficient privacy-enhancing technologies. You can dive into the technical drilldown via our Big Data Privacy Engineering on-demand webinar, but here are three key takeaways for privacy engineering in the big data economy…

Look towards declarative privacy design. Compared to imperative privacy design, declarative privacy design is primarily focused on providing privacy-enhancing technology systems with the ‘what’ rather than the ‘how’. That is, declarative privacy design concentrates on privacy requirements, describing the desired anonymization results or privacy outcomes under certain conditions without explicitly listing the data operations or steps that must be taken. Imperative privacy design, by contrast, requires defining which privacy-enhancing technologies (PETs) or algorithms must be applied, how they should be configured and deployed, and what processes must be applied to data to protect its privacy. With declarative privacy design, the detailed data operations are separated from the desired privacy results; the operations are instead composed dynamically, based on the current context, so that the PETs system applies the most suitable operations to fulfil the privacy requirements. One key benefit of declarative privacy design is that it is a low-code or no-code approach, which lays the foundations for a plug-and-play implementation and ensures that new pipelines can be composed to handle emerging privacy requirements rather than having to continuously add new privacy-enhancing technologies to support them.
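The difference between the ‘what’ and the ‘how’ can be made concrete with a small sketch. Here the caller declares only an outcome (every combination of the listed columns must be shared by at least a minimum number of rows, a k-anonymity-style condition chosen purely for illustration), and a toy engine decides which operations to apply to reach it. The class, the two generalization operations and the selection strategy are all hypothetical and are not taken from the article or from any specific product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyRequirement:
    """The 'what': each combination of values in `columns` must be
    shared by at least `min_group_size` rows."""
    columns: tuple
    min_group_size: int


def satisfies(rows: list, req: PrivacyRequirement) -> bool:
    """Check the declared outcome against the data."""
    groups = Counter(tuple(r[c] for c in req.columns) for r in rows)
    return all(size >= req.min_group_size for size in groups.values())


# Hypothetical operations the engine may choose from (the 'how').
def generalize_age(rows: list) -> list:
    """Round ages down to the start of their decade."""
    return [{**r, "age": r["age"] // 10 * 10} for r in rows]


def truncate_zip(rows: list) -> list:
    """Keep only the first three characters of the zip code."""
    return [{**r, "zip": r["zip"][:3] + "**"} for r in rows]


OPERATIONS = [generalize_age, truncate_zip]


def fulfil(rows: list, req: PrivacyRequirement, operations=OPERATIONS):
    """Apply operations one by one until the requirement holds.

    Returns the transformed rows and the names of the operations used.
    Raises ValueError if the available operations are not enough.
    """
    current, applied = rows, []
    for op in operations:
        if satisfies(current, req):
            break
        current = op(current)
        applied.append(op.__name__)
    if not satisfies(current, req):
        raise ValueError("requirement cannot be met with the available operations")
    return current, applied


rows = [
    {"age": 34, "zip": "12345"},
    {"age": 36, "zip": "12399"},
    {"age": 52, "zip": "54321"},
    {"age": 58, "zip": "54388"},
]

_, steps = fulfil(rows, PrivacyRequirement(("age",), 2))
print(steps)  # ['generalize_age']

_, steps = fulfil(rows, PrivacyRequirement(("age", "zip"), 2))
print(steps)  # ['generalize_age', 'truncate_zip']
```

The same data yields different pipelines depending on the declared requirement, and the caller never names an operation. An imperative version would instead call `generalize_age` and `truncate_zip` explicitly in a fixed order.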

Introduce a privacy abstraction layer to unify data operations. For example, tokenization can be decomposed into a number of generic data frame operations, which enables privacy-enhancing technologies to provide a single and consistent data abstraction layer (DAL). This layer allows for the use of standard APIs so that platforms can run on different big data engines, thereby facilitating interoperability and more robust privacy operations.
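As a rough illustration of this idea, the sketch below expresses tokenization using only two generic data frame operations, `distinct` and `replace_values`, defined on an abstract interface. The interface, the in-memory engine and the keyed-hash token scheme are all assumptions made for the example; a real DAL would wrap an actual big data engine behind the same interface, and a production tokenization scheme would need proper key management.

```python
import hashlib
import hmac
from abc import ABC, abstractmethod


class DataFrameOps(ABC):
    """Hypothetical abstraction layer: the only operations PETs may use."""

    @abstractmethod
    def distinct(self, column: str) -> list:
        """Return the distinct values of a column."""

    @abstractmethod
    def replace_values(self, column: str, mapping: dict) -> "DataFrameOps":
        """Return a new frame with the column's values replaced via mapping."""


class InMemoryFrame(DataFrameOps):
    """Toy engine backed by a list of dicts, standing in for a real one."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]

    def distinct(self, column: str) -> list:
        return sorted({r[column] for r in self.rows})

    def replace_values(self, column: str, mapping: dict) -> "InMemoryFrame":
        return InMemoryFrame({**r, column: mapping[r[column]]} for r in self.rows)


def tokenize(frame: DataFrameOps, column: str, key: bytes) -> DataFrameOps:
    """Tokenization written purely against the abstraction layer.

    Step 1: collect distinct values. Step 2: derive one token per value
    (illustrative keyed hash). Step 3: map the column through the tokens.
    """
    mapping = {
        value: hmac.new(key, str(value).encode(), hashlib.sha256).hexdigest()[:16]
        for value in frame.distinct(column)
    }
    return frame.replace_values(column, mapping)


frame = InMemoryFrame([
    {"email": "a@example.com", "visits": 1},
    {"email": "b@example.com", "visits": 2},
    {"email": "a@example.com", "visits": 3},
])
tokenized = tokenize(frame, "email", key=b"demo-key")
print(tokenized.rows[0]["email"] == tokenized.rows[2]["email"])  # True
```

Because `tokenize` depends only on `DataFrameOps`, supporting another engine means adding one more implementation of the interface rather than rewriting the privacy operation itself.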

Evolve to a big data tech stack by focusing on multi-cloud design challenges. For best practice, it is important to use a common computing platform and common platform libraries across different cloud environments. It is also good practice to use factory design patterns to handle different key vaults, I/O, logging and environment configurations. By focusing on multi-cloud design, privacy-enhancing technologies are able to move from one cloud to another with minimal changes required on the core platform side.
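A factory for key vaults might look something like the following sketch. The core platform asks the factory for a `KeyVault` and never names a concrete backend, so switching environments becomes a configuration change. Everything here is hypothetical: the provider names, the config keys and the two stand-in backends (one reading environment variables, one holding secrets in memory) are placeholders for real cloud key vault clients, whose SDKs are deliberately not called.

```python
import os
from abc import ABC, abstractmethod
from typing import Callable, Dict


class KeyVault(ABC):
    """Interface the core platform codes against."""

    @abstractmethod
    def get_secret(self, name: str) -> str:
        """Return the secret stored under `name`."""


_REGISTRY: Dict[str, Callable[[dict], KeyVault]] = {}


def register(provider: str):
    """Class decorator that adds a backend to the factory registry."""
    def decorator(cls):
        _REGISTRY[provider] = cls
        return cls
    return decorator


@register("env")
class EnvKeyVault(KeyVault):
    """Stand-in backend: reads secrets from environment variables."""

    def __init__(self, config: dict):
        self.prefix = config.get("prefix", "")

    def get_secret(self, name: str) -> str:
        return os.environ[self.prefix + name]


@register("memory")
class InMemoryKeyVault(KeyVault):
    """Stand-in backend: secrets supplied directly in the config."""

    def __init__(self, config: dict):
        self.secrets = dict(config.get("secrets", {}))

    def get_secret(self, name: str) -> str:
        return self.secrets[name]


def create_key_vault(config: dict) -> KeyVault:
    """Factory: pick the backend named in the environment configuration."""
    provider = config["provider"]
    if provider not in _REGISTRY:
        raise ValueError(f"unknown key vault provider: {provider!r}")
    return _REGISTRY[provider](config)


vault = create_key_vault({"provider": "memory", "secrets": {"token_key": "s3cr3t"}})
print(vault.get_secret("token_key"))  # s3cr3t
```

The same pattern can be repeated for I/O, logging and environment configuration, with one registry per concern, so that a move to a different cloud adds new registered backends without touching the core platform code.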

Tune in to the full 30-minute privacy engineering webinar with our Director of Software, Yangcheng Huang, or tap into our four-step guide to privacy engineering to learn more.