Business and government leaders are under unprecedented pressure to protect customers’ personal information. Years of high-profile data leaks have fueled consumers’ concern about the security of their data. Privacy laws have also become more stringent, as evidenced by the European Union’s new General Data Protection Regulation (GDPR).
These privacy laws commonly require organizations to protect customers’ personally identifiable information (PII), such as names, contact details, and dates of birth.
At the same time, there is a new business imperative to share data with other organizations so companies can gain new insights, innovate, and better compete with data-driven disruptors.
To share data, organizations use a technique called data matching to ensure the records in each company’s database refer to the same person or thing.
But how do you share and match data while protecting PII?
The Two Extremes
Organizations currently handle this problem in different ways. Some just seem to hope for the best, sharing data with other businesses without any data protection—other than perhaps a legal contract. Facebook’s Cambridge Analytica scandal is a high-profile example of the consequences of not adequately protecting PII or having a mechanism to enforce the terms under which business partners use this data.
At the other extreme, some organizations avoid data sharing altogether. This may minimize the risks of data leaks and non-compliance with privacy laws, as long as the company also has strong internal data security and governance.
However, this approach puts these businesses at a competitive disadvantage against data-rich companies.
Encryption and Other Techniques
Other organizations are prepared to exchange data but use one or more PII-protection techniques when doing so.
These techniques include various types of data encryption. For example, homomorphic encryption allows computations to be performed on encrypted data without decrypting it first. This enables two organizations to match datasets based on PII without ever seeing the customer information in each other’s datasets.
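To make "computation on encrypted data" concrete, here is a toy sketch of an additively homomorphic scheme (Paillier) in Python. It is an illustration only: the primes are far too small to be secure, the function names are invented for this example, and it shows addition on ciphertexts rather than a full record-matching protocol.

```python
import math
import secrets

# Toy key material. These primes are assumptions chosen for readability
# and are far too small for any real use.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # modular inverse, needs Python 3.8+


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key (N, G)."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt with the private values (LAM, MU)."""
    u = pow(c, LAM, N_SQ)
    return ((u - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiply ciphertexts; the result decrypts to the sum of the plaintexts."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    total = add_encrypted(encrypt(20), encrypt(22))
    print(decrypt(total))
```

The party that calls `add_encrypted` never sees 20 or 22, only ciphertexts. Only the holder of the private values can read the result.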
Another common practice is to use a non-reversible cryptographic technique called hashing to mask sensitive fields before sharing data. This is done using one of a range of hash algorithms.
However, hashing on its own is susceptible to brute-force and other attacks that can allow hackers to re-identify PII. Combining hashing with salting is considerably more secure, as it adds a random string (the salt) to each field before hashing.
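The difference between plain and salted hashing can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 as the hash algorithm and a 16-byte random salt. The function names and the date range are made up for the example.

```python
import hashlib
import secrets
from datetime import date, timedelta
from typing import Optional


def plain_hash(value: str) -> str:
    """Hash a normalized field with no salt."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def salted_hash(value: str, salt: bytes) -> str:
    """Prepend a secret salt to the normalized field before hashing."""
    return hashlib.sha256(salt + value.strip().lower().encode("utf-8")).hexdigest()


def guess_date_of_birth(target_hash: str) -> Optional[str]:
    """Brute-force an unsalted hash by trying every date from 1900 to 2020."""
    day = date(1900, 1, 1)
    while day <= date(2020, 12, 31):
        candidate = day.isoformat()
        if plain_hash(candidate) == target_hash:
            return candidate
        day += timedelta(days=1)
    return None


if __name__ == "__main__":
    salt = secrets.token_bytes(16)  # must be kept secret
    print(guess_date_of_birth(plain_hash("1985-07-14")))         # attack succeeds
    print(guess_date_of_birth(salted_hash("1985-07-14", salt)))  # attack fails
```

A date of birth has only a few tens of thousands of possible values, so an unsalted hash of it can be reversed almost instantly. Note that for two organizations to match on salted hashes, both would need to use the same salt, which is why access to the salt matters so much.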
Distributed ledger technology such as blockchain is another option, allowing organizations to securely share their datasets without having to centralize data.
However, each of these solutions has its drawbacks. With salting and hashing, for example, there’s still a risk of re-identification if a hacker gains access to the salt values.
Most of the above techniques also require sensitive data to be kept in a central ‘honeypot,’ which presents a tempting target for hackers.
These techniques are often used in bespoke do-it-yourself solutions, which can be complex and time-consuming to manage. In addition, DIY solutions are not easily repeatable or scalable—and businesses lose the ability to control or even see how their data is used after it’s shared.
DIY solutions also typically offer only one blunt data-governance instrument: securing the data itself by obfuscating fields or aggregating records before sharing. That generally makes data verification more difficult and data analysis less useful.
The Senate Matching Difference
Data Republic’s Senate Matching technology is designed to address the shortfalls of other data-matching techniques and services. It takes a ‘privacy by design’ approach, using a combination of data-protection techniques and a unique, decentralized architecture.
Before uploading a dataset for exchange, Senate Matching first divides the data into the fields that are not to be shared (the PII) and fields that can be shared (such as attribute data). This division means that the companies you exchange data with (and Data Republic) never have access to the PII in your database.
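As a rough sketch of this division step, the Python below splits one record into a PII part and a shareable attribute part, linked by a random token. The field names, the `PII_FIELDS` set and the `split_record` function are hypothetical and are not taken from Senate Matching itself.

```python
import secrets
from typing import Dict, Set, Tuple

# Hypothetical list of fields treated as PII in this example.
PII_FIELDS: Set[str] = {"name", "email", "date_of_birth"}


def split_record(
    record: Dict[str, str], pii_fields: Set[str] = PII_FIELDS
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (pii_part, attribute_part), both carrying the same random token."""
    token = secrets.token_hex(16)
    pii = {k: v for k, v in record.items() if k in pii_fields}
    attributes = {k: v for k, v in record.items() if k not in pii_fields}
    pii["token"] = token
    attributes["token"] = token
    return pii, attributes


if __name__ == "__main__":
    customer = {
        "name": "Alice Example",
        "email": "alice@example.com",
        "date_of_birth": "1985-07-14",
        "postcode_area": "2000",
        "monthly_spend": "132.50",
    }
    pii_part, attribute_part = split_record(customer)
    print(attribute_part)  # contains no PII fields, only attributes and a token
```

Only the attribute part would be a candidate for sharing. The PII part stays with the data owner and is linked back to the attributes through the token alone.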
Datasets are anonymized using tokens, and PII is protected using salting and hashing. But Senate Matching goes further by dividing the hashed PII and distributing the ‘slices’ on a network of highly secured ‘nodes.’
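The article does not describe how the slicing works internally, so the following Python is only one possible reading of the idea: each hashed value is cut into equal pieces, each piece is stored on a different node, and two records match only if every node sees agreeing pieces. All names and the choice of four nodes are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def slice_digest(digest: str, n_nodes: int) -> List[str]:
    """Cut a hex digest into n_nodes contiguous pieces."""
    size = -(-len(digest) // n_nodes)  # ceiling division
    return [digest[i * size:(i + 1) * size] for i in range(n_nodes)]


def distribute(hashed: Dict[str, str], n_nodes: int) -> List[Dict[str, str]]:
    """Give node i the i-th piece of every digest, keyed by record token."""
    nodes: List[Dict[str, str]] = [{} for _ in range(n_nodes)]
    for token, digest in hashed.items():
        for node, piece in zip(nodes, slice_digest(digest, n_nodes)):
            node[token] = piece
    return nodes


def find_matches(
    nodes_a: List[Dict[str, str]], nodes_b: List[Dict[str, str]]
) -> Set[Tuple[str, str]]:
    """Keep only token pairs whose pieces agree on every node."""
    agreed = None
    for node_a, node_b in zip(nodes_a, nodes_b):
        by_piece: Dict[str, Set[str]] = {}
        for token_b, piece in node_b.items():
            by_piece.setdefault(piece, set()).add(token_b)
        local = {
            (token_a, token_b)
            for token_a, piece in node_a.items()
            for token_b in by_piece.get(piece, set())
        }
        agreed = local if agreed is None else agreed & local
    return agreed or set()


if __name__ == "__main__":
    def digest(value: str) -> str:
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    company_a = {"a1": digest("alice@example.com"), "a2": digest("bob@example.com")}
    company_b = {"b1": digest("bob@example.com"), "b2": digest("carol@example.com")}
    print(find_matches(distribute(company_a, 4), distribute(company_b, 4)))
```

In this sketch no single node ever holds a complete hash, so compromising one node yields only fragments.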
Senate Matching then uses a sophisticated technique to accurately match the individual records of the datasets, so data analysts can be confident in the quality of both the matching and the data itself. And because the hashed PII is sliced and distributed, it can’t be re-identified.
Other specialist data-exchange services use advanced techniques such as homomorphic encryption. However, unlike these other services, Senate Matching’s decentralized architecture ensures there’s no honeypot to tempt hackers.
Multiple Governance Controls
Senate Matching also integrates with Data Republic’s Senate data-exchange platform to provide a flexible but tightly governed and secure environment for matching and sharing data. For example, when a matched dataset is distributed to collaborators, Senate Matching uses temporary tokens—not the original tokens, which are only available to the organization that owns the data. This ensures organizations can maintain control over how their data is used.
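The temporary-token idea can be sketched as follows. This is a simplified illustration, not Data Republic’s implementation: the `issue_temporary_tokens` function and the row layout are hypothetical, and the mapping between original and temporary tokens is assumed to stay with the data owner.

```python
import secrets
from typing import Dict, List, Tuple


def issue_temporary_tokens(
    rows: List[Dict[str, str]]
) -> Tuple[List[Dict[str, str]], Dict[str, str]]:
    """Replace each original token with a fresh random one for a single share.

    Returns the rows to hand to collaborators and the private mapping
    (original token -> temporary token) that the owner keeps.
    """
    mapping: Dict[str, str] = {}
    shared_rows = []
    for row in rows:
        original = row["token"]
        if original not in mapping:
            mapping[original] = secrets.token_hex(16)
        shared_rows.append({**row, "token": mapping[original]})
    return shared_rows, mapping


if __name__ == "__main__":
    matched = [
        {"token": "orig-001", "monthly_spend": "132.50"},
        {"token": "orig-002", "monthly_spend": "87.10"},
    ]
    shared, private_mapping = issue_temporary_tokens(matched)
    print(shared)  # collaborators see only temporary tokens
```

Because a new mapping is generated for each share, a collaborator cannot use the tokens it receives to link the dataset back to the owner’s original records or to a different share.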
The shared dataset can then be further protected with Senate’s governance controls, such as data auditing. In addition, data can be restricted so that analysis is only performed in highly secure, quarantined virtual machines on the Senate platform.
Senate’s multiple controls allow businesses to better manage risk without relying solely on securing the data itself. This maximizes the utility of the shared data asset, increasing the potential to unlock new insights.
For more details on how Senate Matching works, download our white paper.