May 2, 2018

‘Unlinkable Data Challenge’ Opens with $50k Grand Prize

(K. Irvine/NIST)

Think you have a clever way to unlock big public data for analysis without compromising citizens’ privacy? Then the National Institute of Standards and Technology (NIST) would like to hear from you for its “Unlinkable Data Challenge,” which carries a grand prize of $50,000.

Governments collect a lot of data, just as their corporate counterparts do. There are driver’s license records, tax returns, arrest records, welfare claims, health data, airline records, and much more. But instead of cutting costs and boosting profits, which are typical goals in the corporate sector, governments ostensibly aim to improve the quality of life for their citizens by improving contagious disease tracking, identifying patterns of violence in communities, and planning for disasters.

Despite the good intentions, all this public data carries significant risks to the privacy and security of the citizens it describes. We’ve seen how big corporations and social media giants like Facebook have struggled to put the data genie back in the bottle after it’s escaped. While there have been breaches of public data, such as the Office of Personnel Management hack in 2015, governmental agencies typically move forward on big data initiatives in a slower and more thoughtful manner.

Now the NIST is hoping to find a way to safely open up the large stockpile of government data without compromising privacy and security. That is the goal of its Unlinkable Data Challenge, which started taking submissions on May 1 and will run through July 26.

“This challenge is focused on proactively protecting individual privacy while allowing for data to be used by researchers for positive purposes and outcomes,” the NIST says on its Challenge.Gov website. “Developments coming out of this competition would drive major advances in the practical applications of differential privacy for these organizations.”

MIT researchers demonstrated in 2015 that it’s possible to re-identify a person with as few as four pieces of linked data

The challenge is not a trivial one, as even data that ostensibly has been anonymized can be traced back to individuals using graph techniques. This was demonstrated three years ago by a group of four MIT researchers who showed it was possible to “re-identify” a person by linking together the various pieces of metadata that describe them, even if the identity fields had been deleted from credit card records. “We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90 percent of individuals,” the researchers wrote.
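To make the idea of “unicity” concrete, here is a minimal Python sketch of the kind of measurement the researchers describe: given a set of location-and-time points known about a person, how many records in the data set are consistent with them? The toy traces, the function names, and the sampling approach are illustrative assumptions, not the MIT team’s code or data.

```python
import random

def matching_ids(traces, known_points):
    """Return the IDs of every trace that contains all of the known points."""
    known = set(known_points)
    return [pid for pid, points in traces.items() if known <= points]

def unicity(traces, k, seed=0):
    """Fraction of people singled out by k random points drawn from their own trace."""
    rng = random.Random(seed)
    unique = 0
    for pid, points in traces.items():
        # Sort first so sampling is reproducible for a given seed.
        sample = rng.sample(sorted(points), min(k, len(points)))
        if matching_ids(traces, sample) == [pid]:
            unique += 1
    return unique / len(traces)

# Hypothetical "anonymized" purchase traces: (place, day) pairs, no names.
traces = {
    "u1": {("cafe", "Mon"), ("gym", "Tue"), ("bakery", "Wed")},
    "u2": {("cafe", "Mon"), ("gym", "Tue"), ("bookshop", "Thu")},
    "u3": {("cafe", "Mon"), ("cinema", "Fri"), ("bakery", "Wed")},
}

# One point is shared by everyone; two well-chosen points isolate a single trace.
print(matching_ids(traces, [("cafe", "Mon")]))
print(matching_ids(traces, [("gym", "Tue"), ("bakery", "Wed")]))
print(unicity(traces, k=3))
```

In this toy example a single point matches all three traces, while two points are already enough to isolate one of them. The same effect, at scale, is what the researchers measured.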

So-called “linkage attacks” pose a substantial risk to personally identifiable information (PII) in big public data sets, the NIST says. And with the European Union’s General Data Protection Regulation (GDPR) set to go into effect later this month, the need to protect individuals’ privacy in big data analysis projects has never been greater.

“Today’s efforts to remove PII do not provide adequate protection against linkage attacks,” NIST writes. “With the advent of ‘big data’ and technological advances in linking data, there are far too many other possible data sources related to each of us that can lead to our identity being uncovered.”
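A linkage attack of the kind NIST describes can be sketched in a few lines of Python. The example below assumes a hypothetical “de-identified” health table and a hypothetical public roll that share three quasi-identifier fields (ZIP code, birth year, and sex). All records, field names, and the link function are invented for illustration.

```python
def link(deidentified, public, keys):
    """Re-attach names to de-identified rows that match exactly one public record."""
    index = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    linked = []
    for row in deidentified:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # an unambiguous match re-identifies the row
            linked.append((names[0], row["diagnosis"]))
    return linked

# Hypothetical data: names were removed from the health table, nothing else.
health = [
    {"zip": "20899", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"zip": "20899", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
]
public_roll = [
    {"name": "A. Example", "zip": "20899", "birth_year": 1970, "sex": "F"},
    {"name": "B. Sample", "zip": "20899", "birth_year": 1985, "sex": "M"},
    {"name": "C. Sample", "zip": "20899", "birth_year": 1985, "sex": "M"},
]

print(link(health, public_roll, keys=("zip", "birth_year", "sex")))
```

Here the first health record matches exactly one person in the public roll and is re-identified, while the second matches two people and stays ambiguous. Removing the name column alone did nothing to prevent the first match.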

It would be a boon for public data researchers if the NIST were successful in finding a way to make data unlinkable. The federal agency is asking participants “to create a new algorithm utilizing existing or new randomized mechanisms with a justification of how this will optimize privacy and utility across different analysis types.”
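For readers unfamiliar with “randomized mechanisms,” the textbook example from the differential privacy literature is the Laplace mechanism, which adds calibrated noise to a query result. The Python sketch below applies it to a simple counting query. It is an illustration of the general idea only, not a method NIST prescribes or a contest entry, and the function names, sample records, and epsilon values are assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count; a counting query has sensitivity 1, so scale = 1/epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical records: ages of people in a public data set.
ages = [23, 35, 41, 52, 67, 70]
rng = random.Random(42)

# Smaller epsilon means more noise (more privacy, less utility).
for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, private_count(ages, lambda age: age >= 50, epsilon, rng))
```

The trade-off the challenge asks entrants to optimize shows up directly in the epsilon parameter: lowering it hides any one person’s contribution more thoroughly but makes the reported count less accurate.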

Submissions will be judged on several criteria, including how well they maintain both privacy and utility when researchers apply a range of machine learning techniques, such as regression, classification, and clustering algorithms. They will also be judged on their ability to handle unknown research questions and unknown techniques, as well as on their general level of innovation, compute efficiency, and robustness.

For more information or to sign up for the challenge, see the NIST challenge page on Challenge.Gov.
