Big Data Backlash: A Rights Movement Gains Steam
In the aftermath of the Cambridge Analytica data mining scandal, there’s a growing chorus of AI leaders calling for a data rights movement that would overhaul the ownership structure for digital assets. In addition to protecting people’s privacy, this movement could help to ignite a new data-driven economy, with machine learning and blockchain technologies at the core.
Facebook CEO Mark Zuckerberg this week announced that the company would overhaul how it presents privacy settings to users of its social network. For Facebook, that may be a case of closing the barn door after the cows have left: the company is under fire for allowing Cambridge Analytica to access data on 87 million Facebook users, which the firm used, in turn, to create psychographic profiles designed to bolster Donald Trump’s 2016 campaign.
The saga led Apple CEO Tim Cook to call for increased regulation of Facebook and any other business that profits from other people’s data. “I think the best regulation is no regulation, is self-regulation,” Cook told Recode last week. “However, I think we’re beyond that here.”
Calls for regulation of the Web giants have increased in the big data ecosystem, particularly among the software companies building data science and machine learning tools that companies are using to mine their own data stores.
SriSatish “Sri” Ambati, the CEO and founder of data science software provider H2O.ai, says a data rights movement is brewing. “It’s a rights movement, kind of like the Civil Rights movement where we had to fight for rights, for freedom,” he tells Datanami. “I think we need to fight for rights for our data freedom, where we need to own our data, as well as the data service provider.”
More Valuable Than Cash
Data has grown so valuable because it’s the fuel that drives AI and machine learning, Ambati says. “Data to us should be very sacred or important. Data itself is more valuable than your own cash,” he says.
However, the ownership structure of data is misaligned at the moment, resulting in widespread abuses by the companies that store our personal data and use it for targeted display advertising. But what Cambridge Analytica did in 2016 is just the tip of the data iceberg. The lack of individual data rights was also felt with Equifax’s massive 2017 data breach, which exposed records of 143 million consumers, none of whom ever asked Equifax to store their personally identifiable information (PII), and who have no way to “opt out” of the credit reporting system.
But there are other concerns around the use of data, including how colleges use it for admissions, how police departments use it to inform patrol patterns, and how banks use it to approve mortgages. People have begun receiving odd promotions for products that seem linked to conversations overheard by smart home assistants like Google Home and Amazon’s Echo, accusations the companies brush off as mere coincidence. With the coming Internet of Things (IoT) revolution, there’s a lot of room left to grow on big data’s “creepy” meter.
The fact that we’re seeing the #DeleteFacebook movement occurring less than two months before the European Union’s strict data privacy law, dubbed the General Data Protection Regulation (GDPR), goes into effect is, of course, just a coincidence. But that doesn’t mean the Web giants aren’t concerned about the level of enforcement we’ll see from the European Commission, which has the authority to fine American companies billions of dollars for widespread violations.
Indeed, the prospect of stricter data regulations in the United States has never been greater. Peter Wang, the CTO and co-founder of data science software vendor Anaconda, says data should be more tightly regulated because it’s so dangerous.
“We should really view data as being radioactive,” he tells Datanami. “It’s a source of great power, but if you have it around, it just contaminates everything. So you really don’t want to be holding onto data. You want it for as little time as possible to do the compute you care about. You want as little as possible — exactly the right amount — then you lock it behind lead doors and don’t touch it.”
Data also decays with time, the same way elements like plutonium have half-lives. Data harvested today is more radioactive – and more valuable – than data that was harvested 18 months ago. However, data also has a long memory, which could thwart attempts to prevent the spread of damaging facts once they’re released.
“You can’t ever put the genie back in the bottle,” Wang says. “You never know at what point in time in the future somebody is going to figure out a clever way to de-anonymize what you did by convolving it with some other data set over here. So it’s not just the latent capability of your data set — it’s the mutual neutron chain reaction when your blob of plutonium and this other blob of plutonium are put together. You never know what’s going to come out.”
Wang, however, doesn’t sound too hopeful that we’re on the cusp of new regulations that will put an end to data abuses. “I’ll believe it when I see it,” he says. When we do finally get there, he envisions it will be constructed in much the same way that IT security experts have been protecting sensitive assets for decades – by encircling it with rings of increasingly strict authentication and access controls.
“Access to full fidelity data sets are going to be considered ring zero level access,” Wang says. “What people will start doing is shipping either down-sampled subsets, or anonymized data. They’ll be shipping out inferential models that are of lower and lower quality – and lower and lower fidelity – and you only ship the models around. You don’t schlep data around because data itself is too powerful. There’s too much risk around it.”
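The pattern Wang describes — keep full-fidelity data behind “ring zero” and ship out only down-sampled, pseudonymized subsets — can be sketched in a few lines. This is an illustrative sketch, not a real anonymization scheme (the function name and fields are hypothetical, and salted hashing alone does not defeat the re-identification attacks Wang warns about):

```python
import hashlib
import random

def prepare_for_export(rows, pii_fields, sample_rate=0.1,
                       salt="per-export-salt", seed=0):
    """Down-sample records and pseudonymize PII fields before
    data leaves the innermost access ring. Illustrative only."""
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    sample = [r for r in rows if rng.random() < sample_rate]
    exported = []
    for row in sample:
        masked = dict(row)
        for field in pii_fields:
            if field in masked:
                # Replace the raw value with a salted, truncated digest.
                digest = hashlib.sha256((salt + str(masked[field])).encode())
                masked[field] = digest.hexdigest()[:12]
        exported.append(masked)
    return exported
```

In this framing, only the output of `prepare_for_export` — or a model trained on it — ever crosses the boundary; the raw rows stay locked down.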
New Markets for Data
Some of Wang’s ideas jibe with Ambati’s view. But instead of locking the data away in a bunker somewhere, Ambati envisions every person having their own personal data bank. And when that person wants to grant a company access to the data, she can authorize a given company to access certain pieces via an API. As part of this transaction, she benefits by receiving free or discounted services.
“We just need to transfer ownership to the rightful owners,” Ambati says. “Well, the owner of this data … will choose to promote and allow access to the data the way she wants. A year’s worth of TV is free if you allow us to use the data, or your phone service will be free if you allow us to access the data. Eventually the pyramid gets turned upside down and that’s frankly what’s emerging. You see the deconstruction of Facebook and Google and others in terms of trust of sharing your data sets.”
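Ambati’s personal data bank could look something like the sketch below: the owner holds the records and issues revocable, scoped tokens that let a company read only the fields it was granted. All names here (`DataVault`, the field names, the company) are hypothetical illustrations, not any vendor’s actual API:

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class DataVault:
    """Hypothetical personal data bank: the individual owns the
    records and grants scoped, revocable access via tokens."""
    records: dict                                 # e.g. {"viewing_history": [...]}
    grants: dict = field(default_factory=dict)    # token -> (company, allowed fields)

    def grant(self, company, fields):
        # The owner authorizes a company to read only the named fields.
        token = secrets.token_hex(16)
        self.grants[token] = (company, set(fields))
        return token

    def revoke(self, token):
        # Ownership means access can be withdrawn at any time.
        self.grants.pop(token, None)

    def read(self, token, field_name):
        # A company presents its token; anything outside the grant is refused.
        if token not in self.grants:
            raise PermissionError("no active grant")
        company, allowed = self.grants[token]
        if field_name not in allowed:
            raise PermissionError(f"{company} may not read {field_name}")
        return self.records[field_name]
```

A TV provider offering a free year of service, in Ambati’s example, would hold a token scoped to viewing history and nothing else — and the owner could revoke it the moment the deal ends.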
When we’re in control of our own data – when our PII has been liberated from the tyranny of big data hoarders, so to speak – then we’ll see a flowering of new commercial services around that data and that access. Algebraix Data is attempting something like this with its Personal Secure Vault, which uses blockchain and a new digital currency dubbed ALX to compensate users for viewing ads from media companies.
Ambati thinks capitalistic forces, if properly channeled, have the potential to dramatically scale this model and create a new platform that benefits all parties involved. “Blockchain and the new currencies have made it possible to create new economies around you as an individual and you as a unique agent,” he says. “I think that’s why AI plus blockchain and data owned on the edge will make it really powerful.”
But nobody will be monetizing their own personal data so long as the data exists solely in the servers of Web giants and big enterprises. That’s why the call for a data rights movement is potentially so powerful, because it could eliminate the ability of a small group of dominant companies to have undue influence over hundreds of millions – if not billions – of people.
While his day job is building a compelling data science platform at Anaconda, Wang thinks about the ramifications of the powerful technology he’s working to put into companies’ hands. “Empires are built on language, currency, and weapons. Power aligns along these things,” he says. “And so it was extremely naïve of us as technologists to imagine that creating a global communications platform, like Facebook or Twitter, where individual peer communications could immediately escalate to broadcast to millions. We don’t have an intuitive understanding of this. We’re not wired for the sociability of that, the social norms of it. It’s a big experiment and it’s a dangerous one.”