GDPR: Say Goodbye to Big Data’s Wild West
It’s getting time to call those cowboys home. The days of “anything goes” in big data will come to a close next May, when the European Union’s new General Data Protection Regulation (GDPR) goes into full effect.
On May 25, 2018, the GDPR will officially become law, bringing tough new rules that mandate good data handling, transparent usage policies, and consumer-friendly privacy terms for the more than 500 million people living in the EU.
Any company that wants to do business with these European residents will need to comply with GDPR, or face stiff penalties that range up to 4 percent of a company’s global annual revenue or €20 million, whichever is greater.
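To make the "whichever is greater" rule concrete, here is a minimal Python sketch of the ceiling it implies. The function name and constants are illustrative assumptions, not anything defined by the regulation, and the result is only the maximum possible fine, not what a regulator would actually impose.

```python
# Illustrative sketch only: computes the upper bound described above,
# i.e. the greater of EUR 20 million or 4 percent of annual revenue.
FLAT_CAP_EUR = 20_000_000      # fixed ceiling named in the article
REVENUE_SHARE_PERCENT = 4      # percentage ceiling named in the article


def max_gdpr_fine_eur(annual_revenue_eur: float) -> float:
    """Return the largest fine the top penalty tier would allow."""
    revenue_based_cap = annual_revenue_eur * REVENUE_SHARE_PERCENT / 100
    return max(FLAT_CAP_EUR, revenue_based_cap)


if __name__ == "__main__":
    for revenue in (100_000_000, 500_000_000, 1_000_000_000):
        print(revenue, max_gdpr_fine_eur(revenue))
```

Under this rule the flat €20 million figure is the binding ceiling for any company with less than €500 million in annual revenue; above that, the 4 percent figure takes over.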
GDPR is a wakeup call for American companies to solidify best practices around their big data and data science initiatives. While American firms today must follow a mish-mash of data-handling laws for specific sectors, like healthcare and banking, there’s no single overarching law telling them what they can and can’t do with data in a broad sense.
That’s exactly what GDPR does – it gives European consumers the power to control how their individual data is used. That means big changes are in order for American firms that are used to doing practically whatever they wanted with the data they collected.
Data Science Impact
The GDPR will have a major impact on data science activities, says Florian Douetteau, CEO of Dataiku, a data science platform developer with offices in France and New York. But it will likely be for the better.
“Companies will need to be more structured in terms of collecting the data, understanding which data you can use or not, and be more thorough in terms of managing the lineage of the data,” Douetteau tells Datanami. “You also must be able to assess whether the data being used is being collected properly, meaning it has a consent and is acquired in the proper manner.”
Put another way, companies will no longer be able to collect data from people for one reason, and then use it for a different reason. This one facet of GDPR is predicted to boost the fortunes of companies selling data governance tools, which are designed to track data as it flows through an organization.
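One way to picture the "collected for one reason, used for another" problem is a check that compares the intended use of a record against the purposes the person actually agreed to. The sketch below is a hypothetical illustration in Python: the ConsentRecord class, the purpose labels, and the is_use_allowed helper are all assumptions made for the example, not part of GDPR or of any vendor's governance tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of what one person agreed to."""
    subject_id: str
    purposes: frozenset  # purpose labels the person consented to


def is_use_allowed(record: ConsentRecord, purpose: str) -> bool:
    """Allow a use only if it matches a purpose the person consented to."""
    return purpose in record.purposes


if __name__ == "__main__":
    # Data collected to fulfil an order (hypothetical purpose label).
    record = ConsentRecord("user-123", frozenset({"order_fulfilment"}))
    print(is_use_allowed(record, "order_fulfilment"))  # same purpose
    print(is_use_allowed(record, "ad_targeting"))      # different purpose
```

A real governance tool would need far more than this (withdrawal of consent, legal bases other than consent, audit trails), but the core idea is the same: the purpose travels with the data.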
Companies may also have to prove that their predictive models are not unfairly discriminating against people.
“Citizens now also have the right to question and fight decisions that affect them that have been made on a purely algorithmic basis,” Dataiku says in its recent white paper, “Five Essential Pillars of Big Data GDPR Compliance.” This poses a potential problem for organizations that embrace a “black box” style of big data development, where companies don’t know exactly why algorithms generate the answers they do.
GDPR doesn’t mean the end of data science, Douetteau says. But it will provide an incentive for companies to mature their data science programs to ensure that they don’t run afoul of the new law.
Specifically, GDPR provides an incentive for organizations to stop the practice of maintaining separate teams of people – one for extracting data, one for building predictive models, and then another group for applying the result of those models.
If your company hasn’t bothered to bring its big data folks and its IT department to the same table, GDPR provides a powerful incentive to begin a deeper collaboration.
Enforcing Good Data Hygiene
In a way, GDPR will mandate that companies practice good data analytics hygiene: that data is collected in an above-board manner and processed in a transparent way. This is ultimately a good thing, although some organizations that prefer the “anything goes” days of big data’s Wild West period will undoubtedly object to being forced to adopt best practices.
“All things considered, being more organized and maintaining the data lineage is also good for productivity on the whole, meaning a good data science team can actually show the lineage of the data,” Douetteau says. “A good data science team can already do that.”
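As a rough illustration of what "showing the lineage of the data" can mean in practice, the Python sketch below records which datasets each derived dataset came from and walks back to the original sources. The Dataset class, the dataset names, and the original_sources function are hypothetical, chosen only for this example; production lineage tools track much more, such as transformations, owners, and timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset node; parents are the datasets it was derived from."""
    name: str
    parents: tuple = ()


def original_sources(dataset: Dataset) -> set:
    """Walk the parent links back to datasets that have no parents."""
    if not dataset.parents:
        return {dataset.name}
    sources = set()
    for parent in dataset.parents:
        sources |= original_sources(parent)
    return sources


if __name__ == "__main__":
    signup = Dataset("signup_form")
    clicks = Dataset("web_clicks")
    features = Dataset("customer_features", (signup, clicks))
    scores = Dataset("churn_scores", (features,))
    # Names of the raw sources that feed the model output.
    print(sorted(original_sources(scores)))
```

With links like these in place, a team can answer the question a regulator or customer is likely to ask: which collected data ultimately fed a given model output.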
Amit Walia, executive vice president and chief product officer at data integration giant Informatica, sees good and bad in the new law.
“GDPR poses many challenges, but it also has the potential to result in opportunities around the use of data in an organization,” Walia says. “Organizations must take a holistic and automated approach to governance and compliance to help maximize the potential opportunity.”
Like many vendors in the space, Informatica is jumping on the GDPR compliance bandwagon and using the looming deadline (just 312 days away!) to help it market and sell its solutions. The company has put together a GDPR package designed to help companies through the process changes that must occur to achieve compliance.
If a company was already on track to achieve this, then GDPR will provide some extra encouragement to speed that process along. On the other hand, if a company had no plans to move away from the “Wild West” style of big data analytics, then GDPR will likely penalize it with hefty fines that can easily scale into the millions of dollars.