May 10, 2013

Nate Silver Warns Against Big Data Assumptions

Ian Armas Foster

While there are many uses today for big data, the general principle is this: more data means bigger sample sizes, from which more accurate representations can be drawn. However, noted statistically inclined prognosticator Nate Silver warned at the RMS Exceedance Conference in Boston this week that an over-abundance of data can be dangerous and counter-productive if managed improperly.

Silver garnered national attention in November of last year when his probabilistic models correctly picked the winner of every state in the presidential election, along with every Senate race that year except one. In Boston, he argued that the more data is collected, the easier it becomes to cherry-pick numbers and find correlations that are not actually there. The apparent objection to this argument is that institutions looking to glean insights from large swaths of data would seem to have little reason to misrepresent that data intentionally.
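One common way to read Silver's point is as the multiple-comparisons problem: test enough variables against an outcome and some will correlate with it by pure chance. The Python sketch below is an illustration, not anything Silver presented; the function names, the sample size of 30 and the 0.4 cutoff are arbitrary assumptions. Every variable it generates is random noise, so any correlation it finds is spurious by construction.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spurious_hits(num_variables, num_samples=30, threshold=0.4, seed=0):
    """Correlate `num_variables` pure-noise series with a pure-noise target.

    Returns how many exceed `threshold` in absolute correlation, plus the
    strongest absolute correlation seen. Nothing here is a real signal.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(num_samples)]
    hits = 0
    best = 0.0
    for _ in range(num_variables):
        candidate = [rng.gauss(0, 1) for _ in range(num_samples)]
        r = pearson(candidate, target)
        if abs(r) > threshold:
            hits += 1
        best = max(best, abs(r))
    return hits, best


if __name__ == "__main__":
    for k in (10, 100, 1000):
        hits, best = spurious_hits(k)
        print(f"{k:5d} noise variables: {hits:3d} with |r| > 0.4, "
              f"strongest |r| = {best:.2f}")
```

Because each run reuses the same seed, the larger runs contain the smaller ones, so the count of chance correlations can only grow as more noise variables are added. That is the cherry-picking risk in miniature: with enough columns, something will always look meaningful.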

It can be easy to forget, however, that while ‘big data’ has been an IT buzzword for the last two years, it is still a relatively recent phenomenon, especially at companies accustomed to making decisions on gut instinct. As noted in last month’s Big Data in Sports feature, some coaches value big data for the wrong reason: it tells them what they already knew.

By the same token, a manager is at least as likely to stretch the data, even unintentionally, until it says what he or she already believed.

Silver offered a couple of methods for combating these biases, the first being a change in thinking that is different, though not drastically so.

“Think probabilistically,” Silver said. “Think in terms of probabilities and not in terms of absolutes.”

This is what Silver did in his presidential forecast, with statements like “Obama has an X percent chance of winning Virginia.” By his own numbers, there was only a one-in-five chance that he would correctly predict all 50 states. This approach has two immediate benefits. The first is that it makes prognostication less personal, which suits executives who would rather not be proven wrong. The second, and more useful, is that it forces forecasters to recognize and account for the margin of error.
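The arithmetic behind a figure like one-in-five is worth seeing. If each state call is treated as independent, the chance of getting all of them right is the product of the individual chances, and the expected number of misses is the sum of the individual miss chances. The Python sketch below uses made-up confidence levels, not Silver's actual numbers, and the independence assumption is a simplification, since state results tend to move together.

```python
import math


def prob_all_correct(call_probs):
    """Probability that every call is right, assuming the calls are
    independent (a simplification: real state results are correlated)."""
    return math.prod(call_probs)


def expected_misses(call_probs):
    """Expected number of wrong calls: the sum of each call's miss chance."""
    return sum(1 - p for p in call_probs)


# Hypothetical confidence levels, NOT Silver's actual figures:
# 45 states the forecaster is 99% sure of, plus 5 closer races at 75%.
probs = [0.99] * 45 + [0.75] * 5

print(f"P(all 50 correct) = {prob_all_correct(probs):.2f}")
print(f"Expected misses   = {expected_misses(probs):.2f}")
```

With these hypothetical inputs the sweep probability comes out to about 0.15, and the forecaster should expect roughly 1.7 wrong calls, even though every individual call is more likely right than wrong. Thinking in probabilities makes that gap between "each call is probably right" and "all calls are right" explicit.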

Another method Silver promoted was recognizing one’s own biases. He cited an interesting experiment in which people evaluated identical resumes, one headed by a male name and the other by a female name. According to Silver, people who acknowledged having a gender bias actually judged the resumes more fairly than those who claimed they held no such bias. For Silver, recognizing a bias means one can consciously act against it.

“Know where you’re coming from,” Silver said in the part of his talk on recognizing bias. “You are defined by your weakest link,” he continued.

These tidbits of advice may not be as granular as institutions delving into the data ocean would like. However, it is occasionally important to step back and understand why the data is being collected and analyzed in the first place, and what goals one hopes to reach. Failing to avoid these roadblocks to data-enhanced understanding can undermine those goals.
