July 30, 2012

Pew Points to Troubles Ahead for Big Data

Nicole Hemsoth

Admittedly, we bear the same mallets as the rest of the tech media that has been steadily beating its drums to the big data beat, but our ears are always open for a moment when we can put the parade at rest and reflect on the tune.

No matter how we crane our necks to offset the hype cycle’s bell curve, it’s fair to say that there has been a lot of general talk about the value of big data, but the cries of those who worry about what it all portends are often drowned out by the din of excitement.

While there are numerous advocacy and protection groups with a specific focus on consumer and constituent concerns, other important issues bubble to the surface, such as the over-reliance on data mining algorithms and forecasting.

The Pew Research Center recently delved into the task of finding the good, the bad and the ugly sides of the big data conversation. The group released a study that was compiled as part of the fifth “Future of the Internet” survey from the Pew Research Center’s Internet and American Life Project and Elon University’s Imagining the Internet Center.

While many of the respondents (which included experts in systems, communications and other areas) felt that the new sources of exploitable information (and new frameworks and platforms to allow this) could enable further insight for business and research, there were some notable reservations about the risk of swimming in such a deep sea of information.

While just over half of the respondents expressed favorable opinions of the state of data and its use in 2020, we wanted to take a look at why 38 percent of respondents weren’t quite as optimistic. This group says that, new tools and algorithms aside, big data will cause more problems than it solves by 2020. They agreed that “the existence of huge data sets for analysis will engender false confidence in our predictive powers and will lead many to make significant and hurtful mistakes.” They also suggest that there is the ultimate possibility of extreme misuse of this data on the part of governments and companies, and that big data represents a “big negative for society in nearly all aspects.”

The following are some insights broken down by categories that represent the major concern areas for big data in the future as reported by Pew….

Big Brother, Big Business

When conversations about big data analytics emerge, privacy and concerns about “big brother” are never far away.

Some experts contend that this issue is not being given appropriate weight as we plow forward into the next great data age, and that the opportunity for class conflict and segmentation, not to mention spying and other no-nos, is imminent.

Brian Harvey, a lecturer at UC Berkeley, claims that, government use of data aside, on all fronts “The collection of information is going to benefit the rich, at the expense of the poor.”

Stephen Masiclat, an associate professor of communications at Syracuse University, echoed this statement, suggesting that as big data collection and use becomes the norm, “an increasing sector of the population will eventually be in the business of explaining big data insights to people not trained to understand the statistical mechanics and limitations of the systems.” He says that this, coupled with classification-based data becoming more granular, can lead to more class stratification driven by marketers and other business operations.

Oscar Gandy, a communications expert, says that there needs to be a different approach to how we place the power of analytics and massive datasets. He says that “There is a need to think a bit more about the distribution of the harms that flow from the rise of big, medium and little data gatherers, brokers and users. If big data could be used primarily for social benefit, rather than the pursuit of profit (and the social-control systems that support the effort) then I could ‘sign on’ to the data-driven future and its expression through the Internet of Things.”

The power of big data in the hands of relatively few companies or government agencies is indeed troublesome, but some suggest that over-reliance on data that is not well understood, is out of context, or is just plain wrong can further complicate these problems.

The Human Element

Data quality, reliability, context and even the over-reliance on data as the sole word of truth are all dangers we face going forward, according to some of the experts the Pew Research Center tapped for its study.

Marcia Richards Suelzer, senior analyst at Wolters Kluwer, warns, “We can now make catastrophic miscalculations in nanoseconds and broadcast them universally. We have lost the balance inherent in lag time.”

Jerry Michalski, founder and president of Sociate and a consultant for the Institute for the Future, says that all things considered, we need to keep the human angle in mind when we talk about data. He says that humans “consistently seem to think they know more than they actually know in retrospect… the best-intentioned of humans will try to use big data to solve big problems, but are unlikely to do well at it,” whereas the worst of us will have at hand immensely powerful ways to do harm, from hidden manipulation of the population to all sorts of privacy invasions.

Dan Ness, principal research analyst at MetaFacts, says, “A lot of big data today is biased and missing context, as it’s based on convenience samples or subsets. We’re seeing valiant, yet misguided attempts to apply the deep datasets to things that have limited relevance or applicability. They’re being stretched to answer the wrong questions.” He argues that instead of relying on the “lamppost light,” data scientists in 2020 will “develop and use the equivalent of focused flashlights.”

Additionally, there is too much data to make sense of already, and there will be more in the future. Some, including Sivasubramanian Muthusamy, president of the Internet Society India chapter in Chennai and founder of InternetStudio, suggest that the emphasis needs to be on quality versus quantity. Muthusamy says, “Separating necessary data from unnecessary data will pose particular challenges and also, data analysis alone does not guarantee optimal decisions and optimal outcomes because there are several factors beyond data—a point that is prone to be missed in the quest for more and more data.”

The Problems of Access

Alex Halavais, vice president of the Association of Internet Researchers and author of Search Engine Society, says that the real power of big data can’t be unleashed until the question of openness is settled: to what degree data is held privately or made publicly available.

Halavais argues that “openly available data and widespread tools for manipulating it will create new ways of understanding and governing ourselves as individuals and societies.”

Cyprien Lomas, who directs the Learning Centre for Land and Food Systems at the University of British Columbia, echoes this, saying that it leads to a necessary checks-and-balances approach to data. He says that “along with the rise of big data should come equal and open access to the data so that assumptions can be checked and double checked and to foster a culture of looking for results in data.” He says that access to that same data can allow for parallel queries, but of course, one might venture to say this also comes with some significant danger due to potential privacy concerns.

Sean Mead, who directs solutions architecture, valuation and analytics for Mead, Mead & Clark, Interbrand, says that there will need to be a strong open data and “AI liberation” movement before the promise of big data can be realized responsibly. He argues that “large, publicly available data sets, easier tools, wider distribution of analytics skills and early stage artificial intelligence software will lead to a burst of economic activity,” but social movements will rise up to free access to large repositories and to restrict how such software is used.

The Evolving Ecosystem

Christian Huitema, a distinguished engineer with Microsoft, suggests that it will take more than a mere decade to master the “extraction of actual knowledge from big data sets.” This refers to analytics and the systems designed to process them accurately and in the appropriate amount of time.

Ted M. Coopman, who teaches at San Jose State University and is on the executive committee of the Association of Internet Researchers, says that while there are great possibilities, “the lack of theoretical coherency and understanding of how large and complex systems work will cause major problems to arise.” He says that being able to identify variables alone does not lead to an understanding of them, and that massive complex systems such as social or financial institutions are very hard to predict.

Perry Hewitt, who directs digital communications and services at Harvard University, says that software advances as embodied by nowcasting are “sure to stumble many times before they stand and companies will control software tools in ways that will make us all profoundly and correctly suspicious.” He says that behind all the justifiable concern, however, there is hope for a better world through responsible use of data.
