People have been debating what “big data” means ever since the term appeared in the lexicon. Researchers at Cal Berkeley recently peppered dozens of prominent data scientists and industry leaders with the question in hopes of settling the big question once and for all.
You’ve undoubtedly heard many definitions for big data over the years. For some, you have big data when it can’t be stored on a single computer, or it’s the combination of volume, velocity, and variety (some add veracity). For others, big data is a catchall for stuff generated on social media or the Internet of Things, while for others it really refers to advanced analytic techniques, like machine learning.
Whatever definition you have of big data, there’s always somebody else in the room with a completely different view. To settle the question once and for all, Jenna Dutcher, the community relations manager for the University of California, Berkeley’s School of Information, sought out 43 different thought leaders who work in 43 different rooms to see how they define “big data.” The answers were enlightening as well as surprising.
Some of the answers were quite simple. “Big Data is the result of collecting information at its most granular level,” said Jon Bruner, editor-at-large for O’Reilly Media, which puts on the Strata conferences.
Others were entirely straightforward. “The term big data is really only useful if it describes a quantity of data that’s so large that traditional approaches to data analysis are doomed to failure,” says John Myles White, a scientist at Facebook and author of “Machine Learning for Hackers.”
One of the best answers (you didn’t think this was going to be quantitative did you?) came from Philip Ashlock, the chief architect of Data.gov, who said: “While the use of the term is quite nebulous and is often co-opted for other purposes, I’ve understood ‘big data’ to be about analysis for data that’s really messy or where you don’t know the right questions or queries to make — analysis that can help you find patterns, anomalies, or new structures amidst otherwise chaotic or complex data points.”
Mike Cavaretta, a data scientist at Ford Motor Company, sees tales in data. “I see big data as storytelling,” he writes. “…[A]nd I like to go to the raw data because of the possibilities of things you can do with it.”
Some powered through with their own refined definitions regardless of the wider ambiguity. “Big data is an umbrella term that means a lot of different things,” says Shashi Upadhyay, CEO and founder of Lattice Engines, “but to me, it means the possibility of doing extraordinary things using modern machine learning techniques on digital data.”
It’s all relative for Joel Gurin, author of Open Data Now. “It’s a subjective term: What seems ‘big’ today may seem modest in a few years when our analytic capacity has improved.” Gregory Piatetsky-Shapiro, the president and editor of data science news aggregator KDnuggets.com, also takes the long view: “The best definition I saw is, ‘Data is big when data size becomes part of the problem.’”
Google senior research scientist Daniel Gillick sees the “big” in big data as referring to immense changes in how people make decisions. “‘Big data’ represents a cultural shift in which more and more decisions are made by algorithms with transparent logic, operating on documented immutable evidence,” he says.
The growing omnipresence of data and how it differs from actual information are core to Prakash Nanduri’s understanding of the term. “Everything we know spits out data today–not just the devices we use for computing,” says the co-founder, CEO, and president of Paxata. “We now get digital exhaust from our garage door openers to our coffee pots, and everything in between….[B]ig data is at the intersection of collecting, organizing, storing, and turning all of that raw data into truly meaningful information.”
One could envision David Leonhardt, editor of the New York Times’ The Upshot, pondering the existential aspects of big data in Plato’s Cave. “Big Data is nothing more than a tool for capturing reality—just as newspaper reporting, photography and long-form journalism are,” he says. “But it’s an exciting tool, because it holds the potential of capturing reality in some clearer and more accurate ways than we have been able to do in the past.”
But there were some skeptics in the bunch who question whether big data can uncover a better and more granular “reality.” “‘Big data’ is more than one thing, but an important aspect is its use as a rhetorical device, something that can be used to deceive or mislead or overhype,” says Cathy O’Neil, the director of the Lede Program at Columbia University’s school of journalism. “It is thus vitally important that people who deploy big data models consider not just technical issues but the ethical issues as well.”
Joining O’Neil in the skeptics’ room is Deirdre Mulligan, associate professor at Cal’s School of Information. “Big data: Endless possibilities or cradle-to-grave shackles, depending upon the political, ethical, and legal choices we make.”
Of all the definitions provided, probably the one that best aligned with Datanami’s approach was the technology-oriented one put forth by Peter Skomoroch, former principal data scientist at LinkedIn.
“Big data originally described the practice in the consumer Internet industry of applying algorithms to increasingly large amounts of disparate data to solve problems that had suboptimal solutions with smaller datasets,” Skomoroch writes. “Many features and signals can only be observed by collecting massive amounts of data (for example, the relationships across an entire social network), and would not be detected using smaller samples. Processing large datasets in this manner was often difficult, time consuming, and error prone before the advent of technologies like MapReduce and Hadoop, which ushered in a wave of related tools and applications now collectively called big data technologies.”
Whatever big data means to you, there’s no denying that it’s real and that it’s impacting the way we live. While Jenna Dutcher may not have settled the question once and for all, she has moved the debate forward on several levels.