February 05, 2013

Chaotic Nihilists and Semantic Idealists


There are competing views of how we should tackle an abundance of data, which I’ve referred to as big data’s “odd couple”.

One camp, made up of semantic idealists who fetishize taxonomies, wants to tag and organize it all. Once we’ve marked everything and how it relates to everything else, they hope, the world will be reasonable and understandable.

The poster child for the Semantic Idealists is Wolfram Alpha, a “reasoning engine” that understands, for example, a question like “how many blue whales does the earth weigh?”—even if that question has never been asked before. But it’s completely useless until someone’s told it the weight of a whale, or the earth, or, for that matter, what weight is.

They’re wrong.

In Lewis Carroll’s Sylvie and Bruno Concluded (1893), a traveller from another planet, known only as Mein Herr, learns about Earth’s maps.

“That’s another thing we’ve learned from your Nation,” said Mein Herr, “map-making. But we’ve carried it much further than you. What do you consider the largest map that would be really useful?”

“About six inches to the mile.”

“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”

“Have you used it much?” I enquired.

“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”

The example underscores one of the frustrations of this semantic idealism: perfectly tagging the world around us requires an effort that approaches the scale of the world itself.

The other partner in big data’s odd couple is the chaotic nihilist. She’s abandoned any hope of properly tagging the world, and relies on machines to find the most relevant or appropriate information. Her kind are the machine-learning data scientists who are convinced that given enough data and the right algorithm, the best results will bubble to the top.

Wolfram Alpha’s counterpart for the Chaotic Nihilists is IBM’s Watson, a search engine that guesses at answers based on probabilities (and famously won on Jeopardy). Watson was never guaranteed to be right, but it was really, really likely to have a good answer. It also wasn’t easily controlled: when it crawled the Urban Dictionary website, it started swearing in its responses[1], and IBM’s programmers had to excise some of its more colorful vocabulary by hand.

She’s wrong too.

The future of data is a blend of both semantics and algorithms. That’s one reason Google recently introduced a second search engine, called the Knowledge Graph, that understands queries.[3] The Knowledge Graph is based on technology from Metaweb, a company Google acquired in 2010, and it augments “probabilistic” algorithmic search with a structured, tagged set of relationships.

We can learn a lot about this blend by considering how accountants look at a cup of coffee. How would you file such a thing? Would you file it under coffee, or cup, or Alistair? The answer, in a physical filing system, is that it depends on how you plan to use it. If you wanted to charge people for their coffee you’d file it by name. If you wanted to compare what kinds of hot drinks people consumed, you’d file it under coffee. And if you wanted to do an inventory of dishware, you’d file it under cups.

Such things have rules. Accountants spend years learning the Generally Accepted Accounting Principles (GAAP) that govern how and where to file things. In their world, if you wanted to do the three kinds of analysis, you’d need three physical copies of the cup, to stuff into three filing cabinets. And then if you changed one cup—say, giving it a price—the other two copies would be out of date.

In a digital age, this example is nonsense. We have what Heidegger[4] would call the fundamental “thingness” of the item being filed, that “around which the properties have assembled.” And then we have those properties: Alistair; Coffee; Cup. We have tags, and we can extend them.
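The coffee example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any real accounting system: the `TagStore` class, the item ids, and the second customer name are all hypothetical. The point it shows is that the item is stored once, indexed under every tag, and a change made through one view is visible in all the others.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical store: each item exists once and is indexed by every tag."""

    def __init__(self):
        self.items = {}                 # item_id -> {tag name: tag value}
        self.index = defaultdict(set)   # (tag name, tag value) -> item ids

    def add(self, item_id, **tags):
        """File an item once, under all of its tags at the same time."""
        self.items[item_id] = {}
        for key, value in tags.items():
            self.tag(item_id, key, value)

    def tag(self, item_id, key, value):
        """Attach or change one tag; no second copy goes out of date."""
        props = self.items[item_id]
        if key in props:
            # Drop the stale index entry before recording the new value.
            self.index[(key, props[key])].discard(item_id)
        props[key] = value
        self.index[(key, value)].add(item_id)

    def find(self, key, value):
        """Return the ids of all items carrying this tag, in sorted order."""
        return sorted(self.index.get((key, value), set()))


store = TagStore()
store.add("item-1", person="Alistair", drink="coffee", vessel="cup")
store.add("item-2", person="Sam", drink="tea", vessel="cup")  # made-up second entry

store.tag("item-1", "price", 2.5)       # extend the tags after the fact

print(store.find("person", "Alistair"))  # billing view
print(store.find("drink", "coffee"))     # hot-drink comparison view
print(store.find("vessel", "cup"))       # dishware inventory view
print(store.items["item-1"])             # one record, price included
```

The three `find` calls stand in for the three filing cabinets. Because they all point back at the same record, adding the price once is enough.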

Accounting, and with it much of how business operates, is still mired in the bog of atoms rather than soaring with the flexibility of bits. Out of sheer inertia, many of the tools we rely on today don’t embrace the power of tagging and semantics. Given modern technology (relational databases, tagging, and so on), nobody starting from scratch would design the General Ledger or the strictures of GAAP. Yet they persist, and they slow the progress of the semantic idealists, and of data-driven business in general.

About the Author

Alistair Croll is an entrepreneur and technology analyst. He’s worked on web performance, big data, cloud computing, and startups. In 2001, he co-founded web performance startup Coradiant, and since that time has also launched Rednod, CloudOps, Bitcurrent, Year One Labs, the Bitnorth conference, and several other early-stage companies.

Alistair is the author of three books on web performance, analytics, and IT operations. He’s also the author of the forthcoming Lean Analytics (www.leananalyticsbook.com), a book on using data to build a better business faster, due out in March from O’Reilly Media. Alistair is the chair of O’Reilly’s Strata conference (www.strataconf.com), Cloud Connect, and the International Startup Festival. He lives in Montreal, Canada, and tries to mitigate chronic ADD by writing about far too many things at Solve For Interesting (www.solveforinteresting.com).

 

[1] http://www.theatlantic.com/technology/archive/2013/01/ibms-watson-memorized-the-entire-urban-dictionary-then-his-overlords-had-to-delete-it/267047/

[2] http://cdn.theatlantic.com/static/mt/assets/science/what-i-shart.jpg

[3] http://www.google.com/insidesearch/features/search/knowledge.html

[4] Martin Heidegger, The Origin of the Work of Art.
