February 05, 2013

Chaotic Nihilists and Semantic Idealists


There are competing views of how we should tackle an abundance of data, which I’ve referred to as big data’s “odd couple”.

One camp, made up of semantic idealists who fetishize taxonomies, wants to tag and organize it all. Once we’ve marked everything and how it relates to everything else, they hope, the world will be reasonable and understandable.

The poster child for the Semantic Idealists is Wolfram Alpha, a “reasoning engine” that understands, for example, a question like “how many blue whales does the earth weigh?”—even if that question has never been asked before. But it’s completely useless until someone’s told it the weight of a whale, or the earth, or, for that matter, what weight is.

They’re wrong.

In Lewis Carroll’s Sylvie and Bruno Concluded (1893), a traveller from another planet, known only as Mein Herr, learns about Earth’s maps.

“That’s another thing we’ve learned from your Nation,” said Mein Herr, “map-making. But we’ve carried it much further than you. What do you consider the largest map that would be really useful?”

“About six inches to the mile.”

“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”

“Have you used it much?” I enquired.

“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”

The example underscores one of the frustrations of this semantic idealism—that to perfectly tag the world around us requires an effort that approaches the world itself.

The other partner in big data’s odd couple is the chaotic nihilist. She’s abandoned any hope of properly tagging the world, and relies on machines to find the most relevant or appropriate information. Her kind are the machine-learning data scientists who are convinced that given enough data and the right algorithm, the best results will bubble to the top.

Wolfram Alpha’s counterpart for the Chaotic Nihilists is IBM’s Watson, a search engine that guesses at answers based on probabilities (and famously won on Jeopardy). Watson was never guaranteed to be right, but it was really, really likely to have a good answer. It also wasn’t easily controlled: when it crawled the Urban Dictionary website, it started swearing in its responses[1], and IBM’s programmers had to excise some of its more colorful vocabulary by hand.

She’s wrong too.

The future of data is a blend of both semantics and algorithms. That’s one reason Google recently introduced a second search engine, called the Knowledge Graph, that understands queries.[3] Knowledge Graph is based on technology from Metaweb, a company Google acquired in 2010, and it augments “probabilistic” algorithmic search with a structured, tagged set of relationships.

We can learn a lot about this blend by considering how accountants look at a cup of coffee. How would you file such a thing? Would you file it under coffee, or cup, or Alistair? The answer, in a physical filing system, is that it depends on how you plan to use it. If you wanted to charge people for their coffee you’d file it by name. If you wanted to compare what kinds of hot drinks people consumed, you’d file it under coffee. And if you wanted to do an inventory of dishware, you’d file it under cups.

Such things have rules. Accountants spend years learning the Generally Accepted Accounting Principles (GAAP) that govern how and where to file things. In their world, if you wanted to do the three kinds of analysis, you’d need three physical copies of the cup, to stuff into three filing cabinets. And then if you changed one cup—say, giving it a price—the other two copies would be out of date.

In a digital age, this example is nonsense. We have what Heidegger[4] would call the fundamental “thingness” of the item being filed, that “around which the properties have assembled.” And then we have those properties: Alistair; Coffee; Cup. We have tags, and we can extend them.
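To make the contrast with the three filing cabinets concrete, here is a minimal sketch in Python. The record layout, the tag names, the second and third items, and the group_by helper are all hypothetical, chosen only to illustrate the idea: one record per thing, with the properties attached as tags, and each "filing cabinet" computed as a view over the same records.

```python
from collections import defaultdict

# One record per "thing". All names and values are illustrative assumptions.
items = [
    {"id": 1, "tags": {"person": "Alistair", "drink": "coffee", "dishware": "cup"}},
    {"id": 2, "tags": {"person": "Bruno", "drink": "tea", "dishware": "cup"}},
    {"id": 3, "tags": {"person": "Alistair", "drink": "tea", "dishware": "mug"}},
]


def group_by(records, tag):
    """Build a 'filing cabinet' view: tag value -> records carrying that value."""
    view = defaultdict(list)
    for record in records:
        if tag in record["tags"]:
            view[record["tags"][tag]].append(record)
    return dict(view)


# Three analyses, zero physical copies: every view points at the same records.
by_person = group_by(items, "person")      # for billing
by_drink = group_by(items, "drink")        # for comparing hot drinks
by_dishware = group_by(items, "dishware")  # for the dishware inventory

# Change the item once (give it a price); no view goes out of date.
items[0]["price"] = 2.50

# Tags can be extended later without refiling anything.
items[0]["tags"]["size"] = "large"
by_size = group_by(items, "size")
```

Because each view holds references to the same records rather than copies, the price set on the first item is visible whether you reach it through by_person, by_drink, or by_dishware.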

Accounting, and with it much of how business operates, is still mired in the bog of atoms rather than soaring with the flexibility of bits. Many of the tools we rely on today don’t embrace the power of tagging and semantics out of sheer inertia. Given modern technology (relational databases, tagging, and so on), nobody would design the General Ledger or the strictures of GAAP today. Yet they persist, and they slow the progress of the semantic idealists, and of data-driven business in general.

About the Author

Alistair Croll is an entrepreneur and technology analyst. He’s worked on web performance, big data, cloud computing, and startups. In 2001, he co-founded web performance startup Coradiant, and since that time has also launched Rednod, CloudOps, Bitcurrent, Year One Labs, the Bitnorth conference, and several other early-stage companies.

Alistair is the author of three books on web performance, analytics, and IT operations. He’s also the author of the forthcoming Lean Analytics (www.leananalyticsbook.com), a book on using data to build a better business faster, due out in March from O’Reilly Media. Alistair is the chair of O’Reilly’s Strata conference (www.strataconf.com), Cloud Connect, and the International Startup Festival. He lives in Montreal, Canada, and tries to mitigate chronic ADD by writing about far too many things at Solve For Interesting (www.solveforinteresting.com).

 

[1] http://www.theatlantic.com/technology/archive/2013/01/ibms-watson-memorized-the-entire-urban-dictionary-then-his-overlords-had-to-delete-it/267047/

[2] http://cdn.theatlantic.com/static/mt/assets/science/what-i-shart.jpg

[3] http://www.google.com/insidesearch/features/search/knowledge.html

[4] Martin Heidegger, The Origin of the Work of Art.
