February 5, 2013

Chaotic Nihilists and Semantic Idealists

Alistair Croll

There are two competing views on how we should tackle an abundance of data, which I’ve referred to as big data’s “odd couple.”

One camp, made up of semantic idealists who fetishize taxonomies, wants to tag and organize it all. Once we’ve marked everything and how it relates to everything else, they hope, the world will be reasonable and understandable.

The poster child for the semantic idealists is Wolfram Alpha, a “reasoning engine” that understands a question like “how many blue whales does the earth weigh?” even if that question has never been asked before. But it’s completely useless until someone has told it the weight of a whale, or of the earth, or, for that matter, what weight is.

They’re wrong.

In Lewis Carroll’s Sylvie and Bruno Concluded (1893), a traveller from another planet, known only as Mein Herr, learns about Earth’s maps.

“That’s another thing we’ve learned from your Nation,” said Mein Herr, “map-making. But we’ve carried it much further than you. What do you consider the largest map that would be really useful?”

“About six inches to the mile.”

“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”

“Have you used it much?” I enquired.

“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”

The example underscores one of the frustrations of this semantic idealism—that to perfectly tag the world around us requires an effort that approaches the world itself.

The other partner in big data’s odd couple is the chaotic nihilist. She’s abandoned any hope of properly tagging the world, and relies on machines to find the most relevant or appropriate information. Her kind are the machine-learning data scientists who are convinced that given enough data and the right algorithm, the best results will bubble to the top.

Wolfram Alpha’s counterpart for the chaotic nihilists is IBM’s Watson, a question-answering system that guesses at answers based on probabilities (and that famously won on Jeopardy!). Watson was never guaranteed to be right, but it was really, really likely to have a good answer. It also wasn’t easily controlled: when it ingested the Urban Dictionary website, it started swearing in its responses,[1] and IBM’s programmers had to excise some of its more colorful vocabulary by hand.

She’s wrong too.

The future of data is a blend of both semantics and algorithms. That’s one reason Google recently introduced a second search engine of sorts, called the Knowledge Graph, that understands what queries mean.[3] The Knowledge Graph is based on technology from Metaweb, a company Google acquired in 2010, and it augments “probabilistic” algorithmic search with a structured, tagged set of relationships.

We can learn a lot about this blend by considering how accountants look at a cup of coffee. How would you file such a thing? Would you file it under coffee, or cup, or Alistair? The answer, in a physical filing system, is that it depends on how you plan to use it. If you wanted to charge people for their coffee, you’d file it by name. If you wanted to compare what kinds of hot drinks people consumed, you’d file it under coffee. And if you wanted to do an inventory of dishware, you’d file it under cups.

Such things have rules. Accountants spend years learning the Generally Accepted Accounting Principles (GAAP) that govern how and where to file things. In their world, if you wanted to do all three kinds of analysis, you’d need three physical copies of the cup to stuff into three filing cabinets. And if you then changed one cup (say, by giving it a price), the other two copies would be out of date.

In a digital age, this example is nonsense. We have what Heidegger[4] would call the fundamental “thingness” of the item being filed, that “around which the properties have assembled.” And then we have those properties: Alistair; Coffee; Cup. We have tags, and we can extend them.
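To make the cup-of-coffee example concrete, here is a minimal Python sketch. It is an illustration only: the `Item` class, the `index_by` helper, and the tag names and price are hypothetical, not taken from any accounting system or ledger product. It contrasts physical filing (three independent copies, which drift apart when one is changed) with a single tagged item from which each “filing cabinet” is derived as a view on demand.

```python
import copy
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    """One 'thing', with an open-ended set of properties (tags) assembled around it."""
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def index_by(items: list[Item], key: str) -> dict[str, list[Item]]:
    """Build a hypothetical 'filing cabinet' on demand by grouping items on one tag.

    The view holds references to the original items, not copies.
    """
    view: dict[str, list[Item]] = defaultdict(list)
    for item in items:
        if key in item.tags:
            view[item.tags[key]].append(item)
    return dict(view)


cup = Item("cup-001", {"customer": "Alistair", "drink": "coffee", "dishware": "cup"})

# Physical filing: one independent copy per cabinet.
paper_files = {cabinet: copy.deepcopy(cup) for cabinet in ("customer", "drink", "dishware")}
paper_files["customer"].tags["price"] = "2.50"  # only the billing copy learns the price

# Digital filing: one item, with each cabinet derived as a view.
by_customer = index_by([cup], "customer")  # for charging people for coffee
by_drink = index_by([cup], "drink")        # for comparing hot drinks
by_dishware = index_by([cup], "dishware")  # for a dishware inventory

# Extending the single item's tags once is visible through every view.
cup.tags["price"] = "2.50"
print(by_drink["coffee"][0].tags["price"])     # 2.50
print(paper_files["drink"].tags.get("price"))  # None: this copy is now out of date
```

Because every view is computed from one source rather than stored as a copy, adding a new kind of analysis means one more `index_by` call, not another filing cabinet, and no copy can fall out of date.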

Accounting, and with it much of how business operates, is still mired in the bog of atoms rather than soaring with the flexibility of bits. Many of the tools we rely on today fail to embrace the power of tagging and semantics out of sheer inertia. Given modern technology (relational databases, tagging, and so on), nobody designing from scratch would create the General Ledger or the strictures of GAAP. Yet they persist, and they slow the progress of the semantic idealists, and of data-driven business in general.

About the Author

Alistair Croll is an entrepreneur and technology analyst. He’s worked on web performance, big data, cloud computing, and startups. In 2001, he co-founded web performance startup Coradiant, and since that time has also launched Rednod, CloudOps, Bitcurrent, Year One Labs, the Bitnorth conference, and several other early-stage companies.

Alistair is the author of three books on web performance, analytics, and IT operations. He’s also the author of the forthcoming Lean Analytics (www.leananalyticsbook.com), a book on using data to build a better business faster, due out in March from O’Reilly Media. Alistair is the chair of O’Reilly’s Strata conference (www.strataconf.com), Cloud Connect, and the International Startup Festival. He lives in Montreal, Canada, and tries to mitigate chronic ADD by writing about far too many things at Solve For Interesting (www.solveforinteresting.com).

[1] http://www.theatlantic.com/technology/archive/2013/01/ibms-watson-memorized-the-entire-urban-dictionary-then-his-overlords-had-to-delete-it/267047/

[2] http://www.datanami.com/wp-content/uploads/2013/02/what-i-shart.jpg

[3] http://www.google.com/insidesearch/features/search/knowledge.html

[4] From The Origin of the Work of Art.

