September 11, 2012

One Giant Leap for Psychohistory

For science fiction buffs, many news items coming out of the world of predictive and large-scale historical analysis invoke Asimov’s concept of Psychohistory, in which probabilistic group patterns can predict major future events in history.

While today's big data platforms may not be able to predict the eventual fall of Galactic Empires (although predicting revolutionary events from social data is a reality), they can generate insights from large swaths of historical data.

In particular, Kalev Leetaru from the University of Illinois carried out a fascinating historical analysis project in which he mapped “the world according to Wikipedia” on SGI’s UV2, a system the company has touted as “the world’s largest in-memory data mining system.” Leetaru explains the genesis of the project below.

Leetaru had already used similar analytics in his Culturomics 2.0 work, in which, according to SGI, he predicted the Arab Spring and the location of bin Laden’s hideout. When SGI’s Michael Woodacre approached him about the new UV2 system, which would apparently carry 4,000 processors and 64 terabytes of cache-coherent shared memory, he thought immediately of Wikipedia. “Wikipedia,” Leetaru said, “has become such a fundamental part of our daily life. What could we do if we made a map of this or a series of maps over time?”

So Leetaru set out to model the world according to Wikipedia’s English-language edition. The task itself is simple to comprehend: Leetaru wanted to mark down every mention of a name, date, or place found in Wikipedia.

“We used this UV2 system to pull out every geographic location across every page,” said Leetaru, “every date across every page, and every connection among those, basically capturing the spatial and temporal view of history as captured by Wikipedia’s pages… We can actually see history before our own eyes.”

Of course, there are over four million entries in the English version of Wikipedia, each of which has multiple references to any given date, place, or name. If those references are the neurons of Leetaru’s project, the connections are the synapses. Leetaru had to deal with and analyze one heck of a historical neural net.
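To make the neurons-and-synapses picture concrete, here is a minimal Python sketch of the general idea. It is not Leetaru's actual script (which he describes as Perl), and everything in it is an assumption for illustration: the two sample pages are invented, the `YEAR_RE` pattern is a naive stand-in for real date extraction, and the function names are hypothetical. It treats each year as a node and counts a link whenever two years appear on the same page.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a naive pattern for four-digit years from 1000 to 2099.
# Real date extraction would need to handle far more formats.
YEAR_RE = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_years(text):
    """Return the sorted list of distinct years mentioned in one page."""
    return sorted({int(y) for y in YEAR_RE.findall(text)})


def build_network(pages):
    """Count pages mentioning each year (nodes) and same-page pairs (links)."""
    mentions = Counter()
    links = Counter()
    for text in pages.values():
        years = extract_years(text)
        mentions.update(years)                 # one count per page per year
        links.update(combinations(years, 2))   # every co-mentioned pair
    return mentions, links


# Invented sample pages, standing in for Wikipedia articles.
pages = {
    "Page A": "The war began in 1861 and ended in 1865.",
    "Page B": "Reconstruction followed the events of 1865 and lasted until 1877.",
}

mentions, links = build_network(pages)
for (a, b), n in sorted(links.items()):
    print(f"{a} <-> {b}: {n} page(s)")
```

The same tally could be extended to place names or personal names by swapping in a different extractor. The point of the sketch is only that, when the whole dataset fits in memory, the counting itself can stay this simple.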

UV2’s impressive in-memory capabilities made this possible for Leetaru. “I didn’t spend hours or days writing some fancy code that was distributed memory or using any of these fancy extensions, having to worry about memory management, allocating the right buffer sizes. I just wrote a ten-line Perl script in a matter of minutes and just ran it… If I had to summarize the advantage of the UV2 platform in a single sentence, I think it would be ‘Outcomes over algorithms.’”

The outcome is represented in a fascinating infographic on SGI’s Facebook page, which covers the number of date mentions per year, sentiment over time, and much more. For example, the sentiment-over-time graph shows sharp dips around the 1860s, 1910s, and 1940s. Those dips correspond to the American Civil War (the sharpest dip, perhaps shedding some light on the American bias in English-language Wikipedia articles) and both World Wars.

There are plenty more insights to be gleaned and plenty to be extrapolated. Leetaru’s research shows that the world has become exponentially more interconnected over the last fifty years. This connectivity makes it easier to digitize human patterns and apply data analysis to them. Perhaps Asimov’s psychohistory is not thousands of years away after all.


There is 1 discussion item posted.

The Psychohistorical Equations
Submitted by MiguelD on Sep 16, 2012 @ 2:18 AM EDT

Dear Datanami Staff and Readers,

I have encountered another, axiomatized mathematical approach to a planetary-scale but otherwise Hari-Seldon-like "psychohistory", one that utilizes "qualitative data", and heuristic-algebraic supports to human intuition and creative insight, as well as "purely-quantitative", and combined, "qualo-quantitative", mathematical tools. The rendition of the "psychohistorical equations" per the first tool has been published at www.dialectics.org, and, specifically, at --



