

September 11, 2012

One Giant Leap for Psychohistory


For science fiction buffs, many news items coming out of the world of predictive and large-scale historical analysis invoke Asimov’s concept of psychohistory, in which probabilistic patterns in the behavior of large groups can be used to predict major future events.

While big data platforms today may not be able to predict the eventual fall of Galactic Empires (although predicting revolutionary events from social data is a reality), they can generate insights from large swaths of historical data.

In particular, Kalev Leetaru from the University of Illinois carried out a fascinating historical analysis project in which he mapped “the world according to Wikipedia” on SGI’s UV2, a system the company has touted as “the world’s largest in-memory data mining system.” Leetaru explained how the project began.

Leetaru had already used similar analytics in his Culturomics 2.0 study, in which, according to SGI, he predicted the Arab Spring and the location of bin Laden’s hideout. When SGI’s Michael Woodacre approached him about the new UV2 system, which would carry up to 4,096 processor cores and 64 terabytes of cache-coherent shared memory, he immediately thought of Wikipedia. “Wikipedia,” Leetaru said, “has become such a fundamental part of our daily life. What could we do if we made a map of this or a series of maps over time?”

So Leetaru set out to model the world according to Wikipedia’s English-language edition. The task itself is simple to describe: Leetaru wanted to record every mention of a name, date, or place found in Wikipedia.

“We used this UV2 system to pull out every geographic location across every page,” said Leetaru, “every date across every page, and every connection among those, basically capturing the spatial and temporal view of history as captured by Wikipedia’s pages…We can actually see history before our own eyes.”

Of course, there are over four million entries in the English version of Wikipedia, each of which may contain multiple references to dates, places, and names. If those references are the neurons of Leetaru’s project, the connections among them are the synapses. Leetaru had to analyze one heck of a historical neural net.
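To make the extraction step a little more concrete, here is a toy sketch in Python. It is not Leetaru’s Perl script, whose details the article does not give. It assumes, for illustration only, that page text is already loaded in memory as a dictionary of title to text, treats any standalone four-digit number from 1000 to 2029 as a year, and counts two years as “connected” when they appear on the same page. The function names `extract_years` and `analyze` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Simplifying assumption: any standalone four-digit number from 1000 to 2029
# counts as a year mention. Real date extraction needs far more care.
YEAR_RE = re.compile(r"\b(1\d{3}|20[0-2]\d)\b")


def extract_years(text):
    """Return the sorted list of distinct years mentioned in one page."""
    return sorted({int(m) for m in YEAR_RE.findall(text)})


def analyze(pages):
    """Count year mentions across all pages and link years that share a page.

    pages: dict mapping a (hypothetical) page title to its plain text.
    Returns (mentions, edges):
      mentions[year]            -> total number of times the year appears
      edges[(year_a, year_b)]   -> number of pages mentioning both years
    """
    mentions = Counter()  # the "neurons": individual year references
    edges = Counter()     # the "synapses": same-page co-occurrences
    for text in pages.values():
        mentions.update(int(m) for m in YEAR_RE.findall(text))
        # combinations() over a sorted list yields each pair once, as (a, b) with a < b
        for a, b in combinations(extract_years(text), 2):
            edges[(a, b)] += 1
    return mentions, edges


if __name__ == "__main__":
    sample = {
        "Civil War": "The war began in 1861 and ended in 1865.",
        "Lincoln": "Lincoln was elected in 1860 and died in 1865.",
    }
    mentions, edges = analyze(sample)
    print(mentions.most_common())
    print(dict(edges))
```

Because everything lives in one in-memory dictionary, there is no partitioning or buffer management to think about, which is the kind of simplicity Leetaru credits to the UV2’s shared memory. A real pipeline would also need place-name recognition and date parsing well beyond a single regex.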

UV2’s impressive in-memory capabilities made this possible for Leetaru. “I didn’t spend hours or days writing some fancy code that was distributed memory or using any of these fancy extensions, having to worry about memory management, allocating the right buffer sizes. I just wrote a ten line Perl script in a matter of minutes and just ran it… If I had to summarize the advantage of the UV2 platform in a single sentence, I think it would be ‘Outcomes over algorithms.’”

The outcome is presented in a fascinating infographic on SGI’s Facebook page, which covers the number of date mentions per year, sentiment over time, and much more. For example, the sentiment-over-time graph shows sharp dips around the 1860s, 1910s, and 1940s. Those dips correspond to the American Civil War (the sharpest dip, perhaps shedding some light on the American bias in English-language Wikipedia articles) and the two World Wars.

There are plenty more insights to be gleaned and plenty to be extrapolated. Leetaru’s research shows that the world has become exponentially more interconnected over the last fifty years. This connectivity makes it easier to digitize human patterns and apply data analysis to them. Perhaps Asimov’s psychohistory is not thousands of years away after all.




 