January 22, 2013

Gnip's Take on the Library of Twitter


Last week, we highlighted the Library of Congress’s effort to archive the entire Twitter database. The project, while daunting, has the potential to be hugely useful to researchers in the humanities as they explore the intricacies of human interaction in the social media age, such as the evolution of journalism and social uprisings.

The report released by the LOC mentioned social media data mining company Gnip, which is responsible for handling and delivering the Twitter data. Chris Moody, President and COO of Gnip, spoke to Datanami about the challenges and prospects of the Twitter-LOC project.

“The Library of Congress initiative aligns very much with what we’re trying to do,” Moody said. “Gnip was formed on the idea that we believe social data has unlimited value and near-limitless applications.”

While the effort to archive Twitter’s databases in the Library of Congress draws attention because of the Library’s name recognition, Moody noted that the technological challenges in setting up the LOC’s archive are not much different from those Gnip handles for its private financiers.

The biggest difference, Moody said, arises in the type of research that is being done. When Moody mentions social data’s “near-limitless applications,” he is in part referring to potential future studies on disasters and government uprisings, the likes of which have already been proposed. “The Library project is particularly fun and interesting for us,” Moody said, “because it’s often the research community that’s digging into some of the more interesting, fascinating things that could potentially be done with this data, whether it’s studying disease outbreaks or disaster projects or political issues.”

One of the biggest challenges, Moody noted, was dealing with the evolving nature of Twitter. While the idea of posting sub-140-character tidbits has remained mostly the same over the last seven years, the types of metadata available, along with the intricacies of storing them, have changed.

“In the earliest days, pulling together the archives in some kind of form was difficult for Twitter and the library,” said Moody. “Twitter as a service has evolved over time, what a tweet looked like in 2006 was very different from what a tweet looked like in 2008 which was different from what a tweet looked like in 2010. It’s a very dynamic data set and data model.”

The development of hashtags and of multiple forms of retweeting contributes to these dynamics: popular hashtags indicate an event followed by many users, while retweets detail how those users interact.

The tweet database from 2006-2010 comprised 20 terabytes (when uncompressed from Gnip’s delivery) containing the text and metadata behind 21 billion tweets. Gnip continues to send the LOC Twitter’s archive on a six-month delay, with over 170 billion tweets and 133 terabytes having come in over the last couple of years.

Currently, the LOC is having trouble archiving its Twitter database, with research queries taking 24 hours to run. We covered the topic at length in last week’s feature and, despite the looming growth of Twitter, the LOC seems confident that a solution will be found once archiving and processing power catch up with storage power.

Moody echoed that message, noting that Gnip’s efforts on that front were well spelled out by the Library.
