
January 28, 2013

How Data Analytics is Shifting Politics


Now that the U.S. presidential campaign is over, the Democratic National Committee is starting to reveal some of the factors that led to its tech advantage.

DNC Director of Architecture Chris Wegrzyn and HP’s Chris Selland recently took an in-depth look at the data operations that were steadily crunching away behind the scenes well before the polls closed on the 2012 election.

According to Obama Campaign Manager Jim Messina, their campaign’s data analytics operation was one of their strongest advantages over the Romney campaign. We touched on that in a previous article, noting how big data analytics sparked efficient resource allocation, particularly when it came to volunteer placement and advertisement purchasing.

Unlike companies, which can plan their big data operations around long-term viability and profitability, the presidential campaign had to ensure its systems were fully operational quickly. Wegrzyn noted that they had to figure out how to both raise and spend a billion dollars in the most efficient manner over the course of 18 months. A company that invests in a Hadoop cluster or something like HP Vertica, by contrast, could spend months running research tests before fully implementing it on an enterprise scale.

Options such as Hadoop were attractive but ultimately not what they were looking for. “We used [Hadoop], we loved it, but it wasn’t going to be this central analyst platform,” Wegrzyn said in describing the process behind selecting a data management system. The DNC wanted to play to a strength: the campaign had a decent pool of ‘smart people’ from which to draw. As such, they needed a system that could be quickly and easily learned.

Wegrzyn then turned to SQL. “SQL databases had a simple model that people already knew or we could teach people easily, it was designed for performance to minimize tinkering for speed and it had a clear scalability path.”

Next was identifying a vendor and a system to manage their SQL databases. Again, the specific needs and resources of the campaign drove the decision. Since they needed quick decisions from datasets compiled over a relatively short period of time, performance was going to be a bigger issue than data storage. As a result, the appliance cost models offered by certain vendors were a poor fit. “For us, vendors that were offering appliance cost models didn’t really make a lot of sense for us. We felt like we were going to need performance before we were going to need storage.”

Vertica ended up doing well in the campaign’s proof-of-concept models, but the tipping point lay in a shared vision between the two organizations. “The one that really tipped the scale was that Vertica had this roadmap that we felt was aligned with this idea of an analyst-driven organization.”

Through Vertica, the campaign was unexpectedly able to connect its digital and field operations. This added bonus, which turned into a system they would call ‘AirWolf,’ came about as a result of connecting all their databases to the Vertica framework. After the 2008 campaign, it was thought that those databases were perhaps too vast to connect. But through a tool they developed called ‘Stork,’ the analysts were able to combine the databases. “We built a tool we called Stork which basically let Vertica serve as the center for not just our analyst operations but for how we interacted with the entire campaign.”

This comprehensive integration was built atop Vertica by the analysts and engineers using it. According to Wegrzyn, the system’s relative simplicity made that possible. “We started with just raw data and we wrote a system that allowed us to move data from various different databases into our Vertica system on a regular basis…We built a platform on top of Vertica that in short was a glorified SQL runner and scheduler.”
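To make the idea of a “glorified SQL runner and scheduler” concrete, here is a minimal sketch in Python. It is not the campaign's code, which the talk does not describe in detail. It uses the standard-library sqlite3 module as a stand-in for both the source databases and Vertica, and every name in it (copy_table, SqlRunner, the job fields) is hypothetical.

```python
import sqlite3


def copy_table(source, warehouse, table, columns):
    """Reload one table from a source database into the warehouse.

    Assumption: a full refresh on every run, which is the simplest
    possible loading strategy and not necessarily what the campaign did.
    Table and column names are trusted here; they are not user input.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    warehouse.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    warehouse.execute(f"DELETE FROM {table}")
    warehouse.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})", rows
    )
    warehouse.commit()
    return len(rows)


class SqlRunner:
    """Runs named SQL jobs against the warehouse at fixed intervals."""

    def __init__(self, warehouse):
        self.warehouse = warehouse
        self.jobs = []

    def add_job(self, name, sql, every_minutes):
        # next_run of 0 means the job is due on the first tick.
        self.jobs.append(
            {"name": name, "sql": sql, "every": every_minutes, "next_run": 0}
        )

    def tick(self, now_minutes):
        """Run every job that is due and return {job name: result rows}."""
        results = {}
        for job in self.jobs:
            if now_minutes >= job["next_run"]:
                cursor = self.warehouse.execute(job["sql"])
                results[job["name"]] = cursor.fetchall()
                job["next_run"] = now_minutes + job["every"]
        return results
```

In this sketch an analyst only has to supply a query and an interval, which fits the article's point that the platform was meant for people who already knew SQL. A production system would also need incremental loads, error handling, and a real clock in place of the now_minutes argument.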

Once the databases were integrated, analysts were able to alert field organizers about online registrants in their respective areas, allowing the organizers to cast a net targeting those likely to volunteer.

The philosophy of the campaign, according to Wegrzyn, was that nothing was to be assumed. Assumptions generally made in campaigns based on “what makes sense” were to be eschewed in favor of data-driven conclusions. As noted in the aforementioned article, the campaign would make unusual ad buys (such as The Walking Dead) to target specific, niche voter groups.

To figure out which programs to buy, they pulled together in Vertica the data they had collected from unspecified vendors. That data was heavily demographic in nature, going beyond “women ages 20-29,” as Wegrzyn put it. For example, they wanted to target young voters who were likely supportive of the president but not necessarily driven to vote. They were then able to match that with pricing models and make informed, usually cheaper cable buys that reached the younger, arguably more apathetic populace.
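One simple way to picture matching audience data with pricing is to rank ad slots by cost per targeted viewer and buy the cheapest first. The Python sketch below does that under stated assumptions: the AdSlot fields, the greedy selection rule, and any numbers fed into it are illustrative, not the campaign's actual model, which the talk did not disclose.

```python
from dataclasses import dataclass


@dataclass
class AdSlot:
    program: str
    price: float         # cost of one spot, in dollars
    viewers: int         # estimated total audience
    target_share: float  # estimated fraction of viewers in the target group


def cost_per_target_viewer(slot):
    """Dollars spent per viewer who belongs to the target group."""
    reached = slot.viewers * slot.target_share
    return float("inf") if reached == 0 else slot.price / reached


def plan_buys(slots, budget):
    """Greedy plan: take the cheapest cost per target viewer first.

    Assumption: each slot is bought at most once, and a slot is skipped
    when its price exceeds the remaining budget.
    """
    chosen = []
    remaining = budget
    for slot in sorted(slots, key=cost_per_target_viewer):
        if slot.price <= remaining:
            chosen.append(slot.program)
            remaining -= slot.price
    return chosen
```

Under a ranking like this, a small cable program with a high share of the target group can beat a far larger broadcast audience, which is consistent with the cheaper niche buys described above.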

As a result, they turned part of their ad campaign into an extended get-out-the-vote movement.

Again, this type of platform suited people who required simplicity and the ability to run analytics with as little development time as possible. Storage was not much of an issue for a system that was only to be in use for a year. From a short-term perspective, Vertica did well for the campaign, and will likely at least serve as a model for 2016.

Related Articles

Big Data Assists Obama Win

HP Shapes Strategy Around Big Efficient Data

Adding Autonomy to Personalized Medicine
