The Political Intrigue of Big Social Data
Social media analytics are powering everything from big brand decisions to the monitoring of the cultural climate. Now, with the help of some key big data startups, the massive, constant rush of social data is being tapped to look for inaccuracies and trends in political races.
The value of social media in politics has certainly not been questioned in recent years as governments and individual candidates attempt to turn the social swell in their favor (or turn their masses away from the swell altogether). However, companies like SocialMatica are taking a different tack, applying the one-two punch of big social data and semantic data analysis to political battles.
Earlier this year the startup, which launched in 2010, released in-depth social data analysis figures that identified the top political topics in the GOP race using its semantic intelligence engine.
The company was able to pin down the key subjects that were influencing the race. It then looked at how these subjects related to specific candidates to get a real-time view into which issues were appealing to which potential voters.
Perhaps more important, however, is what the analytics revealed about media coverage of political races—and what these sources might be missing when it comes to the “big picture” of political engagement.
According to SocialMatica CEO Gary Hermansen, “What is most interesting here is the discrepancy between what traditional media outlets are touting as campaign hot topics and the actual topics of conversations taking place online. Despite the media hype and political buzz surrounding immigration, candidate income and even religion, online users are most concerned about topics closer to home – such as taxes and the economy.”
Hermansen continued, “I would argue that our newspapers and TV networks have continued to focus on topics less pertinent to the American public – such as Newt Gingrich’s previous work with mortgage giant Freddie Mac and the details of Mitt Romney’s personal income tax filings. These outlets would benefit from focusing on topics considered compelling by people actively engaging in social media – such as taxes, the economy, military and education.”
To get behind the scenes of the company on a technical level, we caught up with the company’s CTO, Mary Harris, for a few questions about how the platform works.
Please describe your semantic data intelligence engine; what technologies power this platform and how, if at all, on a functional level, is this different from the other platforms available?
We have developed a platform for converting freeform web data into a structured relational format and created a set of proprietary tools that are optimized for performing this task as well as creating highly structured databases of vertical/subject-based content. In the SocialMatica world, a “vertical” is a set of Web resources and appropriate associated vocabularies that have been defined in collaboration with the client.
This set of resources describes a representative data set for any particular subject, such as “Automotive” or “Data Storage” or, in this case, the “Republican Primary Candidates”. A Vertical resource set is built and then fed to our Data Acquisition Controller System, which is responsible for data collection. All of our tools run in the cloud (on Rackspace) and are scaled to manage traffic and load. Included is a set of high-level diagrams to help explain the process and the tools. The three major components of our system are: A: the Data Acquisition Controller, B: the SocialBase Builder, and C: the Data Processing Controller.
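To make the idea of a “vertical” concrete, here is a minimal sketch of how a resource set and its vocabulary might be represented. The `Vertical` class, its field names, the example URL, and the sample vocabulary are all hypothetical illustrations; the interview does not describe SocialMatica's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Vertical:
    """Hypothetical container for one subject area (illustrative only)."""
    name: str
    resources: list[str] = field(default_factory=list)  # URLs or feed identifiers
    vocabulary: set[str] = field(default_factory=set)   # lowercase subject terms

    def matching_terms(self, text: str) -> set[str]:
        """Return the vocabulary terms that occur in the text (case-insensitive)."""
        lowered = text.lower()
        return {term for term in self.vocabulary if term in lowered}


# Assumed sample data; not taken from the company's real vertical definition.
gop_primary = Vertical(
    name="Republican Primary Candidates",
    resources=["https://example.com/politics-blog"],
    vocabulary={"taxes", "economy", "education", "military"},
)

terms = gop_primary.matching_terms("Voters keep asking about taxes and the economy.")
print(sorted(terms))  # ['economy', 'taxes']
```

Plain substring matching is used here only to keep the sketch short; a real vocabulary matcher would likely need tokenization and phrase handling.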
The SocialMatica Production Pipeline
The diagram below describes the high-level flow and process of the entire system. We collect web material, both web pages and API feeds, and, using a three-step process, convert these data items into both numerical analytics and analyzed conversations that we then display in any number of product views. We collect information on both People and Companies in any particular vertical, as well as the underlying and supporting blog/tweet/forum/LinkedIn/Facebook/News/YouTube/WebPage/etc. information, and build an integrated, searchable relational DB.
The Data Acquisition Controller
As the diagram below shows, this subsystem is responsible for collecting various types of web pages and managing access to several data APIs. This subsystem includes a web crawler and various types of page parsers. We do not rely on any RSS feeds to collect blog, forum, Facebook, or news information. To support this functionality, we have developed a very sophisticated proprietary page scraper/parser which allows us to scrape and parse any text off of any type of page or resource.
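The company's scraper/parser is proprietary and not described in detail, but the basic task of pulling readable text off a page can be sketched with Python's standard `html.parser` module. The class name `VisibleTextParser`, the helper `extract_text`, and the sample page are assumptions made for illustration, not part of the SocialMatica platform.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page used only to exercise the sketch.
page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Primary recap</h1><p>Taxes dominated the debate.</p>"
    "<script>track();</script></body></html>"
)
print(extract_text(page))  # Primary recap Taxes dominated the debate.
```

A production parser would also need to handle malformed markup, boilerplate such as navigation and ads, and site-specific layouts, none of which this sketch attempts.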
The SocialBase Builder
The original goal of this sub-system was to build more complete data sets for our verticals. We realized early on in the process that we could build reasonably complete data records for the people and companies that we were tracking using only web-based data.
The diagram below highlights this subsystem’s function and basis:
The Data Processing Controller
This controller manages various text processing and analytic procedures which are combined to create our unified SocialMatica database.
Again, see below for a visual representation:
To answer your “how is this different” question, we have created and optimized the process and the tools to create verticals, not large general knowledge bases. As previously mentioned, all data and vocabularies are vertical specific, and all tools are optimized for this process.
How is this “semantic data intelligence” being leveraged to follow the GOP race? Please be technical in your answer—no generalities; we are looking for an answer on the algorithmic/hardware/middleware-framework level here.
We understand the influence of the people in any vertical by understanding who they talk about, who talks about them, and what topics they are talking about. We used our general vertical framework, which was designed to evaluate people and companies in a space, and modified it slightly to look at Candidates in place of companies.
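As a rough illustration of the “who talks about whom, and about what” idea in the answer above, the sketch below builds simple lookup tables from a handful of posts. The post tuples, author names, and table names are invented for the example; the interview does not say how SocialMatica stores these relationships.

```python
from collections import defaultdict

# Hypothetical posts: (author, people mentioned, topics discussed)
posts = [
    ("blogger_a", ["Candidate X", "Candidate Y"], ["taxes"]),
    ("blogger_b", ["Candidate X"], ["economy", "taxes"]),
    ("blogger_a", ["Candidate X"], ["education"]),
]

mentioned_by = defaultdict(set)  # person -> authors who mention them
mentions = defaultdict(set)      # author -> people they mention
topics_for = defaultdict(set)    # person -> topics raised alongside them

for author, people, topics in posts:
    for person in people:
        mentioned_by[person].add(author)
        mentions[author].add(person)
        topics_for[person].update(topics)

print(sorted(mentioned_by["Candidate X"]))  # ['blogger_a', 'blogger_b']
print(sorted(topics_for["Candidate Y"]))    # ['taxes']
```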
We rank candidates and influencers in the space. We rank these candidates by the number of influencers that are talking about them and the rank, or importance, of these influencers. We rank influencers by how much they write, how much they are commented on, how often they are mentioned, the importance of the publishing site, the number of their Twitter followers and tweets, the number of Twitter mentions and retweets, the relevancy of their tweets and blog posts to this vertical, and the timeliness of their posts.
In addition to these attributes we can also use other attributes such as number of social connections, education level, etc. to modify the ranking. These attributes are weighted based on our analysis of the social space and what we believe is most important.
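The weighting scheme itself is not disclosed, but the general shape of such a ranking can be sketched as a weighted sum of normalized attributes. Everything specific below is an assumption: the attribute subset, the equal weights in `WEIGHTS`, the max-normalization, and the choice to score a candidate by summing the scores of the influencers discussing them (which reflects both how many influencers there are and how important each one is).

```python
# Illustrative weights; the interview does not disclose the real ones.
WEIGHTS = {
    "posts": 0.2,
    "comments": 0.2,
    "mentions": 0.2,
    "followers": 0.2,
    "retweets": 0.2,
}


def normalize(values):
    """Scale raw counts to [0, 1] by dividing by the maximum."""
    top = max(values)
    return [v / top if top else 0.0 for v in values]


def influencer_scores(influencers):
    """Weighted sum of max-normalized attributes, one score per influencer."""
    names = list(influencers)
    scores = dict.fromkeys(names, 0.0)
    for attr, weight in WEIGHTS.items():
        column = normalize([influencers[n][attr] for n in names])
        for name, value in zip(names, column):
            scores[name] += weight * value
    return scores


def candidate_scores(talks_about, scores):
    """Sum the scores of the influencers discussing each candidate."""
    totals = {}
    for influencer, candidates in talks_about.items():
        for candidate in candidates:
            totals[candidate] = totals.get(candidate, 0.0) + scores[influencer]
    return totals


# Made-up sample data.
influencers = {
    "blogger_a": {"posts": 10, "comments": 50, "mentions": 4, "followers": 1000, "retweets": 20},
    "blogger_b": {"posts": 5, "comments": 25, "mentions": 2, "followers": 500, "retweets": 10},
}
talks_about = {
    "blogger_a": ["Candidate X"],
    "blogger_b": ["Candidate X", "Candidate Y"],
}

scores = influencer_scores(influencers)
totals = candidate_scores(talks_about, scores)
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)  # ['Candidate X', 'Candidate Y']
```

Attributes such as site importance, relevancy, and timeliness would enter the same way, as additional normalized columns with their own weights.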
Does your company work with any enterprise customers in a capacity outside of marketing analytics? In other words, is your platform being leveraged for any mission-critical operations outside of marketing, say for more direct BI or other purposes?
Yes, we are currently using our platform in support of research and business intelligence.