The Political Intrigue of Big Social Data
Social media analytics are powering everything from big brand decisions to the monitoring of the cultural climate. Now, with the help of some key big data startups, the massive, constant rush of social data is being tapped to look for inaccuracies and trends in political races.
The value of social media in politics has certainly not been questioned in recent years as governments and individual candidates attempt to turn the social swell in their favor (or turn their masses away from the swell altogether). However, companies like SocialMatica are taking a different approach, applying the one-two punch of big social data and semantic data analysis to political battles.
Earlier this year the startup, which launched in 2010, released in-depth social data analysis figures that identified the top political topics in the GOP race using its semantic intelligence engine.
The company was able to pin down the key subjects that were influencing the race. It then looked at how these subjects related to specific candidates to get a real-time view into which issues were appealing to which potential voters.
Perhaps more important, however, is what the analytics revealed about media coverage of political races—and what these sources might be missing when it comes to the “big picture” of political engagement.
According to SocialMatica CEO Gary Hermansen, “What is most interesting here is the discrepancy between what traditional media outlets are touting as campaign hot topics and the actual topics of conversations taking place online. Despite the media hype and political buzz surrounding immigration, candidate income and even religion, online users are most concerned about topics closer to home – such as taxes and the economy.”
Hermansen continued, “I would argue that our newspapers and TV network have continued to focus on topics less pertinent to the American public – such as Newt Gingrich’s previous work with mortgage giant Freddie Mac and the details of Mitt Romney’s personal income tax filings. These outlets would benefit from focusing on topics considered compelling by people actively engaging in social media – such as taxes, the economy, military and education.”
To get behind the scenes of the company on a technical level, we caught up with the company’s CTO, Mary Harris, for a few questions about how the platform works.
Please describe your semantic data intelligence engine; what technologies power this platform and how, if at all, on a functional level, is this different from the other platforms available?
We have developed a platform for converting freeform web data into a structured relational format and created a set of proprietary tools that are optimized for performing this task as well as creating highly structured databases of vertical/subject-based content. In the SocialMatica world, a “vertical” is a set of Web resources and appropriate associated vocabularies that have been defined in collaboration with the client.
This set of resources describes a representative data set for any particular subject, such as “Automotive” or “Data Storage” or, in this case, the “Republican Primary Candidates”. A vertical resource set is built and then fed to our Data Acquisition Controller system, which is responsible for data collection. All of our tools run in the cloud (on Rackspace) and are scaled to manage traffic and load. Included is a set of high-level diagrams to help explain the process and the tools. The three major components of our system are: A: the Data Acquisition Controller, B: the SocialBase Builder, and C: the Data Processing Controller.
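To make the idea of a “vertical” concrete, here is a minimal Python sketch of one as a data structure: a named set of web resources plus an associated vocabulary. The class name `Vertical`, its fields, the `matches` helper, and the example URL are all assumptions made for illustration; the interview does not describe how SocialMatica actually represents a vertical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertical:
    """Hypothetical container for a vertical: resources plus vocabulary."""
    name: str
    resources: list[str] = field(default_factory=list)   # URLs or feed identifiers
    vocabulary: set[str] = field(default_factory=set)    # lowercase subject terms

    def matches(self, text: str) -> bool:
        """Return True if any vocabulary term appears as a word in the text."""
        words = {w.strip(".,!?;:\"'()").lower() for w in text.split()}
        return not words.isdisjoint(self.vocabulary)


# Illustrative data only; the resource URL and terms are placeholders.
gop = Vertical(
    name="Republican Primary Candidates",
    resources=["https://example.com/politics-blog"],
    vocabulary={"taxes", "economy", "primary"},
)

on_topic = gop.matches("The economy dominated the debate.")
off_topic = gop.matches("Weather was mild today.")
```

A real vocabulary would likely need multi-word phrases and stemming; simple word matching is used here only to keep the sketch short.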
The SocialMatica Production Pipeline
The diagram below describes the high-level flow and process of the entire system. We collect web material, both pages and API feeds, and, using a three-step process, convert these data items into both numerical analytics and analyzed conversations that we then display in any number of product views. We collect information on both people and companies in any particular vertical, as well as the underlying and supporting blog, tweet, forum, LinkedIn, Facebook, news, YouTube, and web page information, and build an integrated, searchable relational database.
The Data Acquisition Controller
As the diagram below shows, this subsystem is responsible for collecting various types of web pages and managing access to several data APIs. This subsystem includes a web crawler and various types of page parsers. We do not rely on any RSS feeds to collect blog, forum, Facebook, or news information. To support this functionality, we have developed a very sophisticated proprietary page scraper/parser which allows us to scrape and parse any text from any type of page or resource.
The SocialBase Builder
The original goal of this subsystem was to build more complete data sets for our verticals. We realized early on in the process that we could build reasonably complete data records for the people and companies that we were tracking using only web-based data.
The diagram below highlights this subsystem’s function and basis:
The Data Processing Controller
This controller manages various text processing and analytic procedures which are combined to create our unified SocialMatica database.
Again, see below for a visual representation:
To answer your “how is this different” question, we have created and optimized the process and the tools to create verticals, not large general knowledge bases. As previously mentioned, all data and vocabularies are vertical specific, and all tools are optimized for this process.
How is this “semantic data intelligence” being leveraged to follow the GOP race? Please be technical in your answer—no generalities; we are looking for an answer on the algorithmic/hardware/middleware-framework level here.
We understand the influence of the people in any vertical by understanding who they talk about, who talks about them, and what topics they are talking about. We used our general vertical framework, which was designed to evaluate people and companies in a space, and modified it slightly to look at Candidates in place of companies.
We rank candidates and influencers in the space. We rank candidates by the number of influencers that are talking about them and by the rank, or importance, of those influencers. We rank influencers by how much they write, how much they are commented on, how often they are mentioned, the importance of the publishing site, the number of their Twitter followers and tweets, the number of Twitter mentions and retweets, the relevancy of their tweets and blog posts to this vertical, and the timeliness of their posts.
In addition to these attributes we can also use other attributes such as number of social connections, education level, etc. to modify the ranking. These attributes are weighted based on our analysis of the social space and what we believe is most important.
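The ranking described in this answer can be sketched as a weighted sum of influencer attributes, with each candidate scored by the influencers talking about them. The Python below is a minimal illustration only: the attribute names, the weights, the assumption that each attribute is already normalized to the range 0 to 1, and the choice to sum influencer scores per candidate are all hypothetical, since the company does not disclose its actual formula or weighting.

```python
# Hypothetical weights; the interview says only that attributes are weighted.
WEIGHTS = {
    "posts_written": 0.15,
    "comments_received": 0.15,
    "mentions": 0.15,
    "site_importance": 0.15,
    "twitter_reach": 0.15,
    "relevancy": 0.15,
    "timeliness": 0.10,
}


def influencer_score(attrs, weights=WEIGHTS):
    """Weighted sum of attributes, each assumed normalized to [0, 1]."""
    return sum(w * attrs.get(name, 0.0) for name, w in weights.items())


def rank_candidates(talked_about_by, scores):
    """Score each candidate by summing the scores of influencers discussing them.

    Summing rewards both the number of influencers and their importance.
    """
    totals = {
        candidate: sum(scores[i] for i in influencers)
        for candidate, influencers in talked_about_by.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative data only; names and values are made up.
influencers = {
    "blogger_a": {name: 1.0 for name in WEIGHTS},
    "blogger_b": {"posts_written": 0.5},
}
scores = {name: influencer_score(attrs) for name, attrs in influencers.items()}

talked_about_by = {
    "Candidate X": {"blogger_a"},
    "Candidate Y": {"blogger_b"},
    "Candidate Z": {"blogger_a", "blogger_b"},
}
ranking = rank_candidates(talked_about_by, scores)
```

Additional attributes such as number of social connections or education level could be folded in by adding entries to the weights table, which is one reason a simple table-driven weighted sum is used in this sketch.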
Does your company work with any enterprise customers in a capacity outside of marketing analytics? In other words, is your platform being leveraged for any mission-critical operations outside of marketing, say for more direct BI or other purposes?
Yes, we are currently using our platform in support of research and business intelligence.