The Political Intrigue of Big Social Data
Social media analytics are powering everything from big brand decisions to the monitoring of the cultural climate. Now, with the help of some key big data startups, the massive, constant rush of social data is being tapped to look for inaccuracies and trends in political races.
The value of social media in politics has not been in question in recent years, as governments and individual candidates attempt to turn the social swell in their favor (or turn their masses away from the swell altogether). However, companies like SocialMatica are taking a different tack, applying the one-two punch of big social data and semantic data analysis to political battles.
Earlier this year the startup, which launched in 2010, released in-depth social data analysis figures that identified the top political topics in the GOP race using its semantic intelligence engine.
The company was able to pin down the key subjects that were influencing the race. It then looked at how these subjects related to specific candidates to get a real-time view into which issues were appealing to which potential voters.
Perhaps more important, however, is what the analytics revealed about media coverage of political races—and what these sources might be missing when it comes to the “big picture” of political engagement.
According to SocialMatica CEO Gary Hermansen, “What is most interesting here is the discrepancy between what traditional media outlets are touting as campaign hot topics and the actual topics of conversations taking place online. Despite the media hype and political buzz surrounding immigration, candidate income and even religion, online users are most concerned about topics closer to home – such as taxes and the economy.”
Hermansen continued, “I would argue that our newspapers and TV networks have continued to focus on topics less pertinent to the American public – such as Newt Gingrich’s previous work with mortgage giant Freddie Mac and the details of Mitt Romney’s personal income tax filings. These outlets would benefit from focusing on topics considered compelling by people actively engaging in social media – such as taxes, the economy, military and education.”
To get behind the scenes of the company on a technical level, we caught up with the company’s CTO, Mary Harris, for a few questions about how the platform works.
Please describe your semantic data intelligence engine; what technologies power this platform and how, if at all, on a functional level, is this different from the other platforms available?
We have developed a platform for converting freeform web data into a structured relational format and created a set of proprietary tools that are optimized for performing this task as well as creating highly structured databases of vertical/subject-based content. In the SocialMatica world, a “vertical” is a set of Web resources and appropriate associated vocabularies that have been defined in collaboration with the client.
This set of resources describes a representative data set for any particular subject, such as “Automotive” or “Data Storage” or, in this case, the “Republican Primary Candidates.” A vertical resource set is built and then fed to our Data Acquisition Controller System, which is responsible for data collection. All of our tools run in the cloud (on Rackspace) and are scaled to manage traffic and load. Included is a set of high-level diagrams to help explain the process and the tools. The three major components of our system are: A: the Data Acquisition Controller, B: the SocialBase Builder, and C: the Data Processing Controller.
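To make the idea of a vertical concrete, here is a minimal Python sketch of how such a resource set might be represented. It is an illustration only: the class name `Vertical`, its fields, the placeholder URL, and the simple substring matching are all assumptions, not SocialMatica's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Vertical:
    """Hypothetical container for one subject area (all names are illustrative)."""
    name: str
    resources: List[str] = field(default_factory=list)  # pages or API feeds to collect
    vocabulary: Set[str] = field(default_factory=set)   # subject-specific terms

    def matches(self, text: str) -> Set[str]:
        """Return the vocabulary terms that appear in a piece of text."""
        lowered = text.lower()
        return {term for term in self.vocabulary if term.lower() in lowered}


# Invented example values, loosely based on the topics named in the article.
gop_primary = Vertical(
    name="Republican Primary Candidates",
    resources=["https://example.com/politics-blog"],  # placeholder URL
    vocabulary={"taxes", "economy", "education", "military"},
)

found = gop_primary.matches("Voters keep asking about taxes and the economy.")
```

A real vocabulary would presumably be far richer than a flat set of keywords; the point here is only that each vertical carries its own resources and its own terms.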
The SocialMatica Production Pipeline
The diagram below describes the high-level flow and process of the entire system. We collect web material, both web pages and API feeds, and, using a three-step process, convert these data items into both numerical analytics and analyzed conversations that we then display in any number of product views. We collect information on both people and companies in any particular vertical, as well as the underlying and supporting blog/tweet/forum/LinkedIn/Facebook/news/YouTube/web page/etc. information, and build an integrated, searchable relational database.
The Data Acquisition Controller
As the diagram below shows, this subsystem is responsible for collecting various types of web pages and managing access to several data APIs. This subsystem includes a web crawler and various types of page parsers. We do not rely on any RSS feeds to collect blog, forum, Facebook, or news information. To support this functionality, we have developed a very sophisticated proprietary page scraper/parser that allows us to scrape and parse any text off of any type of page or resource.
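The scraper/parser described above is proprietary and its internals are not disclosed. As a rough illustration of the general technique of pulling visible text off a page without relying on RSS, the following sketch uses only Python's standard `html.parser` module. The class name `VisibleTextParser`, the helper `extract_text`, and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Invented sample page for demonstration purposes.
sample_page = """
<html><head><style>p { color: red; }</style></head>
<body><h1>Primary notes</h1><p>Taxes dominated the <b>debate</b>.</p>
<script>var tracking = 1;</script></body></html>
"""
page_text = extract_text(sample_page)
```

A production scraper would also need fetching, politeness rules, and per-site handling of comments and bylines, none of which is attempted here.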
The SocialBase Builder
The original goal of this sub-system was to build more complete data sets for our verticals. We realized early on in the process that we could build reasonably complete data records for the people and companies that we were tracking using only web-based data.
The diagram below highlights this subsystem’s function and basis:
The Data Processing Controller
This controller manages various text processing and analytic procedures which are combined to create our unified SocialMatica database.
Again, see below for a visual representation:
To answer your “how is this different” question, we have created and optimized the process and the tools to create verticals, not large general knowledge bases. As previously mentioned, all data and vocabularies are vertical specific, and all tools are optimized for this process.
How is this “semantic data intelligence” being leveraged to follow the GOP race? Please be technical in your answer—no generalities; we are looking for an answer on the algorithmic/hardware/middleware-framework level here.
We understand the influence of the people in any vertical by understanding who they talk about, who talks about them, and what topics they are talking about. We used our general vertical framework, which was designed to evaluate people and companies in a space, and modified it slightly to look at Candidates in place of companies.
We rank candidates and influencers in the space. We rank these candidates by the number of influencers that are talking about them and the rank, or importance, of these influencers. We rank influencers by how much they write, how much they are commented on, how often they are mentioned, the importance of the publishing site, the number of their Twitter followers and tweets, the number of Twitter mentions and retweets, the relevancy of their tweets and blog posts to this vertical, and the timeliness of their posts.
In addition to these attributes we can also use other attributes such as number of social connections, education level, etc. to modify the ranking. These attributes are weighted based on our analysis of the social space and what we believe is most important.
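The ranking described above can be pictured as a weighted score over influencer attributes, with each candidate's score built from the influencers who discuss that candidate. The sketch below is a simplified illustration under several assumptions: the weights, the max-based normalization, the reduced attribute list, and all sample names and numbers are invented, since the interview does not disclose the actual formula.

```python
# Hypothetical weights; the real values are not given in the interview.
WEIGHTS = {
    "posts": 0.2,
    "comments_received": 0.2,
    "mentions": 0.2,
    "twitter_followers": 0.2,
    "relevancy": 0.2,
}


def normalize(values):
    """Scale non-negative numbers to the 0..1 range by dividing by the maximum."""
    top = max(values)
    return [v / top if top else 0.0 for v in values]


def score_influencers(influencers):
    """Return {name: weighted score} computed from raw attribute values."""
    names = list(influencers)
    scores = {name: 0.0 for name in names}
    for attribute, weight in WEIGHTS.items():
        column = normalize([influencers[n][attribute] for n in names])
        for name, value in zip(names, column):
            scores[name] += weight * value
    return scores


def score_candidates(talks_about, influencer_scores):
    """Sum the scores of the influencers discussing each candidate.

    A candidate gains from both the number of influencers talking about
    them and the importance of each of those influencers.
    """
    totals = {}
    for influencer, candidates in talks_about.items():
        for candidate in candidates:
            totals[candidate] = totals.get(candidate, 0.0) + influencer_scores[influencer]
    return totals


# Invented sample data, for illustration only.
influencers = {
    "blogger_1": {"posts": 40, "comments_received": 200, "mentions": 30,
                  "twitter_followers": 10000, "relevancy": 0.9},
    "blogger_2": {"posts": 10, "comments_received": 50, "mentions": 60,
                  "twitter_followers": 2000, "relevancy": 0.6},
}
talks_about = {
    "blogger_1": ["Candidate A", "Candidate B"],
    "blogger_2": ["Candidate B"],
}

influencer_scores = score_influencers(influencers)
candidate_scores = score_candidates(talks_about, influencer_scores)
ranking = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
```

In this toy data, Candidate B ranks first because both influencers discuss that candidate. Attributes such as publishing-site importance or post timeliness could be added as further weighted columns in the same way.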
Does your company work with any enterprise customers in a capacity outside of marketing analytics? In other words, is your platform being leveraged for any mission-critical operations outside of marketing, say for more direct BI or other purposes?
Yes, we are currently using our platform in support of research and business intelligence.