NCSA’s SMM Software Tools Aim to Make Sense of Social Media Data
Aug. 21, 2023 — Social media platforms offer a treasure trove of data for researchers seeking to understand public opinion, political behavior and a host of other social science topics, but accessing and analyzing that data can be difficult for researchers without a computational background.
“There are commercial tools that make it easy, but they are extremely expensive, and they focus on marketing needs, not on what researchers are interested in,” said Chen Wang, a senior research programmer in the NCSA Software Directorate. Open-source analysis tools remove the cost barrier but require some programming knowledge, such as the ability to write a Python script. Most social science researchers are not programmers and have different skill sets, Wang added.
Fortunately, when scientists have a need for a tool, NCSA has the technical expertise and resources to develop a solution. The Social Media Macroscope (SMM) is one such solution, filling a gap between expensive social media analytics tools that lack transparency and have rigid requirements and open-source tools that require technical know-how. SMM was created in 2017 by Joseph T. Yun, Ph.D., who was then head of research and innovation at University of Illinois Technology Services, in collaboration with NCSA. The project began as a science gateway to give social scientists web-based access to research tools and resources and was hosted by the National Science Foundation-sponsored Hubzero platform.
As a social media science gateway, SMM boasted more than 1,000 users and more than 5,000 individual uses worldwide. Following Yun’s departure from Illinois in 2021 and subsequent appointment as a research professor at the University of Pittsburgh, Wang assumed leadership of the SMM project. Collaborating with Yun, she embarked on an initiative to enhance the software, introduce new tools and broaden its user base.
“Our mission is to create a fully functional system for the acquisition, storage, sharing, analysis and visualization of social media data,” said Wang. “We are continuing to build out a portfolio of tools to serve different communities, including social scientists, educators, public health professionals and people in industry.”
With a grant from NCSA’s Center-Directed Discretionary Research (CDDR) program, Wang has led the effort to migrate SMM from Hubzero to Radiant, a private cloud-computing platform operated by NCSA for the benefit of NCSA and Illinois faculty and staff. New features have been added to SMM along the way: the ability to port data into Clowder, NCSA’s web-based data management and sharing platform; CILogon for secure identity and access management; and Kubernetes, an open-source system for orchestrating the deployment of SMM tools packaged in Docker containers, which bundle software together with its dependencies.
“There are a lot of components to the Social Media Macroscope and to use them requires orchestration,” said Yong Wook Kim, a senior research software engineer and member of the SMM development team. “Kubernetes is something I have expertise in, and it’s important for automating deployments, scaling and managing deployments.”
Added Wang, “It’s like cooking. You have many smaller pieces that are the ingredients and you need the recipe to put them together to produce what you need.”
Analysis with a ‘SMILE’
The heart and soul of SMM is the Social Media Intelligence and Learning Environment, or SMILE, a platform for data ingestion, analysis and sharing. SMILE collects real-time and historical data from Twitter (now rebranded as X) and Reddit to perform sentiment analysis, phrase mining, entity recognition, machine-learning classification, network analysis and natural language processing. It can examine retweets to find connections among social media posts, cluster data by topic, output histograms to show the frequency distribution of data such as keywords, and create tree diagrams to understand trends and the probability of different outcomes related to social media topics.
“All the user needs is a list of keywords to do their search,” said Wang. “They use a simple search box and once their data is available, they get a histogram that shows how active different posts are over hours, days or even months.”
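The activity histogram Wang describes amounts to grouping post timestamps into hourly, daily or monthly buckets and counting each group. The following is a minimal sketch of that idea in plain Python, not SMILE's actual code: the function name `activity_histogram`, the bucket labels and the sample timestamps are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical bucket formats; SMILE's real binning options are not
# described in the article.
BUCKET_FORMATS = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}


def activity_histogram(timestamps, bucket="day"):
    """Count posts per time bucket, returned in chronological order."""
    fmt = BUCKET_FORMATS[bucket]
    counts = Counter(
        datetime.fromisoformat(ts).strftime(fmt) for ts in timestamps
    )
    # The bucket labels are zero-padded, so sorting the strings also
    # sorts them by time.
    return dict(sorted(counts.items()))


# Made-up ISO 8601 timestamps standing in for collected posts.
sample_posts = [
    "2023-08-01T09:15:00",
    "2023-08-01T09:45:00",
    "2023-08-01T17:30:00",
    "2023-08-02T08:05:00",
    "2023-09-10T12:00:00",
]

# Print a simple text histogram of daily activity.
for label, count in activity_histogram(sample_posts, "day").items():
    print(f"{label} {'#' * count} ({count})")
```

Passing `"hour"` or `"month"` instead of `"day"` regroups the same posts at a finer or coarser scale, which is the kind of switch between hours, days and months that the quote refers to.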
From that point, the user simply clicks among options to use features such as creating a sentiment analysis pie chart, which categorizes opinions about a post or topic.
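The numbers behind a sentiment pie chart are the shares of posts that fall into each category. As a rough illustration only, the sketch below labels posts with a tiny hand-made word list and then computes those shares. The article does not say which sentiment model SMILE uses, so the word lists, the function names `classify` and `sentiment_shares`, and the sample posts are all assumptions made for this example.

```python
from collections import Counter

# Toy word lists, assumed for illustration; a real sentiment model is
# far more sophisticated than a keyword lookup.
POSITIVE = {"love", "great", "good", "helpful"}
NEGATIVE = {"hate", "bad", "awful", "broken"}
LABELS = ("positive", "negative", "neutral")


def classify(post):
    """Label a post by comparing counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in post.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_shares(posts):
    """Return the fraction of posts per label, i.e. the pie-chart slices."""
    if not posts:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(p) for p in posts)
    return {label: counts[label] / len(posts) for label in LABELS}


# Made-up posts standing in for search results.
sample_posts = [
    "I love this tool, it is great!",
    "This update is awful.",
    "Posting from the conference today.",
    "Good but broken",
]

for label, share in sentiment_shares(sample_posts).items():
    print(f"{label}: {share:.0%}")
```

Each fraction maps directly to one slice of the chart. A post with equal positive and negative signals, such as the last sample, is counted as neutral in this sketch.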
As SMM migrates from Hubzero to Radiant, the NCSA software development team has set up a “playground,” a temporary virtual space in which potential users can explore SMM and its tools and where no data is stored for more than a day. Data can be transferred to SMM Clowder for extended storage; SMM Clowder offers the same SMM analytics with fewer tuning parameters, allowing for swift insights from the data.
The Future: Resources That Run Seamlessly for Business and Academia
Wang estimates about half the SMM users so far have been faculty and staff on the Illinois campus. For example, Unnati Narang, Ph.D., and Ye Joo Park, faculty members in the Gies College of Business, have used SMILE with marketing classes, enabling MBA students to conduct social media analysis without the need to learn programming or use expensive commercial platforms.
“Because we have containerized every component and deployed them using Kubernetes and Helm charts (packages of Kubernetes resource definitions used to deploy an application onto a cluster), we can orchestrate the system way more efficiently, and the system does load balancing and automatic scaling, which helps manage heavy traffic loads,” said Kim. “This will also make it easier for users outside NCSA to deploy the full stack of SMM tools on their own computational resources.”
Currently, the SMM team is seeking industry users who constantly use social media data to understand their customers, sentiments about their brands, marketing trends and more. With help from Brendan McGinty, director of NCSA’s Industry Partner Program, the SMM team hopes longtime partners in sectors such as pharmaceuticals, energy, agriculture, automotive and aerospace can provide use cases that help to further improve SMM and expand its reach. The platform could also be a valuable tool in K-12 education in classes that focus on technical issues (such as data privacy) as well as in social science classes.
“With all the changes at Twitter (now X), there is some uncertainty in the social media landscape, but there is also a high demand for these kinds of tools,” said Wang. “Our main goal is to provide something that is open source, extremely easy to use and customizable depending on the user’s needs. Long-term, we hope to provide social media analytics tools that can run seamlessly on different resources at different businesses and institutions.”