October 12, 2020

New Report Highlights the Importance of Data Diversity

The vast majority of the world’s data was created in just the last couple of years, creating an unimaginably wide ocean from which analysts seek to draw conclusions. There’s a problem, though: given the breadth of the data, many of these analyses focus on just a handful of major platforms, particularly social media platforms like Facebook and Twitter. A new “Babel Beacon” report from analytics firm Babel Street argues that this lack of “data diversity” is a major roadblock for robust analyses that use publicly available information.

By way of example, Babel Street offered three case studies, all in the national security sector. First, the firm highlighted the Osama bin Laden raid, beginning with the tweets from Sohaib Athar that inadvertently broke the story about the ongoing helicopter raid on bin Laden’s compound in Pakistan. Babel Street’s analysts used the firm’s Babel X tool to examine sentiments and responses in reaction to these tweets, finding chatter on the dark web concerning possible retaliatory attacks. “The impact and influence of a social media post can quickly spread across numerous data platforms,” Babel Street wrote, “often with potentially more serious implications than the original post.”

Dark web messages pulled by Babel X when analyzing the conversation around bin Laden’s death. Image courtesy of Babel Street.

Next, Babel Street discussed research on mass shootings and domestic terrorism. Non-mainstream platforms, Babel Street explained, have often been used to broadcast intentions before a violent incident: for instance, the Christchurch shooter, the Poway synagogue shooter, and the El Paso Walmart shooter all declared their intentions on the same website ahead of time. Babel Street’s analysis of these incidents showed widespread conversation on other platforms outside of the most popular social media sites.

Finally, Babel Street outlined the information surrounding the U.S. strike on Qasem Soleimani, the Iranian major general. The initial Twitter reporting by local Kurdish reporter Barzan Sadiq, Babel Street says, provided “useful situational awareness,” but it took “roughly five hours to report that Soleimani was killed,” a gap that could have been narrowed by drawing on foreign hyper-local news agencies and message boards. Babel Street particularly emphasized Telegram (“heavily used by Middle Eastern news agencies, as well as terror groups”), arguing that “understanding the drivers of violence or follow-on effects of an event requires a broader consumption of data.”

Babel Street offers four takeaways from these case studies:

  1. Focusing research using publicly available information on just a few large social media platforms “severely limits the information analysts can derive about ongoing events.”
  2. Even when those events break on large social media platforms, discussions quickly spread to outlets like blogs or message boards, and analysts should plan to account for that in their data collection processes.
  3. Social media posts “often lack wider context about developing situations,” and relying on social media alone is “always a suboptimal approach.”
  4. Social media companies are moving to restrict hate speech and violent content, leading many groups to move to other (often encrypted) platforms to share information.

“In order to continue monitoring for potential threats, or to enable a more thorough understanding of a developing situation,” the report reads, “it is critical to capture information from multiple publicly available data sources to derive the most accurate understanding.”

Datanami