AI Squares Off Against Fake News
In 2017, Collins Dictionary named “fake news” its Word of the Year. Thousands upon thousands of fake stories had plagued the internet throughout the 2016 U.S. presidential election – some shared millions of times. A fake Associated Press tweet had sent stocks plummeting; teenagers in Macedonia filled cafes, earning a living by writing fake articles. Almost overnight, fake news had become an immensely profitable industry.
So when Reza Zafarani (an assistant professor with the electrical engineering and computer science department at Syracuse University) and his colleagues Xinyi Zhou and Kai Shu took the stage at KDD 2019 to talk about how machine learning and artificial intelligence could be employed to combat fake news, the room was packed.
What is “Fake News,” and How Is It Detected?
Fake news, Zafarani clarified, is not simply false news – nor is it satire, disinformation, misinformation or rumor. Fake news must be false, shared with bad intentions and presented as news. The question, then, is: how can AI ascertain all three factors?
Fake news is most frequently detected by manual fact-checkers – sites like PolitiFact, Snopes and so on that employ experts to evaluate individual articles and judge their accuracy. Zafarani argues that these tools, while useful, scale poorly, leading to significant gaps in coverage and delays in assessment. The same bottleneck affects model training: manually annotating a dataset large enough to train an effective detector may be prohibitively labor-intensive.
“Obtaining a reliable fake news dataset is very time and labor intensive,” reads one paper by Kai Shu, a PhD candidate from Arizona State University who shared the stage with Zafarani. “Thus, it is also important to consider scenarios where limited or no fake news pieces are available in which semi-supervised or unsupervised models can be applied.”
And so, as companies like Facebook and Google try to tackle fake news before it gains much traction, a new approach is necessary. For AI, the researchers argue, there are effectively three primary tactics:
- Use AI to assess the basic authenticity of the claims.
- Use AI to assess whether the article is written differently than factual articles.
- Use AI to assess whether the article is propagating differently than factual articles.
Using AI to Detect Factual Accuracy
First, of course, fake news is by definition false – its content will compare unfavorably to a well-developed knowledge graph.
Using AI to assess authenticity is, in essence, basic fact-checking – similar to how, say, Google Assistant or Alexa might answer the question “what is the population of Norway?” Alexa can retrieve a structured answer to that question from reliable sources and deliver it. But assessing social media posts – as these researchers were doing – often means encountering unstructured data from sites of unknown reliability, which complicates the situation.
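The structured side of this idea can be sketched in a few lines: a claim is treated as a (subject, relation, value) triple and checked against a knowledge graph. The toy dictionary graph and the facts in it below are purely illustrative – real systems query large knowledge bases rather than a hand-written table.

```python
# A toy knowledge graph: (subject, relation) -> known value.
# Illustrative only; real fact-checkers query large structured KBs.
knowledge_graph = {
    ("Norway", "population"): "5.4 million",
    ("Norway", "capital"): "Oslo",
}

def check_claim(subject, relation, claimed_value):
    """Return True/False if the graph can verify the claim, None if unknown."""
    known = knowledge_graph.get((subject, relation))
    if known is None:
        return None  # the unstructured-data problem: no reliable entry to check against
    return known == claimed_value

print(check_claim("Norway", "capital", "Oslo"))       # True
print(check_claim("Norway", "capital", "Bergen"))     # False
print(check_claim("Sweden", "capital", "Stockholm"))  # None: not in the graph
```

The `None` branch is the crux of the researchers' point: for most social media claims, there is no structured entry to compare against in the first place.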
Using AI to Analyze the Language of Fake News
Second (and more surprisingly), researchers assert that fake news is written differently – that is, that factual statements and invented statements measurably differ in terms of content and quality (the “Undeutsch hypothesis”). “What you notice is that when you have a story where you have a lie, basically the breadth and depth is significantly less than the actual true story,” Zafarani said.
Xinyi Zhou, a PhD candidate at Syracuse University, showcased some startling data. Her team analyzed the text content of news datasets, showing that fake news articles used significantly fewer unique words and more emotional words. Other trends emerged as well – for instance, fake news articles tended to use more words in their headlines but fewer words in their body texts. Text-based methods such as “deception detection” have been studied before, but, the researchers write, “the underlying characteristics of fake news have not been fully understood.”
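Two of the signals Zhou described – lexical diversity (share of unique words) and the rate of emotional words – are straightforward to compute. The sketch below is a minimal illustration, not the team's actual feature set; the emotional-word lexicon is an invented stand-in for the sentiment lexicons real systems use.

```python
import re

# Toy emotional-word lexicon, invented for illustration.
EMOTIONAL_WORDS = {"shocking", "outrage", "unbelievable", "disaster"}

def style_features(text):
    """Compute two simple stylistic signals from an article's text."""
    words = re.findall(r"[a-z']+", text.lower())
    unique_ratio = len(set(words)) / len(words)          # lexical diversity
    emotional_rate = sum(w in EMOTIONAL_WORDS for w in words) / len(words)
    return {"unique_ratio": unique_ratio, "emotional_rate": emotional_rate}

f = style_features("Shocking outrage! Shocking, unbelievable disaster strikes again")
print(f)  # low diversity, high emotional rate relative to sober reporting
```

On Zhou's hypothesis, fake articles should tend toward lower `unique_ratio` and higher `emotional_rate` than factual ones – these two numbers become input features for a classifier rather than verdicts on their own.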
Using AI to Assess How Fake News Propagates
Finally, the researchers assert that people react to fake news differently – in how they engage with it and how they spread it. “For example,” Zafarani said, “there are some recent studies that show that fake news […] did propagate much deeper. And it propagates wider, as opposed to true news that usually propagates not as deep and not as wide.” Zhou again showcased the data, showing stark differences between true and fake news in the datasets they analyzed. This has a number of applications: for instance, the researchers noted that fake news dissemination shows unique temporal patterns, and that analyzing early shares of an article might be a good indicator of its trustworthiness.
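The “deeper and wider” comparison can be made concrete by modeling a share cascade as a tree and measuring its depth and maximum width. The sketch below uses a hypothetical, hand-written retweet tree; real analyses build these cascades from platform data.

```python
from collections import defaultdict

# Hypothetical share cascade, stored as child -> parent (user "a" posted first).
parents = {"b": "a", "c": "a", "d": "b", "e": "b", "f": "d"}

def depth_and_width(parents, root):
    """Breadth-first walk of the cascade tree: depth = longest share chain,
    width = largest number of shares at any single level."""
    children = defaultdict(list)
    for child, parent in parents.items():
        children[parent].append(child)
    depth, max_width = 0, 1
    level = [root]
    while level:
        max_width = max(max_width, len(level))
        nxt = [c for node in level for c in children[node]]
        if nxt:
            depth += 1
        level = nxt
    return depth, max_width

print(depth_and_width(parents, "a"))  # (3, 2)
```

On the studies Zafarani cites, fake stories tend to score higher on both numbers than true ones – making depth and width, like the style features above, candidate inputs for a detector.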
Similarly, author, site and user credibility analyses can be used to detect fake news through propagation. Major social media sites already employ spammer and bot detection methods that aim to capture coordinated campaigns, and expansions of this technique could also highlight users who are prone to sharing fake news, allowing for early detection of additional fake content.
The Path Forward
Most of the fake news researchers presenting at KDD 2019 advocated for an ensemble model-based approach: one that relies on a number of metrics and features, rather than a single feature. Zhou showcased an approach called “tri-relationship embedding” (TriFN) that combined news content analysis with social context analysis. The approach outperformed its individual components (headline analysis, lexicon analysis, user analysis and so forth), achieving an 86.4% fake news detection rate. The researchers also presented FakeNewsNet, a more robust data repository for testing fake news detection tools that contains social context and spatiotemporal information rather than just articles and sources.
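In its simplest form, an ensemble combines per-signal scores into one verdict. The weighted average below is only a schematic stand-in – TriFN itself jointly embeds news content and social context rather than averaging independent scores – and every number in it is hypothetical.

```python
def ensemble_score(scores, weights):
    """Weighted average of per-feature fakeness scores (0 = true, 1 = fake).
    Schematic only; not the actual TriFN model."""
    total = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total

# Hypothetical component scores and weights for one article.
scores = {"content": 0.9, "style": 0.7, "propagation": 0.8}
weights = {"content": 2.0, "style": 1.0, "propagation": 1.0}

print(ensemble_score(scores, weights))  # a single combined fakeness score
```

The appeal of the ensemble framing is robustness: an adversary can game a single signal (say, writing style), but gaming content, style and propagation patterns simultaneously is much harder.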
Of course, even more robust approaches like TriFN and powerful datasets such as FakeNewsNet face serious challenges. After all, even if a model could be trained to recognize the writing style hallmarks of fake news, writing styles can be manipulated to avoid detection. The authors suggested that visual features extracted from images and videos might be a necessary area for future research, especially with the rise of “deepfake” content that convincingly edits videos to feature different public figures.
Zafarani stressed these open issues, highlighting a number that need to be resolved before fake news detection is fully reliable and hands-free – but even so, with AI on the case, the future of fake news detection seems a little more trustworthy.
About the Research
To read more about the research presented at the tutorial, visit the website for the fake news tutorial at KDD 2019.