The Data Exchange Podcast: Xinyi Zhou on state-of-the-art detection strategies and future research directions.
In this episode of the Data Exchange I speak with Xinyi Zhou, a graduate student in Computer and Information Science at Syracuse University. Xinyi and her advisor (Reza Zafarani) recently wrote a comprehensive survey paper entitled “A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities”. They set out to organize the many different methods and perspectives used to detect fake news. Their paper is a great resource for anyone who wants to understand the strengths and limitations of various state-of-the-art techniques and get a feel for where the research community might be headed in the near future.
We first discussed good working definitions for “fake news”. The phrase has two components: a piece that indicates veracity or lack thereof (“fake”), and another piece that indicates the type of information it contains (“news”). Xinyi and Reza list two definitions:
Broad: Fake news is false news
Narrow: Fake news is intentionally false news published by a news outlet
Once we settled on working definitions, we focused our conversation on the different approaches used to detect fake news:
- Knowledge-based methods detect fake news by verifying if knowledge within the text is consistent with facts.
- Style-based methods use machine learning models that rely on signals pertaining to how fake news is written (e.g., if it is written with extreme emotions).
- Propagation-based methods detect fake news based on how it spreads online.
- Source-based methods detect fake news by studying the credibility of news sources at various stages (authors, publishers, social media users, etc.).
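To make the knowledge-based idea a bit more concrete, here is a minimal sketch of checking claims against a store of known facts. It assumes the claims have already been extracted from the text as (subject, predicate, object) triples, which is the hard part and is not shown. The names `KNOWLEDGE_BASE`, `check_triple`, and `assess_claims` and the toy facts are hypothetical illustrations, not something taken from the survey.

```python
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy knowledge base mapping (subject, predicate) -> object.
# Hypothetical placeholder data; a real system would query a knowledge graph.
KNOWLEDGE_BASE = {
    ("Syracuse University", "located_in"): "New York",
    ("Paris", "capital_of"): "France",
}


def check_triple(subject: str, predicate: str, obj: str) -> Optional[bool]:
    """Return True if the claim matches the knowledge base, False if it
    contradicts it, and None if the knowledge base has no entry."""
    known = KNOWLEDGE_BASE.get((subject, predicate))
    if known is None:
        return None
    return known == obj


def assess_claims(triples: Iterable[Triple]) -> dict:
    """Count supported, contradicted, and unverifiable claims."""
    results = [check_triple(*t) for t in triples]
    return {
        "supported": sum(r is True for r in results),
        "contradicted": sum(r is False for r in results),
        "unknown": sum(r is None for r in results),
    }


if __name__ == "__main__":
    claims = [
        ("Paris", "capital_of", "France"),
        ("Paris", "capital_of", "Germany"),
        ("Berlin", "capital_of", "Germany"),
    ]
    print(assess_claims(claims))
```

The `None` case matters in practice: a claim the knowledge base does not cover can be neither confirmed nor refuted, so coverage limits what this kind of method can decide.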
A few high-level observations are in order. First, each of these approaches relies on different data sources and modeling techniques, including NLP, data mining, and machine learning; newer tools like deep learning and large language models are beginning to be used as well. Second, real-world systems use multiple approaches and ensembles of models. Finally, some of these approaches, particularly knowledge-based methods, rely on up-to-date data sources. It can be challenging to create knowledge bases, knowledge graphs, and fact-checking sites that cover a range of topics. Keeping them up to date, possibly in near real time, requires substantial resources and can probably only be done by a few organizations and companies.
The other important consideration is which signals your detection system should use. Xinyi noted that systems that use multimodal information tend to do better than those that rely on only one type of data source:
- Multimodal detection tools use a combination of data sources: textual information, visual information, temporal information (date when content first appears), and network graph information. … We have found that using multimodal information works better for detecting fake news compared to tools that rely on unimodal information.
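One simple way to combine signals from several modalities is late fusion: score each modality separately, then merge the scores. The sketch below is only an illustration under stated assumptions. The function `fuse_scores`, the modality names, and the weights are hypothetical, and the per-modality scores are assumed to come from separate models that are not shown. It does not reproduce any particular system discussed in the episode.

```python
from typing import Dict, Optional


def fuse_scores(scores: Dict[str, Optional[float]],
                weights: Dict[str, float]) -> float:
    """Weighted average of per-modality scores in [0, 1].

    Modalities whose score is None (signal unavailable) or that have
    no weight are skipped, and the remaining weights are renormalized.
    """
    available = {
        name: score
        for name, score in scores.items()
        if score is not None and name in weights
    }
    if not available:
        raise ValueError("no modality scores available")
    total_weight = sum(weights[name] for name in available)
    if total_weight <= 0:
        raise ValueError("weights of available modalities must sum to > 0")
    weighted_sum = sum(weights[name] * score for name, score in available.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical scores from separate text, image, temporal, and network models.
    scores = {"text": 0.8, "visual": 0.4, "temporal": None, "network": 0.6}
    weights = {"text": 1.0, "visual": 1.0, "temporal": 1.0, "network": 1.0}
    print(fuse_scores(scores, weights))
```

Skipping missing modalities lets the same function handle items that have, say, no images attached. Whether a fused score actually beats a single-modality score depends on the data and on how the weights are chosen.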
Readers of our newsletter know that disinformation and fake media are topics that I follow closely. I monitor events in countries like the Philippines, where the combination of social media platforms (mainly Facebook) and highly organized troll operations has turned disinformation into a serious problem.
Xinyi does an excellent job guiding us through the landscape of fake news detection today, and we close with a glimpse into what lies ahead.
- A video version of this conversation is available on our YouTube channel.
- Download the 2020 NLP Survey Report and learn how companies are using and implementing natural language technologies.
- Weifeng Zhong: “Using machine learning to detect shifts in government policy”
- Mayank Kejriwal: “Building and deploying knowledge graphs”
- Amy Heineike: “Machines for unlocking the deluge of COVID-19 papers, articles, and conversations”
- Matthew Honnibal: “Building open source developer tools for language applications”
- Ameet Talwalkar: “Democratizing Machine Learning”
- Alan Nichol: “Best practices for building conversational AI applications”
- Denise Gosnell: “How graph technologies are being used to solve complex business problems”