Robust news veracity detection
Authors
Horne, Benjamin D.
Issue Date
2020-05
Type
Electronic thesis
Thesis
Language
ENG
Keywords
Computer science
Abstract
The spread of false and misleading news online can have offline impacts. These impacts range widely, including health, public opinion, and safety. While low veracity information is not necessarily a new occurrence, the scale at which it is produced and disseminated is. This production and dissemination scale is partially due to the low barrier to entry into the information ecosystem. Today, anyone can spread information by creating a blog or website which appears to be a proper news source. These seemingly credible sources of information can then obtain wide attention due to social networks and the engagement-based algorithms that curate the media feeds in these networks. In turn, this lack of gate-keeping opens the media ecosystem up for malicious actors to spread targeted disinformation.
Due to the scale of low veracity news, the main question asked in this thesis is: Can we automatically detect low veracity news articles? Specifically, in this thesis, we develop and examine two broad approaches to automatically detecting low veracity news articles. First, we learn from news article text. In this approach, we focus on creating features of high veracity and low veracity news articles through text-based feature engineering. This feature engineering process starts out as an exploration on fact-checked news article data, with the goal of creating features that are interpretable by the eventual human end-user. After an understanding of the feature space is gained, we transfer the methodology to higher-level concepts of news veracity, such as reliability and bias, in order to use large-scale data in machine learning tasks. We then test the robustness of these machine learning models in a series of concept drift tests and adversarial attack tests.
Second, we learn from news source behavior and relationships. In this approach, we explore content sharing behavior by both mainstream and alternative news sources. Specifically, we construct content sharing networks from our unstructured news article data and perform an extensive set of qualitative analyses to better understand how content sharing is used maliciously by unreliable news sources. Using this gained understanding and rich network structure, we employ both standard network science metrics and network embedding methods to utilize content sharing networks in the task of automatically detecting low veracity news articles. We then demonstrate how the two approaches proposed in this thesis can be used together to automatically detect low veracity news.
Description
May 2020
School of Science
Publisher
Rensselaer Polytechnic Institute, Troy, NY