In this article, we review the research done in the field of detecting political bias in the news.
Understanding Characteristics of Biased Sentences in News Articles
- Bias Labeling via Crowd-Sourcing: They collected bias labels through crowdsourcing on the “Figure Eight” platform, where workers made judgements on each target news article (also using the reference news article).
- Analysis of Perceived News Bias: To understand what kinds of words workers tag as bias triggers, they analyze the phrases annotated as biased in terms of their word length and their most frequent patterns. A sentence is considered biased if it contains any word annotated as biased.
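The sentence-level rule above (a sentence counts as biased if it contains at least one annotated bias-trigger word) can be sketched as follows; the trigger lexicon and sentences here are hypothetical toy examples, not the paper's data:

```python
# Sketch of the sentence-level bias rule: a sentence is biased if it
# contains at least one word that workers annotated as a bias trigger.
# The lexicon and sentences below are illustrative placeholders.

def is_biased(sentence: str, bias_triggers: set[str]) -> bool:
    """Return True if any token of the sentence was annotated as biased."""
    tokens = sentence.lower().split()
    return any(token.strip('.,!?";:') in bias_triggers for token in tokens)

bias_triggers = {"radical", "disastrous"}  # toy annotated trigger words

sentences = [
    "The senate passed the bill on Tuesday.",
    "The radical proposal was a disastrous overreach.",
]

labels = [is_biased(s, bias_triggers) for s in sentences]
```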
Google News was used to collect news from different outlets. To construct the dataset, they crawled all news articles available online that described the aforementioned event, then manually verified that all articles covered the same news event. Finally, they extracted the titles and text content from the crawled pages, discarding pages that contained only pictures or a single sentence.
In the end, their dataset consists of 89 news articles with 1,235 sentences and 2,542 unique words from 83 news outlets. Articles contain on average 14 paragraphs.
A Stylometric Inquiry into Hyperpartisan and Fake News
They attempt to uncover relations between the writing styles that people may involuntarily adopt according to their political orientation. To do so, they take two sets of documents as input (e.g., left-wing articles and right-wing articles). A style model comprising the 250 most frequent words is used to separate the two chunked document sets with a linear classifier, and reconstruction errors are measured while the most discriminative features of the model are iteratively removed. The resulting reconstruction-error curves are then analyzed with regard to their slope: the steeper the decrease, the more similar the styles of the two input document sets.
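A minimal sketch of this unmasking-style procedure, assuming scikit-learn and using cross-validation accuracy as the separability measure; the toy corpora, vocabulary size, and elimination schedule below are illustrative assumptions, not the paper's settings:

```python
# Unmasking-style sketch: build a style model from the most frequent words,
# separate the two document sets with a linear classifier, then iteratively
# drop the most discriminative features and watch how fast separability
# (here, cross-validation accuracy) degrades.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def unmasking_curve(docs_a, docs_b, vocab_size=250, drop_per_round=3, rounds=5):
    docs = docs_a + docs_b
    y = np.array([0] * len(docs_a) + [1] * len(docs_b))
    # Style model: relative frequencies of the vocab_size most frequent words.
    vec = CountVectorizer(max_features=vocab_size)
    X = vec.fit_transform(docs).toarray().astype(float)
    X /= np.maximum(X.sum(axis=1, keepdims=True), 1)
    active = np.arange(X.shape[1])
    curve = []
    for _ in range(rounds):
        clf = LinearSVC(dual=False)
        acc = cross_val_score(clf, X[:, active], y, cv=2).mean()
        curve.append(acc)
        clf.fit(X[:, active], y)
        # Remove the currently most discriminative features (largest |weight|).
        top = np.argsort(-np.abs(clf.coef_[0]))[:drop_per_round]
        active = np.delete(active, top)
    # A steep drop suggests only a few markers separate the two styles,
    # i.e. the two document sets are stylistically similar.
    return curve
```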
The corpus comprises a complete sample of the output of nine publishers in a week close to the US elections: six prolific hyperpartisan publishers (three left-wing and three right-wing) and three mainstream publishers. All publishers earned Facebook’s blue checkmark.
For seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the 9 publishers was fact-checked claim-by-claim by 5 BuzzFeed journalists, including about 10% of posts forwarded from third parties.
Posts were rated “mostly true”, “mixture of true and false”, “mostly false”, or, if a post was opinion-driven or otherwise lacked a factual claim, “no factual content”.
The Clint Buchanan Hyperpartisan News Detector
They investigate the recently developed Bidirectional Encoder Representations from Transformers (BERT) model for the hyperpartisan news detection task, using BERT-BASE (12 layers, 110 million parameters) and BERT-LARGE (24 layers, 340 million parameters).
They focus primarily on the smaller data set of 645 hand-labelled articles provided to task participants, both for training and for validation.
They also use an additional 600,000 training articles, labelled by their publishers, as an unsupervised data set.
Propaganda Analysis Meets Hyperpartisan News Detection
They trained their models on the Hyperpartisan News Dataset from SemEval-2019, Task 4 (Kiesel et al.,2019), which is split by the task organizers into:
- Labeled by-Publisher: contains 750K articles labeled via distant supervision, i.e. using labels of their publisher.
- Labeled by-Article: contains 645 articles labeled through crowd-sourcing (37% are hyperpartisan and 63% are not).
They developed their system for detecting hyperpartisanship in news articles by training a logistic regression classifier using a set of engineered features that included the following:
- Word n-grams: they extracted the k most frequent word unigrams and bigrams and represented them using their TF-IDF scores.
- Bias Analysis: they analyze the bias in the language used in the documents by creating bias lexicons that contain left and right bias cues, and using these lexicons to compute two scores for each document, indicating the intensity of bias towards each ideology.
- Lexicon-based Features: they use 18 lexicons from Wiktionary and Linguistic Inquiry and Word Count (LIWC). For each lexicon, they count the total number of words in the article that appear in it, yielding 18 features, one per lexicon.
- Vocabulary Richness: For this task, they used the following vocabulary richness features:
- Type-token ratio (TTR): the ratio of types to tokens in a text
- Hapax legomena: the number of types appearing once in a text
- Hapax dislegomena: the number of types appearing twice in a text
- Honoré’s R: a combination of types, tokens, and hapax legomena
- Readability: they also used the following readability features, originally designed to estimate the level of text complexity: 1) Flesch–Kincaid grade level: the US grade level necessary to understand a text; 2) Flesch reading ease: a score measuring how difficult a text is to read; and 3) Gunning fog index: an estimate of the years of formal education necessary to understand a text.
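The word n-gram features above can be approximated with scikit-learn's `TfidfVectorizer`; the documents and the cap of 20 features are toy values standing in for the paper's k:

```python
# Illustrative n-gram feature extraction: the k most frequent word
# uni- and bigrams, represented by their TF-IDF scores.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the president signed the bill",
    "the radical left pushed the bill",
    "senators debated the new bill",
]

# ngram_range=(1, 2) covers unigrams and bigrams; max_features keeps the
# k most frequent ones (k=20 here is an arbitrary toy value).
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=20)
X = vectorizer.fit_transform(docs)  # one TF-IDF row per document
```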
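The per-ideology scores from the bias-lexicon feature could look like the following sketch; the cue lexicons here are tiny placeholders, not the actual lexicons the authors built:

```python
# Toy sketch of the lexicon-based bias scores: two scores per document,
# one per ideology, computed as the fraction of tokens that match each
# (hypothetical) cue lexicon.

LEFT_CUES = {"progressive", "equity"}       # placeholder left-bias cues
RIGHT_CUES = {"patriot", "deregulation"}    # placeholder right-bias cues

def bias_scores(text: str) -> tuple[float, float]:
    """Return (left_score, right_score) as lexicon-hit rates per token."""
    tokens = [t.strip('.,;:!?').lower() for t in text.split()]
    if not tokens:
        return 0.0, 0.0
    left = sum(t in LEFT_CUES for t in tokens) / len(tokens)
    right = sum(t in RIGHT_CUES for t in tokens) / len(tokens)
    return left, right

left, right = bias_scores("A patriot demands deregulation now")
```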
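The vocabulary-richness features can be computed directly; this is a minimal sketch with naive whitespace tokenization, and Honoré's R is guarded against the degenerate case where every type is a hapax:

```python
# Minimal implementations of the vocabulary-richness features listed above:
# type-token ratio, hapax legomena, hapax dislegomena, and Honoré's R.
import math
from collections import Counter

def richness_features(text: str) -> dict:
    tokens = text.lower().split()    # naive tokenization, for illustration
    counts = Counter(tokens)
    n_tokens = len(tokens)           # N: total tokens
    n_types = len(counts)            # V: distinct types
    v1 = sum(1 for c in counts.values() if c == 1)  # types seen once
    v2 = sum(1 for c in counts.values() if c == 2)  # types seen twice
    # Honoré's R = 100 * log(N) / (1 - V1/V); undefined when every type
    # is a hapax, so guard the denominator.
    denom = 1 - v1 / n_types
    honore_r = 100 * math.log(n_tokens) / denom if denom else float("inf")
    return {
        "ttr": n_types / n_tokens,
        "hapax_legomena": v1,
        "hapax_dislegomena": v2,
        "honore_r": honore_r,
    }
```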
Hyperpartisan News Detection with Attention-Based Bi-LSTMs
They employ a range of variations and combinations of recurrent neural networks (RNNs), convolutional neural networks (CNNs), and bidirectional LSTMs (Bi-LSTMs).
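A minimal attention-based Bi-LSTM classifier of this kind can be sketched in PyTorch; all sizes and the simple attention formulation below are illustrative assumptions, not the authors' actual configuration:

```python
# Sketch of an attention-based Bi-LSTM classifier on random token ids:
# embed tokens, run a bidirectional LSTM, score each time step with a
# learned attention layer, and classify the attention-weighted summary.
import torch
import torch.nn as nn

class AttentionBiLSTM(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))        # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)   # (B, T, 1)
        context = (weights * h).sum(dim=1)             # weighted sum over time
        return self.out(context)                       # (B, n_classes) logits

model = AttentionBiLSTM()
logits = model(torch.randint(0, 1000, (4, 20)))  # batch of 4 "articles"
```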
The data released by the organizers are labelled as hyperpartisan or not-hyperpartisan (the same dataset used in the previous paper).
Did you know that we use these and other AI technologies in our app? See what you are reading about applied in action: try our Almeta News app, available on Google Play and Apple’s App Store.