
Subjective Stance Detection: What Is It and How to Do It?

If you don’t know what stance detection is, make sure to check our article on it. Are we on the same page? Cool, let’s go.

First, a motivational example:

Many products on the internet allow users to leave feedback. This feedback is usually reviewed manually to figure out what users like or dislike about the product, which features they want, and which problems they are facing. However, wouldn’t it be amazing if we could find which specific aspect of the product a review likes or dislikes? For example, take a smartphone review like the following:

The RAM is really small, the price is low though

Would it be possible to figure out that the author dislikes the small RAM but is happy about the low price? Well, hold that thought, because this is exactly what we are going to discuss today.

This article is part of our series on political bias detection, where we hope to introduce you to the various aspects of our political bias detection system, and you can learn about:

The What

In this task, given a piece of text and a target, we extract the sentiment polarity towards that target. The target can be explicitly mentioned in the text or inferred from it. Usually the target is drawn from a closed list; however, some methods accept any string as a possible target.
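
To make the input and output of the task concrete, here is a minimal sketch in Python built around the smartphone review from the introduction. The class name `StanceExample`, its fields, and the three-label scheme are our own illustrative assumptions, not a format defined by any of the cited datasets.

```python
from dataclasses import dataclass

# Hypothetical label set; real datasets may use other names (e.g. favor/against).
LABELS = ("positive", "negative", "neutral")


@dataclass(frozen=True)
class StanceExample:
    text: str      # the piece of text to analyse
    target: str    # the target; it may or may not appear literally in the text
    polarity: str  # the sentiment polarity of the text towards the target

    def __post_init__(self):
        if self.polarity not in LABELS:
            raise ValueError(f"unknown polarity: {self.polarity!r}")

    @property
    def target_is_explicit(self) -> bool:
        # True when the target is mentioned verbatim (case-insensitive).
        return self.target.lower() in self.text.lower()


review = "The RAM is really small, the price is low though"
examples = [
    StanceExample(review, "RAM", "negative"),
    StanceExample(review, "price", "positive"),
]
```

The same text appears twice with different targets and different labels, which is exactly what separates this task from plain document-level sentiment analysis.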

The How

To tackle this task with machine learning we need two main things: a model to train and a data source to feed that model. Let us start with the data.

Where Can I Get a Training Dataset?

There are some datasets in English (see [3] for a complete list), and the sources of these datasets vary.

Many rely on online debate sites such as this, which are available only in English. Their topics include both serious and funny debates and cover a wide variety of targets.

Congressional and parliamentary debates on controversial resolutions have also been used. Such sources allow the creation of very large and relatively clean datasets in an automatic or semi-automatic manner.

The authors in [4] use the argumentative essays written by TESOL students to create their data sets.

When it comes to social media, manual annotation is a must. However, some tricks can simplify the process, such as utilizing hashtags and emojis; see [5] for the details of how the SemEval-2016 Twitter stance data was created.

Finally, the authors in [6] also manually annotate their multi-target news stance detection dataset. To simplify the annotation of full articles, the annotators use the following scheme to annotate for polarity:

  • If the target is explicitly mentioned in the article: a small context of 2-3 sentences before and after the target mention.
  • If the target is not found in the article: the title and the first paragraph. However, simple text summarization schemes such as TextRank [7] can also be used.
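
The annotation scheme above can be sketched roughly as follows. This is our own illustration and not the tooling used in [6]: the function names, the naive regex sentence splitter, the plain substring match, and the choice to use only the first mention of the target are all assumptions.

```python
import re


def split_sentences(text):
    # Naive splitter on ., ! or ? followed by whitespace (an assumption of this sketch).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def annotation_context(title, paragraphs, target, window=2):
    """Return the snippet an annotator would read for one target."""
    sentences = split_sentences(" ".join(paragraphs))
    needle = target.lower()
    for i, sentence in enumerate(sentences):
        if needle in sentence.lower():
            # Target found: keep `window` sentences before and after the first mention.
            start = max(0, i - window)
            return " ".join(sentences[start:i + window + 1])
    # Target never mentioned: fall back to the title and the first paragraph.
    first = paragraphs[0] if paragraphs else ""
    return f"{title}. {first}".strip()


article = [
    "A one. B two. C three.",
    "D mentions Tesla here. E five. F six. G seven.",
]
snippet = annotation_context("Some title", article, "Tesla", window=1)
fallback = annotation_context("Some title", article, "Nokia", window=1)
```

A real pipeline would use a proper sentence tokenizer and some form of entity matching instead of a substring test, but the two branches mirror the two cases of the scheme.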

In Arabic there is basically no data. The only dataset we found was from [8], yet it deals with objective stance detection (i.e. fact-checking). This is mainly because most of the English data sources mentioned above are not available in Arabic.

This means that in order to tackle this problem we need to build our own dataset. But can we make this task easier?

Enter Distant Supervision

Distant supervision is a method of supervised text classification in which the training data is generated automatically (mostly expanded using an already trained model) using certain indicators present in the text. There are multiple indicators that can be used:

  • Social media indicators: hashtags, emojis, post reactions, and so on. The authors in [5] use this scheme to expand their SemEval-2016 data, and this new data was used in 2 main ways:
    • as a new dataset added to the original set: in this case the model performance degraded, mainly because this data is noisy compared with the manually annotated data
    • to train feature extractors such as word embeddings: the authors concluded that this can boost the model performance
  • Social media graph analysis [9]–[11]: here the task is taken to a higher level by finding the stance of a user towards a certain target rather than the stance of a single post or tweet. To achieve this type of supervision, the authors in [11], for example, build a multi-level similarity graph between users based on multiple similarity metrics; then, using a seed of users whose stance towards a target is already known, they annotate the other users. However, there are multiple problems with the practical application of this approach:
    • these methods assume that the stance of a user towards a certain target is always constant, i.e. that people do not change their opinions of targets
    • some of the similarity measures, such as geographical distance, gender and age, can create prejudice in the system
  • Community extraction: this is mainly done through platforms like Reddit, where it is safe to assume that the users of a narrow community such as r/SandersForPresident (or, for less serious matters, r/grandpajoehate) hold a consistent opinion towards the target, and thus their comments can constitute a valid dataset. Alongside debate sites, such communities are extremely useful data sources for multi-target stance detection. However, this type of distant supervision is not viable in Arabic due to the lack of such sources; the only site similar to Reddit in Arabic is hsoub, which is a community centered around programming.
  • Voice prosody: finally, some people sought to predict stance from the prosodic features of talk show hosts in order to create large datasets, see [12]. However, the performance using speech features only is not satisfactory.
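
As an illustration of the first kind of indicator, here is a minimal sketch of hashtag-based distant labeling. The hashtag lists, the label names and the function `distant_label` are hypothetical and chosen only for the example; they are not the indicators used in [5]. Stripping the indicator hashtags from the text is a design choice of this sketch, made so that a model cannot simply memorise them.

```python
import re

# Hypothetical indicator hashtags for a single target; a real list is curated by hand.
FAVOR_TAGS = {"#feelthebern", "#bernie2016"}
AGAINST_TAGS = {"#neverbernie"}

HASHTAG = re.compile(r"#\w+")


def distant_label(post):
    """Return (label, cleaned_text), or None when there is no clear indicator."""
    tags = {t.lower() for t in HASHTAG.findall(post)}
    favor = tags & FAVOR_TAGS
    against = tags & AGAINST_TAGS
    if bool(favor) == bool(against):
        return None  # no indicator, or conflicting indicators: skip the post
    label = "favor" if favor else "against"
    indicators = favor | against

    def strip_indicator(match):
        # Drop indicator hashtags, keep every other hashtag untouched.
        return "" if match.group().lower() in indicators else match.group()

    cleaned = HASHTAG.sub(strip_indicator, post)
    return label, " ".join(cleaned.split())


labeled = distant_label("Healthcare for all #FeelTheBern #politics")
```

Labels produced this way are noisy, which is consistent with the observation from [5] above that such data works better for training feature extractors than as a direct addition to the manually annotated set.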

Now Let Us Jump to the Implementation

  • Overall, simple shallow models are preferable to deep ones, as they are much simpler to implement and the gain in performance from deep models is negligible. On the SemEval-2016 Task 6 dataset [13], the shallow baseline of an SVM with feature engineering surpassed all of the competitors’ models, and it took the research community 2 years to beat it, by a very small margin [14].
  • In most cases uni-target models (models trained only on a single target) [13], [15] perform much better than multi-target models (models trained using data for multiple targets, which can thus handle any target) like [16], [17]. The only exception is [14], yet this comes with a huge overhead in training data and model complexity.
  • In multi-target models [14], [16]–[18] the target information is usually added to neural models using a recurrent network that goes over the embeddings of the target phrase in order to create a target representation. This information can be incorporated through simple linear transforms [18] or by using an attention mechanism in higher layers [14]; see figure 2 from [14] and figure 1 from [18] for comparison.
  • Overall the SOTA in stance detection is pretty low. For sources such as tweets and online debates [13] the best performance is about 70% F-measure, and this is measured on a very restricted set of targets. For more regular sources like the student essays [4] the performance is higher and can reach up to 82% F-measure. However, the source is not the only factor: for example, on news articles the performance reported by [6] is less than 60%, and this is mainly because the authors used a multi-target model. This low performance is attributed to the following issues:
    • The implicit mention of targets, mainly in tweets and reviews, either using hashtags or, more often, using pronouns. The authors in [5] compared the performance of their baseline on the set of tweets that explicitly mention the target and on the set of tweets that don’t, and found a difference in performance of up to 30% F-measure absolute. See figure 3 for a detailed table.
    • The use of sarcasm and jokes to convey negative sentiment mainly in t
    • The need for common-sense models to group targets together, for example capturing the fact that a positive sentiment towards (Bernie Sanders, Hillary Clinton, Democrats) is equivalent to a negative sentiment towards (Donald Trump, Republicans). Some efforts to resolve this issue in the case of congressional debates were carried out, like in [19], but with limited success.
    • The lack of data, even in English, with the SemEval corpus being used in nearly all stance detection publications after 2016.
  • Finally, based on the assumption we made in the introduction, we should use the confidence of the classifiers as an indicator of bias. However, we can’t confirm the validity of this assumption for stance detection, mainly because the confidence of the trained models for this task (mainly the neural models) is usually in the vicinity of 50%; see figure 4 from [6] for example. This phenomenon is attributed to the fact that articles are usually not biased and hold a neutral stance most of the time [20] (at least in western media). The comparison with the more polarized Arabic media is not clear, and we don’t have data to confirm or deny the applicability of this scheme.
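
To give a feel for the attention-based way of incorporating the target mentioned in the list above, here is a heavily simplified sketch in plain Python. It is not the architecture of [14] or [18]: averaging the target word embeddings stands in for the recurrent target encoder, plain dot-product attention is our own simplification, and the tiny 2-dimensional vectors are made up for the example.

```python
import math


def mean_vector(vectors):
    # Stand-in for the recurrent target encoder: average the target word embeddings.
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def target_attention(token_vectors, target_vectors):
    """Weight each token by its similarity to the target, then pool the tokens."""
    target = mean_vector(target_vectors)
    weights = softmax([dot(tok, target) for tok in token_vectors])
    dim = len(token_vectors[0])
    pooled = [
        sum(w * tok[d] for w, tok in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return weights, pooled


# Made-up embeddings: three text tokens and a one-word target.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
target_words = [[2.0, 0.0]]
weights, pooled = target_attention(tokens, target_words)
```

The token that points in the same direction as the target receives the largest weight, so the pooled text representation that would be passed on to a classifier is dominated by the words most related to the target.
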
Figure 1: linear transformation based target incorporation, from [18]

Figure 2: attention based target incorporation, from [14]

Figure 3: loss in performance due to absence of target in text, from [5]

Figure 4: results from [6] that show the prevalence of neutral articles and low confidence
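
For readers who want to reproduce an F-measure like the ones quoted in this section, here is a minimal sketch. It assumes the convention of the SemEval-2016 task [13], where the score is the average of the F-scores of the favor and against classes only; the label names and the toy predictions are illustrative assumptions.

```python
def f1_for_label(gold, predicted, label):
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct prediction for this label
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted, labels=("favor", "against")):
    # Average the per-label F-scores; other labels only count as errors.
    return sum(f1_for_label(gold, predicted, lab) for lab in labels) / len(labels)


# Toy data, made up for the example.
gold = ["favor", "favor", "against", "against", "none"]
predicted = ["favor", "against", "against", "against", "favor"]
score = macro_f1(gold, predicted)
print(round(score, 2))  # 0.65
```

Note that the "none" class is not averaged in, but a wrong prediction on a "none" example still lowers the precision of the class that was predicted instead.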


In this article, we explored the idea of subjective stance detection and how we can find the sentiment of a text towards various targets. We have seen how we can find data for such a system in English and how distant supervision can help us do the same for low-resource languages like Arabic.

If you feel intrigued, make sure to check out our article on aspect and entity level sentiment analysis which is basically the next step.

Do you know that we use all this and other AI technologies in our app? Look at what you’re reading now applied in action. Try our Almeta News app. You can download it from Google Play or Apple’s App Store.


[1] M. Iyyer, P. Enns, J. Boyd-Graber, and P. Resnik, “Political ideology detection using recursive neural networks,” in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014, vol. 1, pp. 1113–1122.

[2] R. Abooraig, S. Al-Zu’bi, T. Kanan, B. Hawashin, M. Al Ayoub, and I. Hmeidi, “Automatic categorization of Arabic articles based on their political orientation,” Digit. Investig., vol. 25, pp. 24–41, 2018.

[3] R. Wang, D. Zhou, M. Jiang, J. Si, and Y. Yang, “A Survey on Opinion Mining: From Stance to Product Aspect,” IEEE Access, vol. 7, pp. 41101–41124, 2019.

[4] A. Faulkner, “Automated classification of stance in student essays: An approach using stance target information and the Wikipedia link-based measure,” in The Twenty-Seventh International Flairs Conference, 2014.

[5] S. M. Mohammad, P. Sobhani, and S. Kiritchenko, “Stance and sentiment in tweets,” ACM Trans. Internet Technol. TOIT, vol. 17, no. 3, p. 26, 2017.

[6] S. Ruder, J. Glover, A. Mehrabani, and P. Ghaffari, “360 stance detection,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, 2018, pp. 31–35.

[7] R. Mihalcea and P. Tarau, “Textrank: Bringing order into text,” in Proceedings of the 2004 conference on empirical methods in natural language processing, 2004.

[8] R. Baly, M. Mohtarami, J. Glass, L. Màrquez, A. Moschitti, and P. Nakov, “Integrating stance detection and fact checking in a unified corpus,” ArXiv Prepr. ArXiv180408012, 2018.

[9] R. Dong, Y. Sun, L. Wang, Y. Gu, and Y. Zhong, “Weakly-guided user stance prediction via joint modeling of content and social interaction,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1249–1258.

[10] J. Ebrahimi, D. Dou, and D. Lowd, “Weakly supervised tweet stance classification by relational bootstrapping,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1012–1017.

[11] O. Fraisier, G. Cabanac, Y. Pitarch, R. Besançon, and M. Boughanem, “Stance Classification through Proximity-based Community Detection,” in HT, 2018, pp. 220–228.

[12] N. G. Ward, J. C. Carlson, O. Fuentes, D. Castan, E. Shriberg, and A. Tsiartas, “Inferring Stance from Prosody,” 2017.

[13] S. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, and C. Cherry, “Semeval-2016 task 6: Detecting stance in tweets,” in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, pp. 31–41.

[14] C. Xu, C. Paris, S. Nepal, and R. Sparks, “Cross-Target Stance Classification with Self-Attention Networks,” ArXiv Prepr. ArXiv180506593, 2018.

[15] J. Mitrovic, B. Birkeneder, and M. Granitzer, “nlpUP at SemEval-2019 Task 6: A Deep Neural Language Model for Offensive Language Detection.”

[16] Y. Zhou, A. I. Cristea, and L. Shi, “Connecting targets to tweets: Semantic attention-based model for target-specific stance detection,” in International Conference on Web Information Systems Engineering, 2017, pp. 18–32.

[17] I. Augenstein, T. Rocktäschel, A. Vlachos, and K. Bontcheva, “Stance detection with bidirectional conditional encoding,” ArXiv Prepr. ArXiv160605464, 2016.

[18] J. Du, R. Xu, Y. He, and L. Gui, “Stance classification with target-specific neural attention networks,” 2017.

[19] M. Lai, D. I. H. Farías, V. Patti, and P. Rosso, “Friends and enemies of Clinton and Trump: using context for detecting stance in political tweets,” in Mexican International Conference on Artificial Intelligence, 2016, pp. 155–168.

[20] M. Drissi, P. Sandoval, V. Ojha, and J. Medero, “Harvey Mudd College at SemEval-2019 Task 4: The Clint Buchanan Hyperpartisan News Detector,” ArXiv Prepr. ArXiv190501962, 2019.
