In one of our previous articles, we discussed the idea of multi-dimensional topic modelling, and no, it is not related to Star Wars. If you thought it was, go here and give it a good read.
Back from Alderaan? Cool, let us get started. In this article, we introduce and explain the concepts of viewpoint, topic, and opinion discovery from a text document in an unsupervised manner.
We look at how to model a viewpoint or a topic, and the different techniques for classifying documents based on their contrastive viewpoints. Then we discuss their implementations and how these methods are evaluated on different datasets.
Why does this matter? Well, in our work here at ALMETA to battle news bias and partisan coverage, it is important for us to find and understand how different outlets view the same event and report it in different ways. Interested? OK, let us head on.
In an opinionated text, an author expresses opinions on one or several topics according to her viewpoint. We define the key concepts of topic, viewpoint, and opinion as follows:
- A topic is one of the subjects discussed in a document collection.
- A viewpoint is the standpoint of one or several authors on a set of topics.
- An opinion is a choice of words that is specific to a topic and a viewpoint.
For example, in the sentence “Israel occupied the Palestinian territories of the Gaza Strip”, the topic is the presence of Israel in the Gaza Strip, the viewpoint is pro-Palestine, and the opinion is negative. Indeed, when mentioning the action of building Israeli communities on disputed lands, the pro-Palestine side is likely to use the verb to occupy, whereas the pro-Israel side is likely to use the verb to settle. Both sides discuss the same topic, but they use different wording that conveys an opinion.
Concepts like viewpoint or topic can be considered latent (hidden) variables because they are not explicitly shown in the text. We therefore leverage probabilistic topic models such as LDA or TAM to learn these concepts from the words themselves and from other signals such as POS tags or co-occurrence statistics.
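To make the idea of latent topics concrete, here is a minimal sketch of fitting plain LDA with scikit-learn. The tiny corpus and parameter values are illustrative only, and this is vanilla LDA, not the viewpoint-aware models discussed below.

```python
# A minimal sketch of learning latent topics with LDA (scikit-learn).
# The corpus and hyperparameters here are illustrative, not from the papers.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "israel settle communities disputed lands",
    "israel occupy palestinian territories gaza",
    "sound quality speakers loud good",
    "speakers sound usability good",
]

# Bag-of-words counts: LDA only sees word frequencies, not word order.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit a 2-topic model: each document becomes a mixture over the 2 topics,
# and each topic a distribution over the vocabulary.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # shape: (n_docs, 2); each row sums to 1
```

The latent "topics" fall out of word co-occurrence alone; the viewpoint-aware models below extend this same machinery with extra hidden variables.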
In , the authors introduce VODUM, a probabilistic topic model based on LDA. VODUM identifies topical words and viewpoint-specific, topic-dependent opinion words using part-of-speech tags: nouns are assumed to be topical words; adjectives, verbs, and adverbs are assumed to be opinion words; all words with other tags are removed.
So each word has two options: it is either a topical word or an opinion word.
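This POS heuristic can be sketched in a few lines. The tiny tag lookup below stands in for a real POS tagger (e.g. NLTK's); the tags are hypothetical pre-computed Penn Treebank tags for the example sentence.

```python
# A sketch of VODUM's part-of-speech heuristic: nouns are treated as topical
# words; adjectives, verbs, and adverbs as opinion words; everything else is
# dropped. The lookup table stands in for a real POS tagger.

POS = {  # hypothetical pre-computed tags (Penn Treebank style)
    "israel": "NNP", "occupied": "VBD", "the": "DT",
    "palestinian": "JJ", "territories": "NNS", "of": "IN", "gaza": "NNP",
}

TOPICAL = {"NN", "NNS", "NNP", "NNPS"}                # nouns
OPINION = {"JJ", "JJR", "JJS", "VB", "VBD", "VBG",    # adjectives, verbs,
           "VBN", "VBP", "VBZ", "RB", "RBR", "RBS"}   # adverbs

def partition(tokens):
    """Split tokens into topical and opinion words by their POS tag."""
    topical, opinion = [], []
    for tok in tokens:
        tag = POS.get(tok, "")
        if tag in TOPICAL:
            topical.append(tok)
        elif tag in OPINION:
            opinion.append(tok)
        # other tags (determiners, prepositions, ...) are discarded
    return topical, opinion

topical, opinion = partition(
    "israel occupied the palestinian territories of gaza".split())
# topical -> ['israel', 'territories', 'gaza']
# opinion -> ['occupied', 'palestinian']
```

Note how the adjective “palestinian” lands in the opinion bucket: the heuristic is deliberately coarse, and the model's inference is what separates genuinely opinionated words from the rest.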
VODUM uses a hierarchical dependency structure expressed as a graphical model, where every node represents a distribution (viewpoint distribution, topic distribution, ...) and edges represent the conditional dependencies between nodes. We can see the graphical model in the following figure.
How to Model
Papers on topic modelling usually differ in how they structure the dependencies and relations between the concepts (viewpoint, topic, opinion), and even in the definitions of the concepts themselves. Let us take  as an example:
Here the authors assume that a document is sampled from a mixture over topics as well as a mixture over viewpoints.
The two mixtures (topic and viewpoint) are drawn independently of each other, and thus can be thought of as two separate clustering dimensions. A word is associated with variables denoting its topic and viewpoint assignments, as well as two binary variables to denote if the word depends on the topic and if the word depends on the viewpoint. A word may depend on the topic, the viewpoint, both, or neither.
The paper  demonstrates this structure with a neat example: consider a set of product reviews for a home theater system.
- Content topics in this data might include things like sound quality, usability, etc., while the viewpoints might be the positive and negative sentiments.
- A word like ‘speakers’, for instance, depends on the sound topic but not a viewpoint,
- While ‘good’ would be an example of a word that depends on a viewpoint but not any particular topic.
- A word like ‘loud’ would depend on both (since it would be considered positive sentiment only in the context of the sound quality topic),
- While a word like ‘think’ depends on neither.
The paper , on the other hand, describes the generative process for a document as modelled by VODUM.
Here the author writes the document according to her own viewpoint. Depending on her viewpoint, she selects for each sentence of the document a topic that she will discuss. Then, for each sentence, she chooses a set of topical words to describe the topic that she selected for the sentence, and a set of opinion words to express her viewpoint on this topic.
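That generative story can be sketched as follows. This is a highly simplified illustration, not the paper's full model: the viewpoint labels, word tables, and probabilities are all made up for the example.

```python
# A simplified sketch of VODUM's generative story: one viewpoint per document;
# each sentence draws a topic conditioned on that viewpoint, then topical and
# opinion words. All tables below are illustrative, not learned parameters.
import random

random.seed(1)

VIEWPOINTS = ["pro-A", "pro-B"]  # hypothetical viewpoint labels
TOPIC_GIVEN_VIEW = {"pro-A": [0.7, 0.3], "pro-B": [0.3, 0.7]}
TOPICAL_WORDS = {0: ["territories", "gaza"], 1: ["communities", "lands"]}
OPINION_WORDS = {("pro-A", 0): ["occupy"], ("pro-A", 1): ["seize"],
                 ("pro-B", 0): ["settle"], ("pro-B", 1): ["build"]}

def generate_document(n_sentences=3):
    view = random.choice(VIEWPOINTS)  # one viewpoint for the whole document
    sentences = []
    for _ in range(n_sentences):
        # topic drawn from the viewpoint-specific topic distribution
        topic = random.choices([0, 1], weights=TOPIC_GIVEN_VIEW[view])[0]
        topical = random.choice(TOPICAL_WORDS[topic])    # describes the topic
        opinion = random.choice(OPINION_WORDS[(view, topic)])  # conveys the view
        sentences.append(f"{opinion} {topical}")
    return view, sentences

view, sentences = generate_document()
```

Notice how the opinion word table is indexed by both viewpoint and topic: the same topic yields “occupy” under one viewpoint and “settle” under the other, exactly the occupy/settle contrast from the earlier example.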
Which path should you use? Well, that is left to trial and error.
How to Implement
There are plenty of resources and datasets for implementing any of these algorithms. Please review our article here for a full overview of the available datasets and codebases, as well as an example implementation.
In this article, we tried to cover the basic idea of viewpoint discovery using multi-dimensional topic modelling. We saw how viewpoints, topics, and opinions can be represented, how the process of writing an article can be viewed in light of these concepts, and finally how we can reverse engineer this process to find the viewpoints, topics, and opinions expressed in a piece of text.
As usual, if you are interested in this task, don’t forget to check out the sources below.
Did you know that we use all this and other AI technologies in our app? See what you’re reading now applied in action. Try our Almeta News app. You can download it from Google Play or Apple’s App Store.
- M. J. Paul, C. Zhai, and R. Girju, “Summarizing contrastive viewpoints in opinionated text,” in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010.
- T. Thonet, G. Cabanac, M. Boughanem, and K. Pinel-Sauvagnat, “VODUM: A topic model unifying viewpoint, topic and opinion discovery,” in European Conference on Information Retrieval, 2016, pp. 533–545.