Each of us perceives the news in a different way. We have our own biases, and we tend to search for information that confirms our existing beliefs. As a result, different people can hold drastically different viewpoints on the same event, and this effect extends to news anchors, who are subject to the same kind of bias.
Here at Almeta, we are in no position to decide whether a certain viewpoint is right or wrong. However, we believe that reading different viewpoints on the same event, listening to arguments that run contrary to your beliefs, and accepting differences can be a step toward uncovering the whole picture.
In this article we explore the following question: “For a given news article, can we automatically find other articles that describe the same event but take a different viewpoint?” This feature would hopefully allow our users to read different articles that describe the same event yet have different takes on it, and we leave it to the users to judge whether these viewpoints are valid.
What is VODUM
VODUM stands for Viewpoint and Opinion Discovery Unified Model. We will gloss over the details here; if you want more, please see our previous article.
This model falls into the category of multi-dimensional topic models, where the model tries to capture multiple aspects of the text, including topic, opinion and viewpoint. In very simple terms, the model is a generative model that works in 3 steps:
- When a writer starts an article, she selects a viewpoint from the list of possible viewpoints according to a probability distribution pi, which can be described by a vector of probabilities. The selected viewpoint stays the same over the whole document.
- After selecting a viewpoint, the author selects, for every new sentence, a topic from a list of viewpoint-specific topics according to a second distribution over topics called theta.
- Finally, after selecting a viewpoint and a viewpoint-specific topic, the author selects the words of the sentence. At this step the author chooses between 2 types of words, topical words (related to topics) and opinion words, which are drawn from 2 different distributions, phi0 and phi1.
This means that the overall model can be described using 4 distributions (pi, theta, phi0 and phi1), shown in the following figure.
The training step estimates these distributions from the training text in an unsupervised manner, as in any other topic model, by selecting the set of distributions that minimizes the perplexity.
In the inference phase, the text is fitted to the different distributions and then the most probable viewpoint/topic distributions are returned.
The hyper-parameters of the training process are the number of topics, the number of viewpoints and the priors of the aforementioned distributions.
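To make the generative story above more concrete, here is a minimal Python sketch of it. It is an illustration under our own assumptions, not the official implementation: the distributions are plain dictionaries, the opinion-word distribution is called `phi1`, we assume topical words depend only on the topic while opinion words depend on both viewpoint and topic, and `p_opinion` (the chance that a word is an opinion word) is a hypothetical parameter we introduce just to keep the sketch self-contained.

```python
import random

def draw(dist, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]

def generate_document(pi, theta, phi0, phi1, n_sentences, words_per_sentence,
                      p_opinion=0.3, rng=None):
    """Generate one document following the three steps described above.

    pi:    {viewpoint: prob}
    theta: {viewpoint: {topic: prob}}
    phi0:  {topic: {word: prob}}               topical words (assumed indexing)
    phi1:  {(viewpoint, topic): {word: prob}}  opinion words (assumed indexing)
    """
    rng = rng or random.Random()
    viewpoint = draw(pi, rng)                  # step 1: one viewpoint per document
    sentences = []
    for _ in range(n_sentences):
        topic = draw(theta[viewpoint], rng)    # step 2: one topic per sentence
        words = []
        for _ in range(words_per_sentence):    # step 3: one word at a time
            if rng.random() < p_opinion:       # hypothetical word-type switch
                words.append(draw(phi1[(viewpoint, topic)], rng))
            else:
                words.append(draw(phi0[topic], rng))
        sentences.append(words)
    return viewpoint, sentences
```

Training goes in the opposite direction: given only the words, it tries to recover distributions that make documents like these as likely as possible.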
To use this model for contrary-view detection, a model is fitted to the articles of each event using a pre-defined number of viewpoints (for now 2), and the articles are then clustered using VODUM. Afterwards, for each article that VODUM assigns to a certain viewpoint, we can suggest articles from the other viewpoints.
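Once every article in an event has a viewpoint label, the suggestion step itself is simple. The sketch below assumes the labels are already available as a dictionary from article id to viewpoint index; the function name, the ids and the `limit` parameter are hypothetical and only serve the illustration.

```python
def suggest_contrary(assignments, article_id, limit=3):
    """Return up to `limit` articles whose viewpoint differs from the given one.

    assignments: {article_id: viewpoint_index}, e.g. produced by the clustering step.
    """
    own_viewpoint = assignments[article_id]
    others = [a for a, v in assignments.items() if v != own_viewpoint]
    return others[:limit]

# Example with made-up ids and labels
assignments = {"a1": 0, "a2": 1, "a3": 0, "a4": 1}
print(suggest_contrary(assignments, "a1"))
```

If all articles end up in the same viewpoint, the function returns an empty list, which is exactly the non-controversial case discussed later in this article.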
Experiment set up
All the experiments are carried out on a small in-house dataset initially built for the event-detection task; you can read more about how this dataset was created in our previous article. We took a sample of this set that contains only controversial events (events where at least 2 different viewpoints exist) plus a random sample of non-controversial events. All evaluation is done manually.
For VODUM we used the official implementation.
For each of the events we extract all the articles and then do the following:
- Text is fully normalized (Alefs, English, numbers and punctuation)
- Stop words are removed
- Words are lemmatized to improve topic modelling
- Word type selection: recall that the model needs to build different distributions for topical and opinion words. We tried 2 different methods to select the word type:
  - In the paper, the authors used POS tags to select the word type: all nouns are considered topical words, while all other POS types are considered opinion words. This is clearly an inaccurate approximation.
  - Another option is to use sentiment lexicons, where subjective words (words that carry sentiment) are considered opinion words while non-subjective words are considered topical words. We used the ArSenL lexicon for this step.
  Surprisingly, the first approach gave better results than the second; this can be attributed to the high number of out-of-vocabulary words (OOVs).
- VODUM hyper-parameters: the problem in automating VODUM is the selection of hyper-parameters for every new event. We can either use a pre-defined set of parameters for all events or try to select them automatically based on the event information:
  - For the prior hyper-parameters, we always use the values reported in the paper, for all events.
  - For the number of viewpoints, we use 2 for now across all events (this is obviously not accurate).
  - The number of topics is selected based on the number of articles in the event using simple linear interpolation. In the original paper the authors used 50 topics to represent a corpus of nearly 600 articles, so, for example, for an event of 30 articles we use 3 topics (30 × 50 / 600 = 2.5, rounded up).
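Two of the steps above are easy to sketch in Python: the POS-based word-type split and the rule for the number of topics. This is a rough illustration, not our production code. The tag names (`NOUN`, `PROPN`) assume a tagger that emits Universal POS tags, treating proper nouns as nouns is our assumption, and rounding up is also an assumption, chosen because it reproduces the "30 articles, 3 topics" example.

```python
def split_word_types(tagged_tokens, noun_tags=("NOUN", "PROPN")):
    """Split (word, pos_tag) pairs into topical words (nouns) and opinion words (the rest)."""
    topical = [word for word, tag in tagged_tokens if tag in noun_tags]
    opinion = [word for word, tag in tagged_tokens if tag not in noun_tags]
    return topical, opinion

def topics_for_event(n_articles, ref_topics=50, ref_articles=600):
    """Scale the topic count linearly with event size, rounding up, with a minimum of 1."""
    scaled = -(-n_articles * ref_topics // ref_articles)  # integer ceiling division
    return max(1, scaled)

# Example with hypothetical tagged tokens
tokens = [("army", "NOUN"), ("brutal", "ADJ"), ("attack", "NOUN"), ("condemned", "VERB")]
print(split_word_types(tokens))
print(topics_for_event(30))
```

Note that the topic rule only looks at the number of articles and never at their content, which is one of the weaknesses discussed below.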
The results we observed were not satisfactory, even on the manually annotated events. Here are the issues with this approach.
Low cluster purity
The viewpoints generated by this model are not very pure. For small controversial events of fewer than 30 articles we observed some good results. However, as the event size grows, the 2 viewpoints become increasingly noisy. There are 2 main possible causes of error:
The simplicity of the model
VODUM is a very simple unsupervised model, so we do not expect its performance to be extremely high. In the original paper, the authors run the system several times for every topic. They report a maximum accuracy of 75% across all topics and all runs, and an average of just above 60%.
Problems in selecting Hyper-parameters
As noted before, the hyper-parameters are selected automatically. The number of viewpoints and the distribution priors are kept the same across all events, and the number of topics is selected based only on the number of articles in the event.
This simple way of selecting hyper-parameters is not optimal, since it ignores the actual content of the articles. We verified this experimentally by manually optimizing the hyper-parameters of one event, “نبع السلام”. While this step did improve the results, it cannot be done in production since it requires manual intervention.
One way to select the best parameters of the model is to rely on the perplexity of the model after training, since this is the only “confidence-like” value that the algorithm can produce. In theory, we can run a hyper-parameter optimization process that uses this metric to guide the selection automatically. However, we have chosen not to do this, for 2 main reasons:
- Firstly, even if this approach does work, the computational overhead of training the topic model multiple times and carrying out a grid search to find the parameters with the lowest perplexity is too high, especially if we consider the issue of event retraining (see below).
- Secondly, while perplexity is an indicator of how pure the viewpoints are, it can be a noisy one. In the previous example of “نبع السلام”, we tried to optimize the parameters using only the perplexity as an indicator. However, some parameter values with lower perplexity did not correspond to improvements in clustering performance (as measured by manual inspection).
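For reference, this is roughly what the perplexity-guided search we decided against would look like. The sketch uses the standard definition of perplexity (the exponential of the negative average log-likelihood per token). `train_and_score` is a hypothetical callable that trains a model for a given number of topics and viewpoints and returns its perplexity; it stands in for a real training run and is not part of any library.

```python
import math
from itertools import product

def perplexity(token_log_probs):
    """Perplexity = exp(-mean log-likelihood per token); lower means a better fit."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def pick_lowest_perplexity(train_and_score, topic_counts, viewpoint_counts):
    """Grid search: try every (topics, viewpoints) pair and keep the lowest-perplexity one.

    train_and_score(n_topics, n_viewpoints) -> perplexity (hypothetical callable).
    """
    grid = product(topic_counts, viewpoint_counts)
    return min(grid, key=lambda params: train_and_score(*params))

# A model that assigns probability 0.25 to every token has perplexity close to 4
print(perplexity([math.log(0.25)] * 4))
```

Every point on the grid costs one full training run, which is the overhead mentioned in the first reason, and the winner is only as trustworthy as perplexity itself, which is the concern raised in the second.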
Inability to classify events as controversial
An important issue is deciding whether a certain event is controversial (has multiple viewpoints) or not.
The algorithm we suggested at the start of this article works only on controversial events. When there is no real difference between the coverage of multiple anchors (natural disasters, for example), i.e. when all the articles share the same viewpoint, it is impossible to find a “contrary view” to any of the articles.
Therefore, a major first step is to decide if a certain event is controversial or not, and then if it is controversial try to find the different viewpoints in it.
As we discussed above, the only indicator of confidence that we can extract from this model is perplexity. One way to detect non-controversial events is to measure the model perplexity on the articles of the event after training the model. If the perplexity exceeds a certain limit, we can say that a model with multiple viewpoints could not fit the data and thus that there are no multiple viewpoints, i.e. events with values above this limit can be considered non-controversial.
However, in practice, this approach fails. While events with a better separation between viewpoints do have lower perplexity, other factors also influence the perplexity value, including the number of topics in the event, the event size and even different runs of the model.
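Written out, the thresholding rule is a one-liner. The sketch below assumes we already have one perplexity value per event; the function name and the `limit` value are hypothetical, and we do not have a limit that works reliably.

```python
def flag_non_controversial(event_perplexity, limit):
    """Return the ids of events whose perplexity exceeds `limit`.

    event_perplexity: {event_id: perplexity of the fitted multi-viewpoint model}
    """
    return [event for event, value in event_perplexity.items() if value > limit]
```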
Growing the event
Assume that we start with a small event of say 20 articles and we train a model that can correctly fit this event. The main issue is expanding this model when the event size increases (new articles are published that describe this event).
While retraining the model every, say, 10 new articles is not computationally expensive, re-selecting the hyper-parameters properly (in a manual manner) is problematic.
This requires answering the following questions: when does an article introduce a new topic? Or a new viewpoint? And can we detect this automatically?