Supervised Article Informativeness Prediction - The What and the How

Let’s start with a question: given two articles, A and B, that talk about exactly the same thing, what makes one of them more informative than the other?

Is it the ease of reading? The amount of detail? Or is it the correct structuring of the text?

Well, maybe. In our previous article we explored various metrics like these and saw how they can be calculated quantitatively.

But let us explore a different assumption: that the informativeness of a piece of content (text) is an abstract concept that is hard to pin down exactly, yet one that we can judge using common sense, just like the coherence of a text or a musical composition.

These common-sense rules must have been built up in our minds through years of experience. If this assumption is valid, wouldn’t it be possible to transfer this experience to an information system? In this article we explore exactly this possibility.

This article is part of our research on informativeness, and we have A LOT of it. If you are interested in more details, you can check each individual piece.

The What

In this task, the goal is to assign a given piece of text a tag (or a number) representing the level of informativeness or detail it holds, usually by training a model to do so.
Variations of this task differ in the size of the text (article, paragraph, sentence) and in the system architecture (supervised vs. unsupervised). Another name for this task is Specificity Prediction.

The How

The methodologies are usually split by the size of the text that the model processes, and we can group them into paragraph-level and sentence-level approaches.

Paragraph Level

These methodologies will yield a model that can process large chunks of text at once, which is suitable for demanding applications.

Using Lead Paragraphs

In [1] the authors rely on the intuitive idea that the lead paragraph of a news article usually serves as a good summary, especially if the author uses the inverted pyramid style. An example of a good lead paragraph looks something like this:

“The European Union’s chief trade negotiator, Peter Mandelson, urged the United States on Monday to reduce subsidies to its farmers and to address unsolved issues on the trade in services to avert a breakdown in global trade talks. Ahead of a meeting with President Bush on Tuesday, Mr. Mandelson said the latest round of trade talks, begun in Doha, Qatar, in 2001, are at a crucial stage. He warned of a ”serious potential breakdown” if rapid progress is not made in the coming months.”

However, this generalization does not apply to all authors, as some use a more creative approach, e.g.:

“ART consists of limitation,” G. K. Chesterton said. “The most beautiful part of every picture is the frame.” Well put, although the buyer of the latest multi-million dollar Picasso may not agree. But there are pictures – whether sketches on paper or oils on canvas – that may look like nothing but scratch marks or listless piles of paint when you bring them home from the auction house or dealer. But with the addition of the perfect frame, these works of art may glow or gleam or rustle or whatever their makers intended them to do.

It is clear that the first lead paragraph is more informative than the second: i.e. first > second.

The authors relied on the idea that leads like the first example constitute a more plausible summary of the article than leads like the second. They automatically created a corpus for informativeness prediction from a corpus of human-summarized news articles, simply by comparing each human-written summary with the lead paragraph and tagging the lead as informative or creative based on their similarity.
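The labelling idea can be sketched roughly as follows. This is only an illustration: the word-overlap score, the `0.5` threshold, and the function names are assumptions made for this example, not the similarity measure or cut-off used in [1].

```python
import re

def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

def overlap_score(lead, summary):
    """Fraction of distinct summary words that also occur in the lead."""
    summary_words = set(tokens(summary))
    if not summary_words:
        return 0.0
    return len(summary_words & set(tokens(lead))) / len(summary_words)

def label_lead(lead, summary, threshold=0.5):
    """Tag a lead as 'informative' when it covers enough of the summary.

    The threshold is a hypothetical value chosen for this sketch.
    """
    if overlap_score(lead, summary) >= threshold:
        return "informative"
    return "creative"

summary = "EU trade negotiator urges United States to cut farm subsidies"
factual_lead = "The EU trade negotiator urged the United States to cut subsidies to farmers."
creative_lead = "Art consists of limitation, and the most beautiful part of a picture is the frame."

print(label_lead(factual_lead, summary))
print(label_lead(creative_lead, summary))
```

A lead that restates the summary shares most of its words and is tagged as informative, while a lead that opens with an anecdote or a quotation shares very few.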

After generating the dataset, they trained an SVM on several features, including psycholinguistic attributes such as the age of acquisition, imagery, concreteness, familiarity and ambiguity of each word.
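To give a feel for what a psycholinguistic feature might look like, here is a minimal sketch that averages word-level scores over a text. The lexicon `TOY_NORMS` and all of its values are made-up placeholders, and averaging is an assumed aggregation; [1] may compute these features differently.

```python
import re

# Made-up placeholder scores on an arbitrary scale; a real system would load
# them from a psycholinguistic database, which this sketch does not include.
TOY_NORMS = {
    "frame": {"concreteness": 6.0, "familiarity": 5.0, "imagery": 6.5},
    "trade": {"concreteness": 3.0, "familiarity": 6.0, "imagery": 3.5},
}
ATTRIBUTES = ("concreteness", "familiarity", "imagery")

def psycholinguistic_features(text, norms=TOY_NORMS):
    """Average each attribute over the words of `text` found in the lexicon."""
    words = re.findall(r"[a-z]+", text.lower())
    covered = [norms[w] for w in words if w in norms]
    if not covered:
        # No word is covered by the lexicon: fall back to zeros.
        return {attr: 0.0 for attr in ATTRIBUTES}
    return {
        attr: sum(entry[attr] for entry in covered) / len(covered)
        for attr in ATTRIBUTES
    }

print(psycholinguistic_features("The frame matters more than the trade."))
```

The resulting dictionary would be one slice of the feature vector handed to the classifier.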

They report a binary accuracy between 0.7 and 0.8 across the different genres of news articles and indicate that genre-specific models have a clear advantage over general ones.

Using Extractive Summaries

In [2], the authors carry out a task similar to [1]. They use extractive summaries and label every paragraph in the original texts as either positive (if it appears in the summary) or unlabeled (if it does not).

The second class is left unlabeled because annotator bias does not allow out-of-summary paragraphs to be faithfully labelled as unimportant.
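A minimal sketch of this labelling scheme is shown below. The helper names and the exact-match comparison (after whitespace and case normalization) are assumptions for illustration; [2] may align paragraphs with summaries differently.

```python
def normalize(text):
    """Collapse whitespace and lowercase so trivial differences do not matter."""
    return " ".join(text.lower().split())

def label_paragraphs(paragraphs, summary_paragraphs):
    """Return (paragraph, label) pairs.

    A paragraph that appears in the extractive summary is 'positive';
    every other paragraph stays 'unlabeled' rather than 'negative'.
    """
    in_summary = {normalize(p) for p in summary_paragraphs}
    return [
        (p, "positive" if normalize(p) in in_summary else "unlabeled")
        for p in paragraphs
    ]

article = [
    "The negotiator urged a cut in farm subsidies.",
    "He spoke ahead of a meeting on Tuesday.",
    "Talks began in Doha in 2001.",
]
extractive_summary = ["The negotiator urged a cut in  farm subsidies."]

for paragraph, label in label_paragraphs(article, extractive_summary):
    print(label, "|", paragraph)
```

Keeping the second class as "unlabeled" instead of "negative" is the point of the setup: a paragraph left out of the summary may still be important.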

Therefore the authors use a simple semi-supervised model (logistic regression) and very simple features similar to [1] (only lexicon features). Their reported results on the DUC2002 dataset and on a hand-labeled set from NYT are around 0.69 F1. However, their recall is nearly 0.9 on both sets while their accuracy is as low as 0.59, which points to the simplicity of the features used and the inability of the model to fit the training data.
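As a quick sanity check on these numbers, the precision implied by an F1 score and a recall value can be recovered from the definition F1 = 2PR / (P + R). The snippet below takes the rounded figures quoted above at face value, so the result is only approximate.

```python
def implied_precision(f1, recall):
    """Solve F1 = 2 * P * R / (P + R) for the precision P."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denominator

precision = implied_precision(f1=0.69, recall=0.9)
print(round(precision, 2))  # 0.56
```

A precision of roughly 0.56 alongside a recall near 0.9 suggests the model marks most paragraphs as important, which fits the picture of a classifier that struggles to separate the two classes.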

Available Resources

This is a comprehensive list of English summarization corpora for section 2.1. Arabic summarization datasets include EASC, DUC2004/2001/2002, and Kalimat, which would be the closest to the proposed method.

Please check this article for an in-depth evaluation of these resources and this one for the results of our implementation of the aforementioned task for the Arabic language.

Sentence Level (Sentence Specificity)

This related task is based on the idea that more specific (detailed) sentences are more informative. For example, the second sentence below is more specific than the first:

This brand is very popular and many people use its products regularly.

Mascara is the most commonly worn cosmetic, and women will spend an average of $4,000 on it in their lifetimes.

There are several applications of this, from summarization to IR, as well as things like argument detection … but most importantly essay evaluation. The main approaches are:

  • The task was originally introduced in [3], where the authors used the PDTB dataset to generate their corpus and trained a binary classifier on it using several lexical features, including:
    • Sentence length (longer sentences tend to be more specific)
    • Average polarity using a lexicon (English MPQA)
    • The average estimate of specificity based on hypernym relations between words using WordNet
    • Lexical counts of (numbers, money amounts, dates, …) as they can serve as an indicator of specificity as well
    • n-grams and POS tags

They report an accuracy of 76% on instantiation discourse relations (when a certain thing is an instance of a larger concept) and 60% on specification relations (when one concept is a special case of another concept).
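Some of the lexical features listed for [3] can be approximated with a few regular expressions. The sketch below is an assumption-laden illustration: the patterns, the feature names, and the use of four-digit years as a stand-in for dates are choices made here, not the extraction rules of the original paper, and the WordNet, polarity, n-gram and POS features are left out.

```python
import re

# Rough patterns chosen for this sketch; they will miss many real-world forms.
MONEY = re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d+)?")
YEAR = re.compile(r"\b(?:1[89]|20)\d{2}\b")
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def lexical_features(sentence):
    """Count simple surface cues that may hint at specificity."""
    return {
        "n_words": len(sentence.split()),
        "n_numbers": len(NUMBER.findall(sentence)),
        "n_money": len(MONEY.findall(sentence)),
        "n_years": len(YEAR.findall(sentence)),
    }

general = "This brand is very popular and many people use its products regularly."
specific = ("Mascara is the most commonly worn cosmetic, and women will spend "
            "an average of $4,000 on it in their lifetimes.")

print(lexical_features(general))
print(lexical_features(specific))
```

On the two example sentences from this section, only the specific one triggers the number and money counts.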

  • In [4] the authors used the same dataset as above but utilized simpler features, including:
    • Counts: number of words in the sentence; number of numbers, capital letters and non-alphanumeric symbols in the sentence; average word length; and number of stop words
    • IDF of the words
    • Lexicon features, the same ones used in [3]
    • Psycholinguistic features from [2]
    • Count vectors and Word2Vec

Furthermore, the authors co-train their model using a larger unannotated dataset from WSJ. These improvements push the model’s performance up to 81% accuracy and 79% F1.
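The count and IDF features listed for [4] could look something like the sketch below. The stop-word list, the tokenization, and the choice to summarize IDF by its maximum are all assumptions for illustration; the co-training step and the embedding features are not shown.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in [4]).
STOP_WORDS = {"a", "an", "and", "in", "is", "it", "its", "of", "on", "the", "this"}

def words_of(sentence):
    """Whitespace tokens with surrounding punctuation stripped."""
    stripped = (tok.strip(".,;:!?\"'()") for tok in sentence.split())
    return [w for w in stripped if w]

def build_idf(corpus):
    """Map each lowercased word to log(N / document frequency)."""
    doc_freq = Counter()
    for sentence in corpus:
        doc_freq.update({w.lower() for w in words_of(sentence)})
    return {w: math.log(len(corpus) / df) for w, df in doc_freq.items()}

def shallow_features(sentence, idf):
    """Surface counts plus the highest IDF among words known to `idf`."""
    words = words_of(sentence)
    lowered = [w.lower() for w in words]
    known_idf = [idf[w] for w in lowered if w in idf]
    return {
        "n_words": len(words),
        "n_numbers": sum(bool(re.fullmatch(r"[$€£]?\d[\d,.]*", w)) for w in words),
        "n_capitals": sum(ch.isupper() for ch in sentence),
        "n_symbols": sum(not ch.isalnum() and not ch.isspace() for ch in sentence),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "n_stop_words": sum(w in STOP_WORDS for w in lowered),
        "max_idf": max(known_idf, default=0.0),
    }

corpus = [
    "This brand is very popular and many people use its products regularly.",
    "Mascara is the most commonly worn cosmetic, and women will spend "
    "an average of $4,000 on it in their lifetimes.",
]
idf = build_idf(corpus)
for sentence in corpus:
    print(shallow_features(sentence, idf))
```

In practice the IDF table would be built from a large corpus rather than two sentences, so that rare words actually stand out.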

  • Finally, in [5] the authors propose a neural model based on an LSTM running on word embeddings and hand-crafted features from [3]. This model is trained on news data from [3] and [4]. However, the authors also provide a method for unsupervised domain adaptation. Using MTurk, they hand-labelled three sets from Yelp, Twitter, and movie reviews (which are very different from news articles) and tested their method on them. Both the target sets (Yelp, Twitter and movie reviews) and the code are available on GitHub.


  • Overall, this task is highly correlated with the goal of an informativeness score, as this is what it measures explicitly (especially in the cases of [1] and [2]).
  • The implementation of [1] / [2] for both Arabic and English is viable, as datasets are available to start with in both languages and the proposed methods are fairly simple.

Did you know that we use all this and other AI technologies in our app? See what you are reading about here applied in action: try our Almeta News app. You can download it from Google Play or Apple’s App Store.


[1] Y. Yang and A. Nenkova, “Detecting information-dense texts in multiple news domains,” in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

[2] Y. Yang, F. S. Bao, and A. Nenkova, “Detecting (un)important content for single-document news summarization,” arXiv preprint arXiv:1702.07998, 2017.

[3] A. Louis and A. Nenkova, “Automatic identification of general and specific sentences by leveraging discourse annotations,” in Proceedings of 5th International Joint Conference on Natural Language Processing, 2011, pp. 605–613.

[4] J. J. Li and A. Nenkova, “Fast and accurate prediction of sentence specificity,” in Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[5] W.-J. Ko, G. Durrett, and J. J. Li, “Domain agnostic real-valued specificity prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 6610–6617.

[6] J. J. Li, B. O’Daniel, Y. Wu, W. Zhao, and A. Nenkova, “Improving the annotation of sentence specificity,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016, pp. 3921–3927.

[7] L. Lugini and D. Litman, “Predicting specificity in classroom discussion,” arXiv preprint arXiv:1909.01462, 2019.

[8] J. Li, “From Discourse Structure To Text Specificity: Studies Of Coherence Preferences,” 2017.
