Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack (ELK), it centrally stores your data so you can discover the expected and uncover the unexpected. In this post, we investigate some features and out-of-the-box use cases for Elasticsearch in the field of NLP. Search Enhancement Features Elasticsearch provides a range of features to enhance the end-user search experience. You Complete Me Effective search is not just about returning relevant results when a user types in a search phrase, … Continue reading ElasticSearch Out of the Box Use Cases
Almeta News, a content aggregator app drawing on about 50 sources at the time of writing, always aims to provide its users with the best quality pieces to read. Rather than relying on an army of content watchers and editors, Almeta aims to develop the best algorithms to review content automatically, looking for indicators of quality and assessing each piece’s placement. This post is part of our research efforts seeking the best content ranking methodology. In this post, we try to determine the most effective indicators of content quality. Depending on human experts’ points of view, … Continue reading Automatically Extracting Valuable Content from News Streams.
In this post, we try to validate our initial solution for the event detection task. If you’re not familiar with the task, you can refer to our previous post about “How to” Event Detection in Media using NLP and AI. To apply event detection to news articles, we plan to do the following: (1) represent each article as a vector of expressive features; (2) feed the vectorized articles into a sequential clustering model to aggregate the ones talking about the same event. In this post, we address the problem of event detection in the Arabic language. Features Effectiveness In this … Continue reading An Initial, Failed Solution For The Event Detection Task
The ability to filter your news feed by genre is a critical component of any news aggregator: users often want to read only sports or political news, not just the most recent or hottest news. In this post, we will explore in detail our initial genre classification system. Let’s start with the data. In the following experiments, we used an in-house data set. The data set is composed of 190,307 HTML documents crawled from the following domains: [Aljazira, Alarabia, Aljadeed, RT Arabic, BBC Arabic]. For each of the documents we tried to extract the following features: … Continue reading Initial Genre Classification Experiments
Before we start, if you’re not familiar with the Event Detection task in NLP you can refer to our previous post on this topic here. So you’ve built a system to detect events in the media… now what? While building a system is a key step, how the system performs on real-world data is equally important. We need to know whether it actually works and whether we can trust its decisions. So we need to evaluate our system before putting it to use. Evaluation is a highly important step in the development of any type of system, as it allows the … Continue reading Building a Test Collection for Event Detection Systems Evaluation
In a previous post, we talked about “How to” Event Detection in Media using NLP and AI. In another post, we presented sequential clustering. Today we’re introducing an online (sequential) clustering algorithm specialized in aggregating news articles into fine-grained story clusters. Problem Formulation We focus on the clustering of a stream of documents, where the number of clusters is not fixed but learned automatically. We denote by D the (potentially infinite) space of documents. We are interested in associating each document with a cluster via the function C(d) ∈ N, which returns the cluster label given a document. For each … Continue reading News Stream Clustering – Sequential Clustering in Action
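The problem formulation in the excerpt above (a stream of documents, a number of clusters that is not fixed in advance, and a labeling function C(d)) can be illustrated with a minimal sketch. It assumes documents have already been turned into numeric vectors and uses a cosine-similarity threshold against running centroids. The class name `SequentialClusterer`, the `threshold` parameter, and the centroid update rule are illustrative assumptions, not the algorithm described in the full post.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SequentialClusterer:
    """Hypothetical one-pass clusterer: plays the role of C(d) for a document stream."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off for joining a cluster
        self.centroids = []         # one running mean vector per cluster
        self.sizes = []             # number of documents seen per cluster

    def assign(self, vec):
        """Return the cluster label for one document vector, creating a cluster if needed."""
        best_label, best_sim = None, -1.0
        for label, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_label, best_sim = label, sim

        if best_label is not None and best_sim >= self.threshold:
            # Join the closest cluster and update its running mean.
            n = self.sizes[best_label]
            self.centroids[best_label] = [
                (c * n + v) / (n + 1)
                for c, v in zip(self.centroids[best_label], vec)
            ]
            self.sizes[best_label] = n + 1
            return best_label

        # No cluster is similar enough: open a new one, so the cluster count grows with the data.
        self.centroids.append(list(vec))
        self.sizes.append(1)
        return len(self.centroids) - 1
```

Calling `assign` once per incoming article vector, in arrival order, yields one integer label per document, and new clusters appear only when no existing centroid is similar enough.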
The increasing amount of text data in the digital age calls for methods to reduce reading time while maintaining information content. The process of summarization achieves this by deleting, generalizing or paraphrasing fragments of the input text to create a … Continue reading Abstractive Summarization in Underresourced Languages
Difference is a fine and beautiful phenomenon. Difference should always be accepted, expected and respected. Difference adds richness to the topics we discuss and opens up new perspectives that we had never thought of. Starting from our belief that a … Continue reading Towards Contrary-View Detection in News
In a previous post, we discussed a News Stream Sequential Clustering Algorithm. In this post, we’re discussing the details of implementing this algorithm with minimal tuning, and showing the results produced by this implementation. Along with this post, we’re evaluating … Continue reading An Implementation of a News Stream Sequence Clustering Algorithm
While reading the news each one of us perceives it in a different manner. We have our own biases and we tend to search for information that confirms our previous beliefs. Thus different people might have drastically different viewpoints of … Continue reading Contrary view detection based on VODUM
There have been some recent advancements in Dialectical Arabic processing across various NLP tasks. The goal of this article is not to explore any particular task but to explore as many tasks as possible and give an … Continue reading Major Tasks in Dialectical Arabic Processing
Most of us are not good writers, and if you are like me you may have sometimes struggled to communicate your ideas in writing. The new field of automated paraphrasing can provide a solution to this issue. In … Continue reading Automatic Sentence Paraphrasing
While reading the news, you are likely to encounter several articles that describe the same event or incident, each coming from a different news outlet and offering a different viewpoint on the event. However, most … Continue reading Multi-document summarization. The What, Why and How
Social media today is a massive source of information that lets marketers and decision-makers both better understand user trends and influence users’ decisions. With the field of AI and NLP conquering various aspects of our day-to-day life, … Continue reading Smart Services For Social Media Marketing
After you read a news article on your favourite news aggregator or news site of choice, most current news aggregators let you read other articles related to the one you have just read. These suggestions … Continue reading Contrary View Detection Based On Document Similarity
Many sites on the internet allow their users to specify tags for their content. The most famous example of such sites is Tumblr where each post on this social network can hold a manually selected set of tags. These tags … Continue reading Auto-Tagging Content with NLP
Understandably, you might be wondering: is this related to Rick and Morty? Well, unfortunately, no. But you should really continue reading, because multidimensional topic modeling is really cool. In this short piece we will explore the fundamental idea behind … Continue reading Multidimensional Topic Modelling. The What? and The How?
In one of our previous articles, we discussed the idea of multi-dimensional topic modelling, and no, it is not related to Star Wars, so if you thought it was, go here and give it a good read. Back from Alderaan. … Continue reading Viewpoint, Topic and Opinion Discovery in an Opinionated Document
The rise of political bias across several news outlets presents a real threat to free and independent journalism and is a major factor in shifting the public’s conception of the world. Several NGOs, research centres and private organizations are working … Continue reading From Sentiment to Political Bias in the Arab World and the Arabic Content
First, a motivational example: Many products on the internet allow the user to leave some feedback. This feedback is usually reviewed manually to figure out what users like or dislike about the product, what features they … Continue reading Aspect-level Vs Entity-level Sentiment Analysis
This article is part of our series on political bias detection, in which we hope to introduce you to the various aspects of our political bias detection system. You can learn about: How can we predict the political orientation behind … Continue reading Stance Detection – State of the Art
If you don’t know what stance detection is, make sure to check our article on it. Are we on the same page? Cool, let’s go. First, a motivational example: Many products on the internet allow the user to leave some … Continue reading Subjective Stance Detection What is it? and How to do it?
While some news outlets try to stay professional and objective in all of their articles, most of the news we consume is published to push a specific agenda, especially when it comes to politics. In our effort to battle news … Continue reading Political Orientation Detection – AI and NLP Approach
In this task, the goal is to assign a given piece of text a tag (or number) representing the level of informativeness or detail it holds, usually by training a model to do so. Here, we rely on the … Continue reading Automatically Tagging Data for Content Informativity Scoring
Let’s start with a simple question: what constitutes an informative article? According to the Oxford dictionary: informative /ɪnˈfɔːmətɪv/ (adjective): providing useful or interesting information. However, this is still an abstract concept. Yes, it is much simpler to flag an article as spammy … Continue reading How to Rank Articles Based on How Informative They Are – Using Snorkel
In a previous article (see next paragraph) we explored how to approximate an article’s informativeness in a supervised fashion. Such a method requires training data. In this article we will explore one way to get this data, one very … Continue reading Can you measure a text Informativeness using its summary?
Let’s start with a question: given two articles A and B that talk about exactly the same thing, what makes one of them more informative than the other? Is it the ease of reading? The amount of detail? Or is it … Continue reading Supervised Article Informativeness Prediction – The What and the How
In our effort at Almeta to provide Arabic readers with the articles of the highest informative value, we have employed several methods to measure the informativeness of a piece of news. In this article we will shed light on … Continue reading Term Informativeness Estimation in the Arabic Language
In a previous article, we talked about the various factors that make an article more informative; using cliches was not one of them. This article is part of our research on measuring text informativeness. If you are interested, jump … Continue reading How to Detect Cliches in Text
Let’s start with a simple question: what constitutes an informative article? According to the Oxford dictionary: informative /ɪnˈfɔːmətɪv/ (adjective): providing useful or interesting information. However, this is still an abstract concept. The question of measuring how informative a piece of news is … Continue reading Informativity Detection – Almeta’s Research Gist
The concept of an informative text is really abstract, and it is hard to come up with a definitive formula to measure it. In this article we will explore some of the features that we believe can make an article … Continue reading What Makes an Article Informative – And How Computers Can Measure Informativity of a Text Content
In our effort to provide the best news feed out there, one of the goals we are trying to achieve here at Almeta is to capture the interaction between different news outlets and how the coverage of the same event … Continue reading Aspect Detection and Named Entity Linking (NEL): Using SPARQL and DBpedia
Events possess a rich structure that is important for intelligent information access systems (information retrieval, question answering, summarization, etc.). Without information about what happened, where, and to whom, temporal information about an event may not be very useful. In light … Continue reading An Overview of The Event Extraction Task in NLP
Search engines use indexing to store information about web pages, enabling them to quickly return relevant, high-quality results. Indexing is the process by which search engines organize information before a search to enable super-fast responses to queries. Searching through individual pages … Continue reading Search Service Frameworks Evaluation
Text-to-speech (TTS) is a type of assistive technology that reads digital text aloud. It’s sometimes called “read aloud” technology. Text-to-speech applications offer an innovative way for users to interact with content by taking it out of books and computer screens and … Continue reading Comparison of Available TTS Services
In this post, we analyze the results returned by the readability metric in our news feed. If you haven’t checked our post about “How to measure the readability of a text?” before, you can read about it here. How Are We Measuring the Readability? A key part of analyzing a metric is knowing how it works. In the current version, we depend on the AARIBase metric to measure readability. So, let’s first look at how AARIBase works. Here’s the AARIBase formula: AARIBase = (3.28 × NOC) + (1.43 × ACW) + (1.24 × AWS) … Continue reading Analysis of the Readability Metric Results in Almeta News Feed
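The formula fragment in the excerpt above can be sketched in code. The excerpt is truncated after the third term and does not define its symbols, so this sketch computes only the three visible terms. It assumes NOC is the number of characters, ACW the average number of characters per word, and AWS the average number of words per sentence. The function names and the naive tokenization are hypothetical.

```python
import re


def text_stats(text):
    """Naive statistics: (NOC, ACW, AWS) under the assumed meanings of the symbols."""
    words = re.findall(r"\w+", text)
    # Split on Latin and Arabic sentence-ending punctuation; a rough heuristic only.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    noc = sum(len(w) for w in words)   # characters inside words only
    acw = noc / len(words)             # average characters per word
    aws = len(words) / len(sentences)  # average words per sentence
    return noc, acw, aws


def aari_base_visible_terms(noc, acw, aws):
    """Sum of the three terms shown in the excerpt; anything elided after them is not included."""
    return 3.28 * noc + 1.43 * acw + 1.24 * aws
```

A typical call would be `aari_base_visible_terms(*text_stats(article_text))`. Because NOC grows with text length, longer texts score higher on these terms regardless of their vocabulary.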
Readability is the ease with which a reader can understand a written text, which accordingly indicates how effectively the text will reach the target audience. The readability of text depends on its content (the complexity of its vocabulary and syntax), … Continue reading How to Measure Text Readability?
In a previous article, How to Detect Clickbait Headlines using NLP?, we introduced the task of clickbait detection and explored how it can be modeled within the domain of machine learning and NLP. If you are not familiar with the concept of clickbait detection, make sure to review it before continuing. In this post, we build a classifier for clickbait detection in news headlines based on a pre-trained Arabic Word2Vec model, and we validate this solution. If you are not familiar with the Word2Vec concept, you can refer to this Wikipedia article for more information. News Headlines Representation In … Continue reading Clickbait Detection Using Word2Vec Representation
Clickbait is a type of hyperlink on a web page with a catchy or provocative headline that most users find difficult to resist. Such headlines tell you exactly what you’re about to see, with just enough of a tease at the end … Continue reading How to Detect Clickbait Headlines using NLP?
In this post, we explore how Google’s AutoML can help us at Almeta develop automatic Arabic language processing tools. Before we start, if you are not familiar with the term AutoML, you can refer to our previous post on this topic. Who is Google AutoML for? and When to Use It? The target audience of Google’s Cloud AutoML is people with limited knowledge of machine learning. The main goal of this cloud service is to let users build their own AI models tailored to their business needs, if the services provided by Google’s AI API … Continue reading Google’s AutoML Overview
In this article, we present a summary of our research in the field of fact-checking. We group our findings into two categories: first, closed-source published applications, and second, research projects in this field. Closed Source Snobs: Their methodology depends on human annotators to fact-check a piece of news and present a detailed report on the inaccuracies in the article. Reporters’ Lab: Their methodology depends on human annotators as well, and the dataset can be found at https://www.politifact.com/texas/ and https://factnameh.com/ Fullfact: Their methodology builds a fully automated fact checker, but no details are provided … Continue reading How to Fact-Check using Natural Language Processing Techniques? A Literature Review
News stories are created every day at many news agencies. Users may receive news streams from multiple sources. Browsing in large-scale information spaces without guidance is not effective. Suppose, for example, a person who has returned from a long vacation … Continue reading Event Detection in Media using NLP and AI
“Machine learning and natural language are the foundation to any AI system, just in the ability to communicate with us in a human way and to automate that learning process, what you build on top of that, whether it’s predictive, … Continue reading Top 3 Exciting Ideas in NLP in 2018
We have mentioned in previous blog posts the significance of NLP and the wide range of applications where NLP is used. As the basic goal of NLP is to ease and simplify the communication between machines and humans, it is highly … Continue reading Biggest Challenges in Arabic Natural Language Processing
When was the last time you asked Siri or Alexa to do something and it did not understand what you were saying, or answered with something totally unrelated? Siri and Alexa are speech bots that rely basically … Continue reading 4 Biggest Open Problems in NLP