Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack (ELK), it centrally stores your data so you can discover the expected and uncover the unexpected. In this post, we investigate some features and out-of-the-box use cases for Elasticsearch in the field of NLP. Search Enhancement Features: Elasticsearch provides a number of features for enhancing the end-user search experience. You Complete Me: Effective search is not just about returning relevant results when a user types in a search phrase, … Continue reading ElasticSearch Out of the Box Use Cases
Almeta News, a content aggregator app drawing on about 50 sources at the time of writing, always aims to provide its users with the best-quality pieces to read. Rather than relying on an army of content watchers and editors, Almeta aims to develop algorithms that review content automatically, looking for indicators of quality and assessing a piece of content’s placement. This post is part of our research into the best content ranking methodology. In this post, we try to determine the most effective indicators of content quality. Depending on human experts’ points of view, … Continue reading Automatically Extracting Valuable Content from News Streams.
In this post, we try to validate our initial solution for the event detection task. If you’re not familiar with the task, you can refer to our previous post, “How to” Event Detection in Media using NLP and AI. To apply event detection to news articles, we plan to do the following: (1) represent each article as a vector of expressive features; (2) feed the vectorized articles into a sequential clustering model to group the ones that talk about the same event. In this post, we address the problem of event detection in the Arabic language. Features Effectiveness: In this … Continue reading An Initial, Failed Solution For The Event Detection Task
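The two-step plan in the excerpt above (vectorize each article, then cluster the vectors sequentially) can be sketched roughly as follows. This is a minimal illustration and not the method used in the post: the cosine similarity measure, the running-mean centroid update, the `threshold` value, and the function names are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sequential_cluster(vectors: Sequence[Sequence[float]],
                       threshold: float = 0.8) -> List[int]:
    """Assign each vector, in arrival order, to the most similar existing
    cluster, or open a new cluster when no centroid is similar enough.

    `threshold` is a hypothetical value chosen for illustration only.
    """
    centroids: List[List[float]] = []  # running mean vector per cluster
    counts: List[int] = []             # number of members per cluster
    labels: List[int] = []             # cluster id assigned to each input

    for vec in vectors:
        best_idx, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim

        if best_idx >= 0 and best_sim >= threshold:
            # Join the closest cluster and update its centroid incrementally.
            n = counts[best_idx]
            centroids[best_idx] = [
                (c * n + v) / (n + 1) for c, v in zip(centroids[best_idx], vec)
            ]
            counts[best_idx] = n + 1
            labels.append(best_idx)
        else:
            # No cluster is close enough: treat this article as a new event.
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)

    return labels


# Toy feature vectors standing in for vectorized articles.
articles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(sequential_cluster(articles, threshold=0.8))
```

Because each article is processed once and in order, a sketch like this suits a stream of incoming news, but the result depends on arrival order and on the chosen threshold.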
Before we start, if you’re not familiar with the Event Detection task in NLP, you can refer to our previous post on this topic here. So you’ve built a system to detect events in the media… now what? While building a system is a key step, how the system performs on real-world data is equally important. We need to know whether it actually works and whether we can trust its decisions. So we need to evaluate our system before putting it to use. Evaluation is a highly important step in the development of any type of system, as it allows the … Continue reading Building a Test Collection for Event Detection Systems Evaluation
If you have spent time working with Machine Learning, one thing is clear: it’s an iterative process. Machine learning is about rapid experimentation and iteration. Each experiment consists of different parts: the data you use, the hyperparameters, the learning algorithm, the architecture, and the optimal combination of all of those. Throughout this iterative process, your accuracy on your dataset will vary accordingly, and without keeping track of your experiment history you won’t be able to learn much. Versioning lets you keep track of all of your experiments and their different components. How to Version Control ML Projects? One of the most popular ways … Continue reading How to Version Control Your Machine Learning? – A Look into Data Version Control (DVC)
In a previous post, we discussed a News Stream Sequential Clustering Algorithm. In this post, we’re discussing the details of implementing this algorithm with minimal tuning, and showing the results produced by this implementation. Along with this post, we’re evaluating … Continue reading An Implementation of a News Stream Sequence Clustering Algorithm
In a previous post, we talked in detail about Test Driven Development (TDD): its main methodology, benefits, pitfalls, and best practices. Given the major differences between ML-based code and traditional programming, in this post we’re discussing the applicability of … Continue reading Test Driven Machine Learning
Many sites on the internet allow their users to specify tags for their content. The most famous example of such sites is Tumblr, where each post on this social network can hold a manually selected set of tags. These tags … Continue reading Auto-Tagging Content with NLP
One of the services we provide at Almeta is estimating the political bias of a piece of news. In other pieces we have discussed the technical details of this feature, but in this piece we will go through the … Continue reading How to Visualize a Political Bias Data Metric
This is the first article in our series on political bias detection. We will hopefully introduce you to the various aspects of our political bias detection system, and you can learn about: How can we predict the political orientation behind … Continue reading What Constitutes a “Bad” News Article?
Let’s start with a simple question: what constitutes an informative article? According to the Oxford dictionary: informative /ɪnˈfɔːmətɪv/ (adjective): providing useful or interesting information. However, this is still an abstract concept. Yes, it is much simpler to flag an article as spammy … Continue reading How to Rank Articles Based on How Informative They Are – Using Snorkel
Events possess a rich structure that is important for intelligent information access systems (information retrieval, question answering, summarization, etc.). Without information about what happened, where, and to whom, temporal information about an event may not be very useful. In light … Continue reading An Overview of The Event Extraction Task in NLP
In this post, we’re analyzing the results returned by the readability metric in our news feed. If you haven’t checked our post about “How to measure the readability of a text?” before, you can read about it here. How Are We Measuring the Readability? The main part of analyzing a metric is knowing how it works. In the current version, we depend on the AARIBase metric for measuring readability. So, let’s first look at how AARIBase works. Here’s the AARIBase formula: AARIBase = (3.28 × NOC) + (1.43 × ACW) + (1.24 × AWS) … Continue reading Analysis of the Readability Metric Results in Almeta News Feed
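As a small worked illustration of the arithmetic, the sketch below sums only the three AARIBase terms that are visible in the excerpt above. The excerpt is cut off after the third term, so the result may be a partial sum and not the full metric. The excerpt also does not define NOC, ACW, and AWS, so they are treated here as precomputed numeric inputs; the function name and the sample values are hypothetical.

```python
def aari_base_visible_terms(noc: float, acw: float, aws: float) -> float:
    """Sum the three AARIBase terms shown in the excerpt.

    Assumption: the formula in the excerpt is truncated after the AWS term,
    so this may be only part of the full AARIBase value.
    """
    return (3.28 * noc) + (1.43 * acw) + (1.24 * aws)


# Hypothetical input values, for illustration only.
score = aari_base_visible_terms(noc=120, acw=4.5, aws=12)
print(f"Sum of the visible AARIBase terms: {score:.3f}")
```

Since the NOC term carries the largest coefficient, it dominates the sum whenever NOC is much larger than ACW and AWS.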
Readability is the ease with which a reader can understand a written text, which accordingly indicates how effectively the text will reach the target audience. The readability of text depends on its content (the complexity of its vocabulary and syntax), … Continue reading How to Measure Text Readability?
Clickbait is a type of hyperlink on a web page with a catchy or provocative headline that most users find difficult to resist. Such headlines tell you exactly what you’re about to see, with just enough of a tease at the end … Continue reading How to Detect Clickbait Headlines using NLP?
In this post, we are exploring how Google’s AutoML can help us at Almeta develop automatic Arabic language processing tools. Before we start, if you are not familiar with the term AutoML, you can refer to our previous post on this topic. Who is Google AutoML for? and When to Use It? The target audience of Google’s Cloud AutoML is people who have limited knowledge of machine learning. The main goal of this cloud service is to let users build their own AI models tailored to their business needs, if the services provided by Google’s AI API … Continue reading Google’s AutoML Overview
News stories are created every day at many news agencies. Users may receive news streams from multiple sources. Browsing in large-scale information spaces without guidance is not effective. Suppose, for example, a person who has returned from a long vacation … Continue reading Event Detection in Media using NLP and AI