ElasticSearch Out of the Box Use Cases

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack (ELK), it centrally stores your data so you can discover the expected and uncover the unexpected. In this post, we’re investigating some features and out-of-the-box use cases for Elasticsearch in the field of NLP. Search Enhancement Features Elasticsearch provides us with an assortment of cool features to enhance our end-user search experience. You Complete Me Effective search is not just about returning relevant results when a user types in a search phrase, … Continue reading ElasticSearch Out of the Box Use Cases

SNS & Async Lambda VS. API Gateway as Lambda Triggers

AWS Lambda integrates with other AWS services to invoke functions. You can configure triggers to invoke a function in response to resource lifecycle events, respond to incoming HTTP requests, consume events from a queue, or run on a schedule. In this post, we’re discussing two event sources of Lambda, AWS SNS events and asynchronous calls from other Lambda functions, and comparing them to API Gateway (APIGW). SNS (Simple Notification Service) When a message is published to an SNS topic that has a Lambda function subscribed to it, the Lambda function is invoked with the payload of the published … Continue reading SNS & Async Lambda VS. API Gateway as Lambda Triggers
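To make the SNS trigger concrete, here is a minimal sketch of a Python Lambda handler that reads the messages delivered by an SNS invocation. It relies on the standard SNS event shape (`Records[i].Sns.Message`). The function name `handler`, the JSON-decoding fallback, and the returned summary are assumptions for illustration, not something taken from the post.

```python
import json


def handler(event, context):
    """Collect the SNS messages carried by a Lambda invocation event."""
    messages = []
    for record in event.get("Records", []):
        # SNS delivers the published payload as a string in Sns.Message.
        raw = record["Sns"]["Message"]
        try:
            # Assumption: publishers usually send JSON; decode it when possible.
            messages.append(json.loads(raw))
        except json.JSONDecodeError:
            # Otherwise keep the raw string untouched.
            messages.append(raw)
    # Hypothetical return value; SNS itself ignores the function's result.
    return {"processed": len(messages), "messages": messages}
```

Because SNS invokes the function asynchronously, the return value is not sent back to the publisher. It is returned here only so the sketch is easy to test locally.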

Visualization Platforms for Data Search — Amplitude VS Kibana

In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions. Visualization is an increasingly key tool to make sense of the trillions of rows of data generated every day. Our eyes are drawn to colors and patterns. We can quickly identify red from blue, square from circle. Our culture is visual, including everything from art and advertisements to TV and movies. Data visualization is another form of visual art that grabs our interest and keeps our eyes on the message. Hence, starting from the data value in … Continue reading Visualization Platforms for Data Search — Amplitude VS Kibana

RomCom — The Personalized Recommendation System For Almeta

The nice thing about working at Almeta is that we are our own users. As a platform for the best Arabic content on the web, we want to deliver the best content for each user. For us. For each Arabic-speaking reader. As usual, many companies think of “personalization” as the way to go to improve engagement, reach, or even acquisition. Just let the user tell you what he wants, fit a model against his needs, and let him see what he wants to see. Not what he should see. We tend to disagree. Given our own unique thoughts, interests and … Continue reading RomCom — The Personalized Recommendation System For Almeta

Communication/Messaging Tools and Patterns between Microservices

When designing a microservice, remember that other services will need to integrate with it. There is no single best style of communication to use in every case. In practice, we need to find the best solution for the problem at hand. In this post, we’re discussing different approaches and technologies used in designing the communication between microservices, shedding light on the most common communication services provided by AWS, and weighing the different patterns and services against several factors. Communication Patterns In this section, let’s introduce you to the two major communication patterns … Continue reading Communication/Messaging Tools and Patterns between Microservices

Automatically Extracting Valuable Content from News Streams.

Almeta News, a content aggregator app drawing on about 50 sources at the time of writing, always aims to provide its users with the best quality pieces to read. Rather than relying on an army of content watchers and editors, Almeta is looking to develop the best algorithms to review content automatically, looking for indicators of quality and assessing a piece of content’s placement. This post is a part of our research efforts seeking the best content ranking methodology. In this post, we’re trying to determine the most effective indicators of content quality. Depending on human experts’ points of view, … Continue reading Automatically Extracting Valuable Content from News Streams.

An Initial, Failed Solution For The Event Detection Task

In this post, we are trying to validate our initial solution for the event detection task. If you’re not familiar with the task, you can refer to our previous post about “How to” Event Detection in Media using NLP and AI. To apply event detection to news articles, we are planning to do the following: (1) represent each article as a vector of expressive features; (2) feed the vectorized articles into a sequential clustering model to aggregate the ones talking about the same event. In this post, we’re solving the problem of event detection in the Arabic language. Features Effectiveness In this … Continue reading An Initial, Failed Solution For The Event Detection Task

Building a Test Collection for Event Detection Systems Evaluation

Before we start, if you’re not familiar with the Event Detection task in NLP you can refer to our previous post on this topic here. So you’ve built a system to detect events in the media… now what? While building a system is a key step, how the system performs on real-world data is equally important. We need to know whether it actually works and whether we can trust its decisions. So we need to evaluate our system before putting it to use. Evaluation is a highly important step in the development of any type of system as it allows the … Continue reading Building a Test Collection for Event Detection Systems Evaluation

News Stream Clustering – Sequential Clustering in Action

In a previous post, we talked about “How to” Event Detection in Media using NLP and AI. In another post, we presented sequential clustering. Today we’re introducing an online (sequential) clustering algorithm specialized in aggregating news articles into fine-grained story clusters. Problem Formulation We focus on the clustering of a stream of documents, where the number of clusters is not fixed but is learned automatically. We denote by D the (potentially infinite) space of documents. We are interested in associating each document with a cluster via the function C(d) ∈ N, which returns the cluster label given a document. For each … Continue reading News Stream Clustering – Sequential Clustering in Action
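The problem formulation above can be illustrated with a generic threshold-based online clustering sketch. This is not the algorithm from the post, only a minimal stand-in for the idea of a function C(d) that labels each incoming document while the number of clusters grows as needed. The class name `SequentialClusterer`, the cosine-similarity measure, and the `threshold` value are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SequentialClusterer:
    """Assigns each incoming document vector to a cluster, one at a time."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.sums = []  # per-cluster sum of member vectors

    def assign(self, vec):
        """Play the role of C(d): return the cluster label for one document."""
        best_label, best_sim = None, -1.0
        for label, total in enumerate(self.sums):
            # Cosine ignores scale, so the vector sum stands in for the centroid.
            sim = cosine(vec, total)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_label is not None and best_sim >= self.threshold:
            # Close enough to an existing story: fold the document into it.
            self.sums[best_label] = [s + v for s, v in zip(self.sums[best_label], vec)]
            return best_label
        # Nothing similar enough: open a new cluster.
        self.sums.append(list(vec))
        return len(self.sums) - 1
```

For instance, feeding the vectors `[1, 0]`, `[0.9, 0.1]`, and `[0, 1]` in that order puts the first two in cluster 0 and opens cluster 1 for the third.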

How to Version Control Your Machine Learning? – A Look into Data Version Control (DVC)

If you have spent time working with Machine Learning, one thing is clear: it’s an iterative process. Machine learning is about rapid experimentation and iteration. Each experiment consists of different parts: the data you use, hyperparameters, learning algorithm, architecture, and the optimal combination of all of those. Throughout this iterative process, your accuracy on your dataset will vary accordingly, and without keeping track of your experiment history you won’t be able to learn much. Versioning lets you keep track of all of your experiments and their different components. How to Version Control ML Projects? One of the most popular ways … Continue reading How to Version Control Your Machine Learning? – A Look into Data Version Control (DVC)

An Implementation of a News Stream Sequence Clustering Algorithm

In a previous post, we discussed a News Stream Sequential Clustering Algorithm. In this post, we’re discussing the details of implementing this algorithm with minimal tuning, and showing the results produced by this implementation. Along with this post, we’re evaluating … Continue reading An Implementation of a News Stream Sequence Clustering Algorithm

AWS Batch Jobs — An Overview

AWS Batch enables you to run batch computing workloads on the AWS Cloud. This service can automatically provision compute resources and optimize the workload distribution based on the quantity and scale of the workloads. Related Definitions Jobs: A unit of work (such as a shell script, a Linux executable, or a Docker container image) that you submit to AWS Batch. It runs as a containerized application on an Amazon EC2 instance in your computing environment, using parameters that you specify in a job definition. Container images are stored in and pulled from container registries. Job Definitions: specifies how jobs are … Continue reading AWS Batch Jobs — An Overview

AWS Lambda and SQS Payload Limitation

In this post, we’re talking about deploying an AI service that processes the political news articles in Almeta’s database. The service is assumed to be deployed as an AWS Lambda function, with AWS SQS holding the incoming requests while the function is throttled. An important limit to consider in such a situation is the one that both AWS Lambda and AWS SQS put on the payload size. Throughout this post, we investigate what this limit is for each of the two services and how it affects us. AWS SQS There are two … Continue reading AWS Lambda and SQS Payload Limitation

Almeta App — Caricature Tab

Caricature Tab is a new upcoming feature planned to be a part of the Almeta News app soon. In its first version, the feature will provide the user with a stack of in-house designed caricature images to enjoy browsing. If you’re curious about how we at Almeta handle such new features, this post is for you. We’re showing the entire decision-making process behind a bunch of design and deployment related questions: Where to store the images? How to handle new ones? Do we need caching? What to store in … Continue reading Almeta App — Caricature Tab

Analysis of the Readability Metric Results in Almeta News Feed

In this post, we’re analyzing the results returned by the readability metric in our news feed. If you haven’t checked our post about “How to measure the readability of a text?” before, you can read about it here. How Are We Measuring the Readability? The main part of analyzing a metric is knowing how it works. In the current version, we’re depending on the AARIBase metric for measuring the readability. So, let’s first have a look at how AARIBase works. Here’s the AARIBase formula: AARIBase = (3.28 × NOC) + (1.43 × ACW) + (1.24 × AWS) … Continue reading Analysis of the Readability Metric Results in Almeta News Feed

Clickbait Detection Using Word2Vec Representation

In a previous article, “How to Detect Clickbait Headlines using NLP?”, we introduced the task of clickbait detection and explored how it can be modeled within the domain of machine learning and NLP. If you are not familiar with the concept of clickbait detection, make sure to review it before continuing. In this post, we’re building a classifier for clickbait detection in news headlines based on a pre-trained Arabic Word2Vec model, and we’re validating this solution. If you are not familiar with the Word2Vec concept you can refer to this Wikipedia article for more information. News Headlines Representation In … Continue reading Clickbait Detection Using Word2Vec Representation

Google’s AutoML Overview

In this post, we are exploring how Google’s AutoML can help us at Almeta in developing automatic Arabic language processing tools. Before we start, if you are not familiar with the term AutoML you can refer to our previous post on this topic. Who is Google AutoML for? and When to Use It? The target audience of Google’s Cloud AutoML is people who have limited knowledge of machine learning. The main goal of this cloud service is to let the user build his own AI model that is tailored to his business needs, if the services provided by Google’s AI API … Continue reading Google’s AutoML Overview