ElasticSearch Out of the Box Use Cases

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack (ELK), it centrally stores your data so you can discover the expected and uncover the unexpected. In this post, we investigate some features and out-of-the-box use cases for Elasticsearch in the field of NLP. Search Enhancement Features: Elasticsearch provides a range of useful features that enhance the end user's search experience. You Complete Me: Effective search is not just about returning relevant results when a user types in a search phrase, …
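"Search as you type" is commonly built on the Elasticsearch completion suggester. The sketch below only builds the request bodies as plain Python dicts, assuming Elasticsearch 7.x or later (typeless mappings); the field names `title` and `title_suggest` and the suggestion name `title-suggest` are hypothetical choices for illustration, not names from the original post.

```python
import json
from typing import Any, Dict, Iterable, List

# Index mapping (send with PUT /<index>): a dedicated field of type
# "completion" backs the completion suggester. Field names are hypothetical.
INDEX_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "title_suggest": {"type": "completion"},
        }
    }
}


def make_document(title: str, extra_inputs: Iterable[str] = ()) -> Dict[str, Any]:
    """Document body to index; every string in "input" becomes a completion candidate."""
    return {"title": title, "title_suggest": {"input": [title, *extra_inputs]}}


def make_suggest_query(prefix: str, size: int = 5) -> Dict[str, Any]:
    """Search body (POST /<index>/_search) asking for completions of `prefix`."""
    return {
        "suggest": {
            "title-suggest": {
                "prefix": prefix,
                "completion": {
                    "field": "title_suggest",
                    "size": size,
                    "skip_duplicates": True,
                },
            }
        }
    }


def extract_suggestions(response: Dict[str, Any]) -> List[str]:
    """Collect the suggested strings from a search response."""
    entries = response.get("suggest", {}).get("title-suggest", [])
    return [option["text"] for entry in entries for option in entry.get("options", [])]


if __name__ == "__main__":
    print(json.dumps(make_suggest_query("elas"), indent=2))
```

Keeping the suggestions in a separate `completion` field lets the suggester serve prefix lookups from its own in-memory structure, independently of the full-text `title` field used for regular search.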

SNS & Async Lambda VS. API Gateway as Lambda Triggers

AWS Lambda integrates with other AWS services to invoke functions. You can configure triggers to invoke a function in response to resource lifecycle events, respond to incoming HTTP requests, consume events from a queue, or run on a schedule. In this post, we discuss two event sources for Lambda, AWS SNS events and asynchronous calls from other Lambda functions, and compare them with API Gateway (APIGW). SNS (Simple Notification Service): When a message is published to an SNS topic that has a Lambda function subscribed to it, the Lambda function is invoked with the payload of the published …
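To make the contrast between the triggers concrete, here is a minimal sketch of the three shapes involved, under stated assumptions: SNS publishers send JSON-encoded messages, API Gateway uses Lambda proxy integration, and the async call is made through boto3's Lambda client `invoke`. Function names are hypothetical; the sketch only builds and parses payloads, so it runs without AWS access.

```python
import json
from typing import Any, Dict, List


def sns_handler(event: Dict[str, Any], context: Any = None) -> List[Any]:
    """Handler for a function subscribed to an SNS topic.

    SNS invokes the function asynchronously; each record carries the published
    message as a string under Records[i]["Sns"]["Message"]. This sketch assumes
    publishers send JSON-encoded messages.
    """
    messages = [json.loads(record["Sns"]["Message"]) for record in event.get("Records", [])]
    # No caller waits for this value in an asynchronous invocation;
    # it is returned only to keep the function easy to test.
    return messages


def api_gateway_handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Handler behind API Gateway with Lambda proxy integration.

    The HTTP body arrives as a string in event["body"], and the HTTP client
    waits for the response object returned here.
    """
    payload = json.loads(event.get("body") or "{}")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"received": payload}),
    }


def async_invoke_kwargs(function_name: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Keyword arguments for boto3's Lambda client `invoke`.

    InvocationType="Event" makes the call asynchronous: Lambda queues the
    event and returns immediately instead of waiting for the function's result.
    """
    return {
        "FunctionName": function_name,
        "InvocationType": "Event",
        "Payload": json.dumps(payload).encode("utf-8"),
    }
```

With credentials configured, a caller would run something like `boto3.client("lambda").invoke(**async_invoke_kwargs("article-processor", {"id": 7}))`, where `article-processor` is a hypothetical function name. The key design difference is who waits: the API Gateway handler must build an HTTP response, while the SNS and async paths are fire-and-forget.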

Visualization Platforms for Data Search — Amplitude VS Kibana

In the world of Big Data, data visualization tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions. Visualization is an increasingly important tool for making sense of the trillions of rows of data generated every day. Our eyes are drawn to colors and patterns. We can quickly tell red from blue, and square from circle. Our culture is visual, including everything from art and advertisements to TV and movies. Data visualization is another form of visual art that grabs our interest and keeps our eyes on the message. Hence, starting from the data value in …

RomCom — The Personalized Recommendation System For Almeta

The nice thing about working at Almeta is that we are our own users. As a platform for the best Arabic content on the web, we want to deliver the best content to each user. For us. For each Arabic-speaking reader. Many companies think of “personalization” as the way to improve engagement, reach, or even acquisition: let users tell you what they want, fit a model to their needs, and let them see what they want to see, not what they should see. We tend to disagree. Given our own unique thoughts, interests and …

Communication/Messaging Tools and Patterns between Microservices

When designing a microservice, remember that other services will need to integrate with it. There is no single communication style that is best in general; in practice, we need to find the best solution for the problem at hand. In this post, we discuss different approaches and technologies for designing communication between microservices, shedding light on the most common communication services provided by AWS and weighing the different patterns and services against several factors. Communication Patterns: In this section, we introduce the two major communication patterns …

Automatically Extracting Valuable Content from News Streams.

Almeta News, a content aggregator app drawing on about 50 sources at the time of writing, always aims to provide its users with the best-quality pieces to read. Rather than relying on an army of content watchers and editors, Almeta aims to develop the best algorithms to review content automatically, looking for indicators of quality and assessing where each piece of content should be placed. This post is part of our research into the best content-ranking methodology. In this post, we try to determine the most effective indicators of content quality. Depending on human experts’ points of view, …

Initial Genre Classification Experiments

The ability to filter your news feed by genre is a critical component of any news aggregator: users often want to read only sports or political news, not just the most recent or hottest stories. In this post, we explore our initial genre classification system in detail. Let’s start with the data. In the following experiments, we used an in-house data set composed of 190,307 HTML documents crawled from the following domains: Aljazira, Alarabia, Aljadeed, RT Arabic, and BBC Arabic. For each document, we tried to extract the following features: …

Initial Experiments on Measuring Informativity of Arabic Content – Data Collection

In one of our previous articles, we suggested a method for building an initial informativeness detection system. This system uses a small set of manually annotated pairwise comparisons, relies on Snorkel to expand these annotations automatically into a larger training set, and then trains a model on this set to estimate an article's informativeness. In this article, we go into the details of implementing this plan. Data Annotation: As noted above, Snorkel needs three types of training data: a small manually annotated test set to evaluate the results of the model; a smaller manually annotated …
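Snorkel expands a small labeled set by combining many noisy labeling functions. The library-free sketch below illustrates that idea for pairwise comparisons and does not use Snorkel's API: three hypothetical heuristics vote on which article in a pair is more informative, and an unweighted majority vote combines them, roughly what Snorkel's `MajorityLabelVoter` baseline does (Snorkel's `LabelModel` instead weights each function by its estimated accuracy). The heuristics and the label convention are assumptions for illustration only.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Label convention assumed for this sketch:
#   1 -> the first article of the pair is more informative
#   0 -> the second article is more informative
#  -1 -> abstain (no opinion)
FIRST, SECOND, ABSTAIN = 1, 0, -1

Pair = Tuple[str, str]
LabelingFunction = Callable[[str, str], int]


def _compare(x: int, y: int) -> int:
    if x > y:
        return FIRST
    if y > x:
        return SECOND
    return ABSTAIN


def lf_longer(a: str, b: str) -> int:
    """Hypothetical heuristic: a clearly longer article (1.5x the words) carries more information."""
    la, lb = len(a.split()), len(b.split())
    if la > lb and la >= 1.5 * lb:
        return FIRST
    if lb > la and lb >= 1.5 * la:
        return SECOND
    return ABSTAIN


def lf_more_numbers(a: str, b: str) -> int:
    """Hypothetical heuristic: figures (dates, amounts) hint at concrete facts.

    str.isdigit also accepts Arabic-Indic digits such as U+0663.
    """
    return _compare(sum(ch.isdigit() for ch in a), sum(ch.isdigit() for ch in b))


def lf_more_sentences(a: str, b: str) -> int:
    """Hypothetical heuristic: more sentences, counting the Arabic question mark U+061F."""
    enders = ".!?\u061f"
    return _compare(sum(ch in enders for ch in a), sum(ch in enders for ch in b))


LFS: List[LabelingFunction] = [lf_longer, lf_more_numbers, lf_more_sentences]


def majority_vote(pair: Pair, lfs: Sequence[LabelingFunction] = LFS) -> int:
    """Unweighted majority over non-abstaining votes; ties abstain."""
    votes = [v for v in (lf(*pair) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_pairs(pairs: Sequence[Pair], lfs: Sequence[LabelingFunction] = LFS) -> List[int]:
    """Weakly label unlabeled pairs to grow the training set."""
    return [majority_vote(p, lfs) for p in pairs]
```

Pairs that end up as `ABSTAIN` would simply be left out of the expanded training set; the small manually annotated sets remain the yardstick for checking whether these heuristics agree with human judgment.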

What is Political Bias? – In Technical Terms

In this article, we review the research done in the field of detecting political bias. Understanding Characteristics of Biased Sentences in News Articles. Methodology. Bias labeling via crowdsourcing: The authors collected bias labels through the “Figure Eight” crowdsourcing platform, letting workers judge each target news article (also using the reference news article). Analysis of perceived news bias: To analyze what kinds of words workers tagged as bias triggers, they analyzed the phrases annotated as biased in terms of word length (4 words in a sentence have been annotated). …

News Stream Clustering – Sequential Clustering in Action

In a previous post, we talked about “How to” Event Detection in Media using NLP and AI. In another post, we presented sequential clustering. Today we’re introducing an online (sequential) clustering algorithm specialized in aggregating news articles into fine-grained story clusters. Problem Formulation: We focus on clustering a stream of documents, where the number of clusters is not fixed but learned automatically. We denote by D the (potentially infinite) space of documents. We are interested in associating each document d with a cluster via the function C(d) ∈ ℕ, which returns the cluster label of the document. For each …
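As a point of reference for the formulation above, here is a minimal single-pass (online) clustering sketch: each incoming document vector is compared with the existing cluster centroids by cosine similarity and joins the closest one if the similarity reaches a threshold; otherwise it opens a new cluster, so the number of clusters grows with the stream. The threshold and the vector representation (for example TF-IDF) are assumptions, and this is a generic baseline, not the specialized algorithm the post introduces.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class OnlineClusterer:
    """Single-pass clustering: assign(d) plays the role of C(d), returning 0, 1, 2, ..."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []
        self.sizes: List[int] = []

    def assign(self, doc: Vector) -> int:
        best_label, best_sim = -1, -1.0
        for label, centroid in enumerate(self.centroids):
            sim = cosine(doc, centroid)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_label == -1 or best_sim < self.threshold:
            # No sufficiently similar story yet: open a new cluster.
            self.centroids.append(list(doc))
            self.sizes.append(1)
            return len(self.centroids) - 1
        # Fold the document into the running mean of the chosen centroid.
        n = self.sizes[best_label]
        self.centroids[best_label] = [
            (c * n + x) / (n + 1) for c, x in zip(self.centroids[best_label], doc)
        ]
        self.sizes[best_label] = n + 1
        return best_label


if __name__ == "__main__":
    stream = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
    clusterer = OnlineClusterer(threshold=0.8)
    print([clusterer.assign(d) for d in stream])  # [0, 0, 1, 1]
```

Each document is seen once and never revisited, which is what makes the approach suitable for an unbounded stream; the trade-off is that early assignment mistakes are never corrected.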

Available Visualization Libraries to Handle Stream of Data

Google Analytics: Google Analytics generates detailed statistics and fresh insights into your website’s traffic and traffic sources. With Google Analytics, users can track visitors from all referrers, including search engines and social networks, direct visits, and referring sites. It also tracks and monitors display advertising, PPC networks, email marketing, and other digital collateral. You can not only measure sales and conversions but also gain fresh insights into how visitors use your site and how you can keep them coming back. Segment: From startups to the Fortune 500, thousands of companies use Segment as their customer data hub. We believe that …

Contrary view detection based on VODUM

While reading the news, each one of us perceives it differently. We have our own biases, and we tend to search for information that confirms our previous beliefs. Thus, different people might have drastically different viewpoints of …

AWS Batch Jobs — An Overview

AWS Batch enables you to run batch computing workloads on the AWS Cloud. The service automatically provisions compute resources and optimizes workload distribution based on the quantity and scale of the workloads. Related definitions. Jobs: a unit of work (such as a shell script, a Linux executable, or a Docker container image) that you submit to AWS Batch. It runs as a containerized application on an Amazon EC2 instance in your compute environment, using parameters that you specify in a job definition. Container images are stored in and pulled from container registries. Job Definitions: specify how jobs are …
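To show how a job, a job queue, and a job definition fit together at submission time, here is a minimal sketch that builds the keyword arguments for boto3's Batch client `submit_job` and, optionally, submits them. It assumes the queue and job definition already exist; the names used in the usage note are hypothetical. boto3 is imported only inside `submit`, so the request builder runs without AWS access.

```python
from typing import Any, Dict, Optional, Sequence


def build_submit_job_request(
    job_name: str,
    job_queue: str,
    job_definition: str,
    parameters: Optional[Dict[str, str]] = None,
    command: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Keyword arguments for boto3's Batch client `submit_job`.

    job_definition may be "name", "name:revision", or a full ARN.
    parameters fill Ref::<key> placeholders in the job definition's command;
    command, if given, overrides that command for this job only.
    """
    request: Dict[str, Any] = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
    }
    if parameters:
        request["parameters"] = dict(parameters)
    if command:
        request["containerOverrides"] = {"command": list(command)}
    return request


def submit(request: Dict[str, Any]) -> str:
    """Submit the job and return its ID (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    response = boto3.client("batch").submit_job(**request)
    return response["jobId"]
```

For example, `submit(build_submit_job_request("rank-2020-05-01", "news-ranking-queue", "article-scorer:3", parameters={"date": "2020-05-01"}))` would queue one job against revision 3 of a hypothetical `article-scorer` definition. Keeping per-run values in `parameters` lets a single job definition serve many submissions.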

AWS Lambda and SQS Payload Limitation

In this post, we discuss deploying an AI service that processes the political news articles in Almeta’s database. The service is assumed to be deployed as an AWS Lambda function, using AWS SQS to hold incoming requests while the function is throttled. An important limit to consider in such a setup is the payload size limit imposed by both AWS Lambda and AWS SQS. In this post, we investigate what this limit is for each of the two services and how it affects us. AWS SQS: There are two …
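Because the limit applies to bytes rather than characters, it helps to measure the encoded payload before sending it. The sketch below assumes a limit of 262,144 bytes (256 KiB), the SQS maximum message size that AWS documented for a long time; quotas change, so check the current values for SQS and for asynchronous Lambda invocations. It also shows why encoding matters for Arabic articles: each Arabic letter takes 2 bytes in UTF-8, but Python's `json.dumps` by default escapes it as a 6-byte `\uXXXX` sequence. The fallback message with a storage key follows the general claim-check pattern; the key format is hypothetical.

```python
import json
from typing import Any, Dict

# Assumed limit: 256 KiB, the long-documented SQS maximum message size.
# Verify the current SQS and Lambda quotas before relying on this value.
MAX_PAYLOAD_BYTES = 256 * 1024


def payload_size(obj: Any, ensure_ascii: bool = False) -> int:
    """Size in bytes of the UTF-8 encoded JSON that would actually be sent."""
    return len(json.dumps(obj, ensure_ascii=ensure_ascii).encode("utf-8"))


def build_message(article_id: str, text: str, limit: int = MAX_PAYLOAD_BYTES) -> Dict[str, Any]:
    """Send the article inline when it fits; otherwise send only a reference.

    The fallback is the claim-check pattern: the full text is stored elsewhere
    (e.g. S3) and the consumer fetches it by key. The key format is hypothetical.
    """
    inline = {"article_id": article_id, "text": text}
    if payload_size(inline) <= limit:
        return inline
    return {"article_id": article_id, "text_key": f"articles/{article_id}.json"}
```

A four-letter Arabic word in a small JSON object measures 17 bytes with `ensure_ascii=False` and 33 bytes with the default escaping, so the encoding choice alone can nearly triple how much of the limit an Arabic article consumes.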

A Guideline for Writing Research/Tech Blogs

Intro: At Almeta, you have to write a lot for the research tickets in each sprint. You have to read tons of research, academic, and sometimes boring papers. But when you write your proposal, you don’t have to write like them. As a matter of fact, we want to be as close to non-techies as possible when writing our tech blogs. So, you’re an engineer and you love to code. You are a machine learning engineer and you love to read. You’re both, and here comes a research/investigation ticket. You read, read, and read some more, and now comes …

How to Fact-Check using Natural Language Processing Techniques? A Literature Review

In this article, we present a summary of our research in the field of fact-checking. We divided the work into two categories: published closed-source applications and research projects in this field. Closed Source: Snopes: Their methodology relies on human annotators who fact-check a news piece and present a detailed report on the inaccuracies in the article. Reporters’ Lab: Their methodology also relies on human annotators, and the dataset can be found at https://www.politifact.com/texas/ and https://factnameh.com/. Full Fact: Their methodology builds a fully automated fact checker, but no details are provided …