Fetching And Indexing Tweets in AWS ElasticSearch

Here at Almeta we strive to be a source of all valuable content on the Arabic side of the web. While our current effort focuses on indexing and analysing news articles and blog posts, we believe that social outlets remain a vital part of the whole news experience of any internet user. In this post we present our plan to fetch, analyze and index tweets in a simple and efficient manner. Twitter API Twitter has 2 major free APIs (asynchronous and streaming) with the following main differences between them: Functionality: the async API allows you to search, filter and inspect … Continue reading Fetching And Indexing Tweets in AWS ElasticSearch

User Profiling Using AWS ElasticSearch – RomCom use case.

Personal differences and preferences mark a very important part of our identity, and optimizing the user experience based on them can be a great tool for improving user engagement. In our previous post we tackled the issue of personalized recommendations and how ElasticSearch can make the process extremely simple. However, in order to build a robust personal recommendation system it is paramount to have an idea of each user: who they are and what they like. This is commonly referred to as a user profile. In this post we will present a road map to enabling user profiling with … Continue reading User Profiling Using AWS ElasticSearch – RomCom use case.

Visualization Platforms for Data Search — Amplitude VS Kibana

In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions. Visualization is an increasingly key tool to make sense of the trillions of rows of data generated every day. Our eyes are drawn to colors and patterns. We can quickly identify red from blue, square from circle. Our culture is visual, including everything from art and advertisements to TV and movies. Data visualization is another form of visual art that grabs our interest and keeps our eyes on the message. Hence, starting from the data value in … Continue reading Visualization Platforms for Data Search — Amplitude VS Kibana

How Do Election Campaigns Use Marketing to Manipulate Results?

In interviews with The Guardian and The New York Times, Christopher Wylie, the former research director at Cambridge Analytica, the data science company used by US President Trump's digital team during the election campaign, confirmed that his company had taken data from millions … Continue reading How Do Election Campaigns Use Marketing to Manipulate Results?

RomCom — The Personalized Recommendation System For Almeta

The nice thing about working at Almeta is that we are our own users. As a platform for the best Arabic content on the web, we want to deliver the best content for each user. For us. For each Arabic-speaking reader. As usual, many companies think of “personalization” as the way to go to improve engagement, reach, or even acquisition. Just let the user tell you what he wants, fit a model against his needs, and let him see what he wants to see. Not what he should see. We tend to disagree. Given our own unique thoughts, interests and … Continue reading RomCom — The Personalized Recommendation System For Almeta

Communication/Messaging Tools and Patterns between Microservices

When designing a microservice, remember that other services will need to integrate with it. There is no single best style of communication that should always be used. In practice, we need to find the best solution for the problem at hand. In this post, we discuss different approaches and technologies used in designing the communication between microservices, shedding light on the most common communication services provided by AWS and weighing the different patterns and services against a range of factors. Communication Patterns In this section, we introduce you to the two major communication patterns … Continue reading Communication/Messaging Tools and Patterns between Microservices

Fake News

How Has Social Media Changed Our News Consumption?

It is the age of “fake news”, and gone are the days of waiting for the morning papers to get breaking news or of reading gossip and celebrity magazines. All the information we need is available at a tap on a news app, and most people now get their news … Continue reading How Has Social Media Changed Our News Consumption?

Automatically Extracting Valuable Content from News Streams.

Almeta News, a content aggregator app drawing on about 50 sources at the time of writing, always aims to provide its users with the best-quality pieces to read. Rather than relying on an army of content watchers and editors, Almeta aims to develop the best algorithms to review content automatically, looking for indicators of quality and assessing a piece of content’s placement. This post is part of our research efforts seeking the best content ranking methodology. In this post, we’re trying to determine the most effective indicators of content quality. Depending on human experts’ points of view, … Continue reading Automatically Extracting Valuable Content from News Streams.

Artificial Intelligence Techniques Without Human Bias

Every Arabic news story you read on the Almeta News app comes in no more than a hundred words! So how does Almeta do this? Of course we do not enlist people to watch for new bulletins and summarize them in real time! Rather, the credit goes to artificial intelligence algorithms that handle the task automatically and in real time. They work … Continue reading Artificial Intelligence Techniques Without Human Bias

Disguised and Hidden Messages in Politics

A few days ago, I watched a presentation on the use of disguised or hidden messages in communication, and it caught my attention how the media and politicians use these messages to control people's minds and plant certain ideas for political or commercial ends. With the article you will find a television advertisement whose duration is … Continue reading Disguised and Hidden Messages in Politics

With Women Against Disinformation, Fake News and Online Violence

The Ukrainian MP When Ukrainian MP Svitlana Zalishchuk gave a speech at the United Nations on the impact of the war between Ukraine and Russia on women, she won widespread praise for her performance. But by taking this stance, Zalishchuk became the target of a new kind of disinformation. … Continue reading With Women Against Disinformation, Fake News and Online Violence

An Initial, Failed Solution For The Event Detection Task

In this post, we are trying to validate our initial solution for the event detection task. If you’re not familiar with the task you can refer to our previous post about “How to” Event Detection in Media using NLP and AI. To apply event detection to news articles, we plan to do the following: Represent each article as a vector of expressive features. Feed the vectorized articles into a sequential clustering model to aggregate the ones talking about the same event. In this post, we’re solving the problem of event detection in the Arabic language. Features Effectiveness In this … Continue reading An Initial, Failed Solution For The Event Detection Task

Initial Genre Classification Experiments

The ability to filter your news feed by genre is a critical component of any news aggregator: users often want to read only sports or political news, not just the most recent or hottest news. In this post, we will explore our initial genre classification system in great detail. Let’s start with the data. In the following experiments, we used an in-house data set. The data set is composed of 190,307 HTML documents crawled from the following domains [Aljazira, Alarabia, Aljadeed, RT Arabic, BBC Arabic]. For each of the documents we tried to extract the following features: … Continue reading Initial Genre Classification Experiments

Initial Experiments on Measuring Informativity of an Arabic Content – Data Collection

In one of our previous articles we suggested a method to build an initial system for informativeness detection. This system should utilize a small set of manually annotated pairwise comparisons, use Snorkel to expand these annotations automatically into a larger training set, and then train the model to estimate article informativeness using this set. In this article, we will go into the details of the implementation of this plan. Data Annotation As noted above, Snorkel will need 3 types of training data: A small manually annotated test set to evaluate the results of the model A smaller manually annotated … Continue reading Initial Experiments on Measuring Informativity of an Arabic Content – Data Collection

Git submodules in the python world Why and How

The basic principle that makes many professional tech companies professional is the simple principle of domain engineering: working for a long period of time on a small set of domains in the hope that your codebase grows more efficient and successful at developing projects in these domains. The main component in this formula is the idea of code reuse. Sooner or later you will have a certain piece of code that you use constantly across all your projects. If we are talking about NLP, these might be your text normalizers, your feature extractors or … Continue reading Git submodules in the python world Why and How

Building a Test Collection for Event Detection Systems Evaluation

Before we start, if you’re not familiar with the Event Detection task in NLP you can refer to our previous post on this topic here. So you’ve built a system to detect events in the media… now what? While building a system is a key step, how the system performs on real-world data is equally important. We need to know whether it actually works and if we can trust its decisions. So we need to evaluate our system before putting it to use. Evaluation is a highly important step in the development of any type of system as it allows the … Continue reading Building a Test Collection for Event Detection Systems Evaluation

What is Political Bias? – In Technical Terms

In this article, we review the research done in the field of detecting political bias. Understanding Characteristics of Biased Sentences in News Articles Methodology Bias Labeling via Crowd-Sourcing They used crowdsourcing to collect bias labels using the “Figure Eight” platform. In crowdsourcing they let the workers make judgements on each target news article (also using the reference news article). Analysis of Perceived News Bias To analyze what kind of words are tagged as bias triggers by the workers, they analyze the phrases annotated as biased in terms of the word length (4 words in a sentence have been annotated). … Continue reading What is Political Bias? – In Technical Terms

News Stream Clustering – Sequential Clustering in Action

In a previous post, we talked about “How to” Event Detection in Media using NLP and AI. In another post, we presented Sequential Clustering. Today we’re introducing an online (sequential) clustering algorithm specialized in aggregating news articles into fine-grained story clusters. Problem Formulation We focus on the clustering of a stream of documents, where the number of clusters is not fixed but learned automatically. We denote by D the (potentially infinite) space of documents. We are interested in associating each document with a cluster via the function C(d) ∈ N, which returns the cluster label given a document. For each … Continue reading News Stream Clustering – Sequential Clustering in Action
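
The problem formulation above can be illustrated with a minimal sketch. It is not the algorithm from the post, which is truncated here: it assumes documents arrive as sparse term-weight dictionaries, uses cosine similarity against running-mean centroids, and opens a new cluster whenever the best similarity falls below a hypothetical `threshold`. The class and function names are illustrative only.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SequentialClusterer:
    """Assigns each incoming document a cluster label C(d), one at a time.

    The number of clusters is not fixed in advance: a new cluster is
    opened when no existing centroid is similar enough (assumed rule).
    """

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold          # assumed similarity cut-off
        self.centroids: List[Vector] = []   # one running-mean vector per cluster
        self.sizes: List[int] = []          # number of documents per cluster

    def assign(self, doc: Vector) -> int:
        # Find the most similar existing cluster, if any.
        best, best_sim = -1, 0.0
        for label, centroid in enumerate(self.centroids):
            sim = cosine(doc, centroid)
            if sim > best_sim:
                best, best_sim = label, sim

        # Not similar enough to anything seen so far: start a new cluster.
        if best == -1 or best_sim < self.threshold:
            self.centroids.append(dict(doc))
            self.sizes.append(1)
            return len(self.centroids) - 1

        # Otherwise fold the document into the centroid's running mean.
        n = self.sizes[best]
        centroid = self.centroids[best]
        for term in set(centroid) | set(doc):
            centroid[term] = (centroid.get(term, 0.0) * n + doc.get(term, 0.0)) / (n + 1)
        self.sizes[best] = n + 1
        return best
```

Calling `assign` on each article as it arrives returns its cluster label without ever revisiting earlier documents, which is what makes the approach suitable for a stream.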

How to Version Control Your Machine Learning? – A Look into Data Version Control (DVC)

If you have spent time working with Machine Learning, one thing is clear: it’s an iterative process. Machine learning is about rapid experimentation and iteration. Each experiment consists of different parts: the data you use, hyperparameters, learning algorithm, architecture, and the optimal combination of all of those. Throughout this iterative process, your accuracy on your dataset will vary accordingly, and without keeping track of your experiment history you won’t be able to learn much. Versioning lets you keep track of all of your experiments and their different components. How to Version Control ML Projects? One of the most popular ways … Continue reading How to Version Control Your Machine Learning? – A Look into Data Version Control (DVC)

Five Automatic Ways To Build Data For Abstractive Summarization

In the words of The Daily Show host Trevor Noah, there is currently “So much news, so little time”. In fact, the issue of information explosion extends beyond the realm of news and covers all aspects of our lives. … Continue reading Five Automatic Ways To Build Data For Abstractive Summarization

The Best Features for a News Aggregation Service - Part Three

In this article we continue discussing the best features for a news aggregation service, parts of which we already discussed in two previous articles. The 1000x Vision Fact-checking Detecting and eliminating fake news. News reader When you are too busy to read the whole article, just open the app or website, browse the headlines, pick the appealing stories and listen to the content! Another possible option is listening to a news digest. Alert system As soon as we realize that a breaking event has happened in a certain place, we can alert users in the surrounding area. News timeline Easily get a summary of the events in a given period via a visual timeline … Continue reading The Best Features for a News Aggregation Service - Part Three

The Best Features for a News Aggregation Service - Part Two

In this article we continue discussing the best features for a news aggregation service, part of which we already discussed in the previous article. The 5X Vision News personalization Enabling users to customize their news feed by choosing what they want to follow in terms of: Choosing the news genre: the user's ability to get news only from the genres they care about. Choosing news sources: the user's ability to follow the media outlets they like. Choosing the author / user: following a particular author or user, or browsing collections created by other users. Choosing tags: extracting tags / keywords from articles. Search service Enabling the user to retrieve news content … Continue reading The Best Features for a News Aggregation Service - Part Two

The Best Features for a News Aggregation Service - Part One

The amount of news information a person can access these days was unimaginable a hundred years ago. Yet we still have only 24 hours in a day, so the following question must be asked: how do we get as much valuable news as possible in limited time? The answer is a news aggregation service, which changes the way news is discovered and consumed. News aggregation services are websites that let you view content from newspapers, media outlets, blogs and so on in one place. Let us imagine what the best news aggregator on the planet would look like. The B2C perspective Here we try to envision an end-user experience that surpasses what is offered by … Continue reading The Best Features for a News Aggregation Service - Part One

Available Visualization Libraries to Handle Stream of Data

Google Analytics Google Analytics generates detailed statistics and fresh insights into your website’s traffic and traffic sources. With Google Analytics users can track visitors from all referrers, including search engines and social networks, direct visits and referring sites. It also tracks and monitors display advertising, PPC networks, email marketing and other digital collateral. You can not only measure sales and conversions, but also gain fresh insights into how visitors use your site and how you can keep them coming back. Segment From startups to the Fortune 500, thousands of companies use Segment as their customer data hub. We believe that … Continue reading Available Visualization Libraries to Handle Stream of Data

An Implementation of a News Stream Sequence Clustering Algorithm

In a previous post, we discussed a News Stream Sequential Clustering Algorithm. In this post, we’re discussing the details of implementing this algorithm with minimal tuning, and showing the results produced by this implementation. Along with this post, we’re evaluating … Continue reading An Implementation of a News Stream Sequence Clustering Algorithm

Contrary view detection based on VODUM

While reading the news, each one of us perceives it in a different manner. We have our own biases and we tend to search for information that confirms our previous beliefs. Thus different people might have drastically different viewpoints of … Continue reading Contrary view detection based on VODUM

AWS Batch Jobs — An Overview

AWS Batch enables you to run batch computing workloads on the AWS Cloud. This service can automatically provision compute resources and optimize the workload distribution based on the quantity and scale of the workloads. Related Definitions Jobs: A unit of work (such as a shell script, a Linux executable, or a Docker container image) that you submit to AWS Batch. It runs as a containerized application on an Amazon EC2 instance in your computing environment, using parameters that you specify in a job definition. Container images are stored in and pulled from container registries. Job Definitions: specify how jobs are … Continue reading AWS Batch Jobs — An Overview

AWS Lambda and SQS Payload Limitation

In this post, we’re talking about deploying an AI service that processes the political news articles in Almeta’s database. The service is assumed to be deployed as an AWS Lambda function, with AWS SQS used to hold the incoming requests while the function is throttled. An important limit to consider in such a situation is the one that both AWS Lambda and AWS SQS put on the payload size. Throughout this post, we investigate what this limit is for each of the two services and how it affects us. AWS SQS There are two … Continue reading AWS Lambda and SQS Payload Limitation
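
As a rough illustration of why payload size matters for Arabic news text, the sketch below measures the UTF-8 size of a JSON payload before it would be sent. The helper names are hypothetical and the limit is deliberately a parameter: the actual quotas for AWS SQS and AWS Lambda are not stated in this excerpt and should be taken from the current AWS documentation.

```python
import json


def payload_size_bytes(payload: dict) -> int:
    """Size of the JSON-serialized payload in bytes, encoded as UTF-8.

    Payload limits are generally expressed in bytes rather than
    characters, and Arabic letters take two bytes each in UTF-8, so the
    byte size can be much larger than the character count.
    """
    serialized = json.dumps(payload, ensure_ascii=False)
    return len(serialized.encode("utf-8"))


def fits_limit(payload: dict, limit_bytes: int) -> bool:
    """True if the payload fits within the given limit.

    `limit_bytes` is an assumption supplied by the caller; look up the
    current quota of the target service instead of hard-coding it.
    """
    return payload_size_bytes(payload) <= limit_bytes


# A three-letter Arabic word: 15 characters once serialized, but more bytes.
article = {"text": "خبر"}
print(len(json.dumps(article, ensure_ascii=False)), payload_size_bytes(article))
```

A payload that fails the check could, for instance, be stored elsewhere and referenced by a key in the message, which is a common workaround rather than something this excerpt prescribes.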

Can AI Guess How Many Likes Your Facebook Post Will Get Before You Post It?

The rise of social media has undoubtedly changed the way marketing works. Not only does it allow companies to easily access massive numbers of potential customers, but it also allows them to segment the market, figure out what is … Continue reading Can AI Guess How Many Likes Your Facebook Post Will Get Before You Post It?

Almeta App — Caricature Tab

Caricature Tab is a new feature planned to become part of the Almeta News app soon. In its first version, the feature will provide the user with a stack of in-house designed caricature images to browse and enjoy. If you’re curious about how we at Almeta handle such new features, this post will show you. We walk through the entire decision-making process behind a number of design and deployment questions: Where to store the images? How to handle new ones? Do we need caching? What to store in … Continue reading Almeta App — Caricature Tab

Why Don't Facts Change What We Think? Confirmation Bias

Why Don't Facts Change What We Think?

Researchers at Stanford University conducted an experiment: they gathered a group of students who held opposing views on the death penalty. Half of the students were in favor of it and believed that it deters crime, while the other half were against it and believed that it has no effect on crime rates … Continue reading Why Don't Facts Change What We Think?

From Sentiment to Political Bias in the Arab World and the Arabic Content

The rise of the political bias problem across several news anchors presents a real threat to free and independent journalism and is a major factor in shifting the populace's conception of the world. Several NGOs, research centres and private organizations are working … Continue reading From Sentiment to Political Bias in the Arab World and the Arabic Content

How to Rank Articles Based on How Informative They Are

How to Rank Articles Based on How Informative They Are – Using Snorkel

Let’s start with a simple question: what constitutes an informative article? Based on Oxford’s dictionary: informative /ɪnˈfɔːmətɪv/ adjective: providing useful or interesting information. However, this is still an abstract concept. Yes, it is much simpler to flag an article as spammy … Continue reading How to Rank Articles Based on How Informative They Are – Using Snorkel

Informativity Detection, Our Research Gist

Informativity Detection – Almeta’s Research Gist

Let’s start with a simple question: what constitutes an informative article? Based on Oxford’s dictionary: informative /ɪnˈfɔːmətɪv/ adjective: providing useful or interesting information. However, this is still an abstract concept. The question of measuring how informative a piece of news is … Continue reading Informativity Detection – Almeta’s Research Gist

What Makes an Article Informative and How Computers Can Measure Informativeness

What Makes an Article Informative – And How Computers Can Measure Informativity of a Text Content

The concept of an informative text is really abstract, and it is hard to come up with a definitive formula to measure it. In this article we will explore some of the features that we believe can make an article … Continue reading What Makes an Article Informative – And How Computers Can Measure Informativity of a Text Content

Aspect Detection and Named Entity Linking (NEL): Using SPARQL and DBpedia

In our effort to provide the best news feed out there, one of the goals we are trying to achieve here at Almeta is to capture the interaction between different news outlets and how the coverage of the same event … Continue reading Aspect Detection and Named Entity Linking (NEL): Using SPARQL and DBpedia

An Overview of The Event Extraction Task in NLP

Events possess a rich structure that is important for intelligent information access systems (information retrieval, question answering, summarization, etc.). Without information about what happened, where, and to whom, temporal information about an event may not be very useful. In light … Continue reading An Overview of The Event Extraction Task in NLP

Analysis of the Readability Metric Results in Almeta News Feed

In this post, we’re analyzing the results returned by the readability metric in our news feed. If you haven’t checked our post about “How to measure the readability of a text?” before, you can read about it here. How Are We Measuring the Readability? The main part of analyzing a metric is knowing how it works. In the current version, we depend on the AARIBase metric for measuring readability. So, let’s first have a look at how AARIBase works. Here’s the AARIBase formula: AARIBase = (3.28 × NOC) + (1.43 × ACW) + (1.24 × AWS) … Continue reading Analysis of the Readability Metric Results in Almeta News Feed

Clickbait Detection Using Word2Vec Representation

In a previous article, “How to Detect Clickbait Headlines using NLP?”, we introduced the task of clickbait detection and explored how it can be modeled within the domain of machine learning and NLP. If you are not familiar with the concept of clickbait detection, make sure to review it before continuing. In this post, we’re building a classifier for clickbait detection in news headlines based on a pre-trained Arabic Word2Vec model, and we’re validating this solution. If you are not familiar with the Word2Vec concept you can refer to this Wikipedia article for more information. News Headlines Representation In … Continue reading Clickbait Detection Using Word2Vec Representation

Google’s AutoML Overview

In this post, we are exploring how Google’s AutoML can help us at Almeta in developing automatic Arabic language processing tools. Before we start, if you are not familiar with the term AutoML you can refer to our previous post on this topic. Who is Google AutoML for? and When to Use It? The target audience of Google’s Cloud AutoML is people who have limited knowledge of machine learning. The main goal of this cloud service is to let users build their own AI model tailored to their business needs, if the services provided by Google’s AI API … Continue reading Google’s AutoML Overview

Microservices Authentication

Microservices & B2B Authentication – With AWS and Serverless (sls)

In the process of designing a good system, you must have heard of the term Microservices, where, in short, each microservice is responsible for a specific task or a group of heavily related tasks, and they communicate with each other … Continue reading Microservices & B2B Authentication – With AWS and Serverless (sls)

What Are Natural Language Processing Techniques?

You may not be well acquainted with natural language processing, but you surely know Siri or Alexa! “I didn't understand what you just said.” This is what Siri or Alexa may answer you with over and over. When was the last time you asked … Continue reading What Are Natural Language Processing Techniques?

How Do We Determine How Informative and Rich a Text Is?

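Before we start talking about how informative and rich a text is, let us begin with a simple question: what constitutes an informative article? Being informative means providing useful or interesting information. However, this is still an abstract concept. The question of determining how informative and rich a text is … Continue reading How Do We Determine How Informative and Rich a Text Is?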