In this post, we explore how Google’s AutoML can help us at Almeta develop automatic Arabic language processing tools.
Who Is Google AutoML For, and When Should You Use It?
Google’s Cloud AutoML targets people who have limited knowledge of machine learning.
The main goal of this cloud service is to let users build their own AI models, tailored to their business needs, when the ready-made services in Google’s AI APIs can’t satisfy those needs, even if the users lack machine learning expertise.
In general, anyone can use these services to build a custom AI model on the fly.
What Kinds of AutoML Services Does Google Provide?
Let’s see what kinds of AutoML services are provided by Google in the NLP field, and whether they can be adapted to process the Arabic language.
Cloud AutoML Natural Language Classification
Enables you to create custom machine learning models to classify content into a custom set of categories.
According to Google’s documentation, the current service supports content classification in English language text. We can train a custom model to classify text in other languages including Arabic, but the model quality may vary.
How to train the model?
Build your own dataset and upload it.
Cloud AutoML Natural Language Entity Extraction
Enables you to create custom machine learning models to identify a custom set of entities.
According to Google’s documentation, this service currently supports entity analysis in English language text. We can train a custom model using text in other languages, but the model performance is undetermined.
How to train the model?
Annotate a dataset, then upload it in JSON format. The annotation can be done before uploading the data, afterwards in Google’s AutoML UI, or by requesting annotation from Google’s labeling service. The trained model is automatically deployed, and we can get its predictions using an API.
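To make the annotation step more concrete, here is a minimal sketch of producing one annotated training record as a JSON line. The field names (`text_snippet`, `annotations`, `text_extraction`, `text_segment`, `display_name`) reflect our recollection of the AutoML import format and should be treated as assumptions to verify against Google’s documentation; `make_example` and the `LOCATION` label are hypothetical names chosen for illustration. Offsets here are Python string indices (code points), which may differ from the unit the service expects.

```python
import json

def make_example(text, entities):
    """Build one training record from a text and (surface_string, label) pairs."""
    annotations = []
    for surface, label in entities:
        start = text.find(surface)  # first occurrence only, enough for a sketch
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        annotations.append({
            "display_name": label,  # assumed field name for the entity label
            "text_extraction": {
                "text_segment": {
                    "start_offset": start,
                    "end_offset": start + len(surface),
                }
            },
        })
    return {"text_snippet": {"content": text}, "annotations": annotations}

record = make_example("زار الرئيس مدينة دمشق أمس", [("دمشق", "LOCATION")])
# ensure_ascii=False keeps the Arabic text readable in the output file
line = json.dumps(record, ensure_ascii=False)
```

Each such line would go into the uploaded file, one record per line.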
Cloud AutoML Natural Language Sentiment Analysis
Enables you to create custom machine learning models to analyze attitudes.
According to Google’s documentation, the current service supports sentiment analysis in English language text. We can train a custom model to classify text in other languages including Arabic, but the model quality may vary.
The sentiment score is an integer ranging from 0 (relatively negative) to a maximum value of your choice (positive). So, if we want to identify whether the sentiment is negative, positive, or neutral, we would label the training data with sentiment scores of 0 (negative), 1 (neutral), and 2 (positive).
How to train the model?
Build your own dataset, then upload it as a .csv file. The trained model is automatically deployed, and we can get its predictions using an API.
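As an illustration of the labeling scheme described above, the following sketch maps negative, neutral, and positive labels to the scores 0, 1, and 2 and serializes the examples as CSV. The two-column `text,score` layout is an assumption made for this example, and `LABEL_TO_SCORE` and `to_csv` are hypothetical names; check the exact column layout the service expects before uploading.

```python
import csv
import io

# Integer sentiment scores for a three-class setup (0 = negative ... 2 = positive)
LABEL_TO_SCORE = {"negative": 0, "neutral": 1, "positive": 2}

def to_csv(examples):
    """Serialize (text, label) pairs as CSV rows of text,score."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for text, label in examples:
        writer.writerow([text, LABEL_TO_SCORE[label]])
    return buffer.getvalue()

examples = [
    ("الخدمة ممتازة", "positive"),
    ("التطبيق بطيء جدا", "negative"),
    ("تم تحديث التطبيق اليوم", "neutral"),
]
csv_text = to_csv(examples)
```

Using the `csv` module rather than joining strings by hand keeps texts that contain commas or quotes correctly escaped.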
Cloud AutoML Natural Language Pricing
Training Cost: The cost of training a model is $3.00 per hour.
Prediction Cost: The usage of AutoML Natural Language is calculated monthly in terms of how many text records were sent for analysis during the billing month, as follows:
If a document contains more than 1,000 characters, it counts as one text record for every 1,000 characters.
Feature | 0 – 30K text records | 30K+ – 5M+ text records |
---|---|---|
AutoML Natural Language Content Classification | Free | $5.00 |
AutoML Natural Language Sentiment Analysis | Free | $5.00 |
AutoML Natural Language Entity Extraction | Free | $5.00 |
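The text-record rule can be sketched in a few lines. This example assumes that a partial block of 1,000 characters is rounded up to a full record and that an empty document still counts as one record; both are our assumptions, not statements from the pricing text. The names `text_records` and `billable_records` are hypothetical.

```python
import math

RECORD_SIZE = 1_000    # characters per text record (from the rule above)
FREE_RECORDS = 30_000  # free tier per billing month (from the table above)

def text_records(document):
    """Number of text records a single document counts as."""
    # Assumption: partial blocks round up; an empty document counts as one record.
    return max(1, math.ceil(len(document) / RECORD_SIZE))

def billable_records(documents):
    """Records beyond the free tier for one billing month."""
    total = sum(text_records(d) for d in documents)
    return max(0, total - FREE_RECORDS)

docs = ["a" * 800, "b" * 1500, "c" * 600]
print([text_records(d) for d in docs])  # [1, 2, 1]
```

For a news corpus, running `text_records` over the articles gives a quick estimate of how fast the free tier would be used up.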
Cloud AutoML Translation
Enables you to create custom translation models so that translation queries return results specific to a defined domain.
The supported language pairs, listed in Google’s documentation, include Arabic to English and English to Arabic.
How to train the model?
AutoML Translation trains custom models using matching pairs of sentences in the source and target languages.
The sentence pairs used to train the custom model must be in tab-separated values (.tsv) or Translation Memory eXchange (.tmx) format. Multiple .tsv and .tmx files can be batched into a single comma-separated values (.csv) file. AutoML Translation uses the sentence pairs you provide to train the custom model.
The trained model is automatically deployed, and we can get its predictions using an API.
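A minimal sketch of preparing sentence pairs in .tsv form is shown below. It assumes one pair per line with the source sentence first and the target sentence second; `to_tsv` is a hypothetical helper, and the example sentences are ours.

```python
def to_tsv(pairs):
    """Serialize (source, target) sentence pairs, one tab-separated pair per line."""
    lines = []
    for source, target in pairs:
        for sentence in (source, target):
            # A stray tab or newline would silently break the two-column layout.
            if "\t" in sentence or "\n" in sentence:
                raise ValueError("sentences must not contain tabs or newlines")
        lines.append(f"{source}\t{target}")
    return "\n".join(lines) + "\n"

pairs = [
    ("مرحبا بالعالم", "Hello, world"),
    ("الطقس جميل اليوم", "The weather is nice today"),
]
tsv_text = to_tsv(pairs)
```

Validating the sentences before writing them is a cheap way to catch misaligned pairs early, since training time is billed by the hour.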
Cloud AutoML Translation Pricing
Training Cost: Training a model costs $76.00 per hour. If training fails for any reason other than a user-initiated cancellation, you will not be billed for the time.
Translation Cost: Your usage of AutoML Translation is calculated in terms of how many characters you send for translation with an AutoML custom model.
Feature | 0 – 0.5 million characters | 0.5 – 5 million characters |
---|---|---|
Translation | Free | $80 per 1,000,000 characters* |
Price is per character sent for processing, including whitespace characters. Empty queries are charged for one character.
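The pricing rules above can be turned into a rough estimator. This sketch uses only the figures quoted in this section; the function names are hypothetical, and usage beyond 5 million characters is not covered by the table, so the estimate should not be trusted past that point.

```python
FREE_CHARACTERS = 500_000        # free tier from the table above
PRICE_PER_MILLION = 80.00        # USD per 1,000,000 characters above the free tier
TRAINING_PRICE_PER_HOUR = 76.00  # USD per training hour

def billed_characters(queries):
    """Characters charged for a list of queries."""
    # Whitespace counts, and an empty query is charged as one character.
    return sum(max(1, len(q)) for q in queries)

def translation_cost(total_characters):
    """Estimated translation cost in USD for a given character volume."""
    billable = max(0, total_characters - FREE_CHARACTERS)
    return billable * PRICE_PER_MILLION / 1_000_000

def training_cost(hours):
    """Estimated training cost in USD."""
    return hours * TRAINING_PRICE_PER_HOUR
```

For example, `translation_cost(1_500_000)` charges for the 1,000,000 characters above the free tier, and `training_cost(2)` covers two hours of training.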
Cloud AutoML Tables
Enables you to automatically build and deploy state-of-the-art machine learning models on structured data at massively increased speed and scale. Here are its features and capabilities:
Data support
Helps in creating clean, effective training data by providing information about missing data, correlation, cardinality, and distribution for each of your features.
Feature engineering
Automatically performs common feature engineering tasks, including:
- Normalize and bucketize numeric features.
- Create one-hot encoding and embeddings for categorical features.
- Perform basic processing for text features.
- Extract date- and time-related features from Timestamp columns.
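To show what these transformations mean in practice, here is a small sketch in plain Python. It only illustrates the terms; how AutoML Tables actually implements them is not described in this post, and all function names here are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev

def normalize(values):
    """Z-score normalization: shift to zero mean, scale to unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def bucketize(value, boundaries):
    """Index of the bucket a value falls into, given sorted boundaries."""
    return sum(value >= b for b in boundaries)

def one_hot(value, vocabulary):
    """Binary indicator vector for a categorical value."""
    return [1 if value == item else 0 for item in vocabulary]

def timestamp_features(ts):
    """Date- and time-related features derived from a timestamp."""
    return {"year": ts.year, "month": ts.month,
            "weekday": ts.weekday(), "hour": ts.hour}

z = normalize([1.0, 2.0, 3.0])
age_bucket = bucketize(25, [18, 30, 50])  # falls in the second bucket (index 1)
category = one_hot("sports", ["politics", "sports", "economy"])
features = timestamp_features(datetime(2020, 1, 15, 9, 30))
```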
Model training
Trains multiple model architectures at the same time. The model architectures that AutoML Tables tests include:
- Linear
- Feedforward deep neural network
- Gradient Boosted Decision Tree
- AdaNet
- Ensembles of various model architectures
Model evaluation and final model creation
Using a validation set, AutoML Tables determines the best model architecture for the data. After that, two models are trained:
- A model trained with the training and validation sets. This model predicts the test set targets, which provides the evaluation of the model.
- A model trained with the training, validation, and test sets. This is the model that is provided for making predictions.
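The selection and retraining procedure described above can be sketched with two toy "architectures": a mean predictor and a one-variable least-squares line. These stand in for the real candidates only for illustration; the names `fit_mean`, `fit_linear`, and `select_and_train` are hypothetical, and the data is synthetic.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean target."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Second candidate: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    """Mean squared error of a fitted model on a dataset."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def select_and_train(candidates, train, valid, test):
    # 1. Pick the candidate with the lowest error on the validation set.
    best = min(candidates, key=lambda fit: mse(fit(*train), *valid))
    # 2. Retrain on training + validation data and evaluate on the test set.
    train_valid = (train[0] + valid[0], train[1] + valid[1])
    test_error = mse(best(*train_valid), *test)
    # 3. Retrain on all data; this is the model used for predictions.
    everything = (train_valid[0] + test[0], train_valid[1] + test[1])
    return best(*everything), test_error

xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # synthetic data following y = 2x + 1
final_model, test_error = select_and_train(
    [fit_mean, fit_linear],
    train=(xs[:6], ys[:6]), valid=(xs[6:8], ys[6:8]), test=(xs[8:], ys[8:]),
)
```

Note that the reported test error belongs to the model from step 2, not to the final model, which has already seen the test set.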
Supported Problem Types
- Regression problems
- Classification problems
Cloud AutoML Tables Pricing
Prices for the usage of AutoML Tables are computed based on the underlying GCP resources required for model training, model deployment, batch prediction, and online prediction. You don’t incur charges from AutoML Tables until you start training your model.
Model training costs: Model training costs $19.32 per hour of compute resources used to train the model.
Model deployment costs: Model deployment costs $0.005 per GiB per hour for each machine on which the model is deployed. Google currently replicates the model in memory on 9 machines for low-latency serving, so a 9x multiplier applies to this cost.
Batch prediction costs: Batch prediction using the model costs $1.16 per hour of compute resources used.
Online prediction costs: Online predictions using the model cost $0.21 per hour of compute resources used.
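Putting the rates above together, a rough cost estimate might look like the following sketch. It uses only the prices quoted in this section, which may have changed since; the function name, its parameters, and the example workload are hypothetical.

```python
TRAINING_PER_HOUR = 19.32        # USD per training compute hour
DEPLOYMENT_PER_GIB_HOUR = 0.005  # USD per GiB per hour per machine
DEPLOYMENT_REPLICAS = 9          # machines the model is replicated to
BATCH_PER_HOUR = 1.16            # USD per batch prediction compute hour
ONLINE_PER_HOUR = 0.21           # USD per online prediction compute hour

def tables_cost(training_hours, model_gib, deployed_hours,
                batch_hours, online_hours):
    """Break an AutoML Tables bill estimate into its four components."""
    costs = {
        "training": training_hours * TRAINING_PER_HOUR,
        "deployment": (model_gib * deployed_hours
                       * DEPLOYMENT_PER_GIB_HOUR * DEPLOYMENT_REPLICAS),
        "batch": batch_hours * BATCH_PER_HOUR,
        "online": online_hours * ONLINE_PER_HOUR,
    }
    costs["total"] = sum(costs.values())
    return costs

# Hypothetical month: 2 h of training, a 1 GiB model deployed for 30 days,
# 1 h of batch prediction, and 10 h of online prediction compute.
estimate = tables_cost(2, 1, 24 * 30, 1, 10)
```

In this hypothetical month, the deployment component is comparable to the training cost even for a small model, because of the 9x replication multiplier.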
Conclusion
In this post, we talked about the services provided by Google’s AutoML, including Cloud AutoML Natural Language, Cloud AutoML Translation, and Cloud AutoML Tables, their fitness for processing Arabic texts, and their pricing.
Did you know that we use these and other AI technologies in our app? See what you have just read about in action: try our Almeta News app. You can download it from Google Play or Apple’s App Store.
Hi,
Thanks for the great article !
It is what I was looking for.
However, I wonder if I want to periodically improve the model (like adding 30 invoices every 2 weeks). Could I build on the model and simply add a new dataset and go through the manual training again without having to re-do all the previous work on the old dataset with the 150 documents ?
Maybe is there a way to extract a file with the training work on the previous 150 documents ? And then on the new set ? And combine both files information together ?
Thanks !