As part of our series exploring machine learning cloud services, this post looks at Google's AI Platform.
AI Platform provides the dependencies required to train machine learning models using these hosted frameworks in its runtime versions:
Additionally, you can use custom containers to run training jobs with other machine learning frameworks.
How Training Works
- You create a Python application that trains your model, and you build it the same way you would to run it locally in your development environment.
- You get your training and verification data into a source that the AI Platform can access. This usually means putting it in Cloud Storage, Cloud Bigtable, or another Google Cloud storage service.
- Package your application and transfer it to a Cloud Storage bucket that your project can access.
- The AI Platform training service sets up resources for your job. It allocates one or more training instances based on your job configuration.
The second and third steps mean that you also pay for Google Cloud storage services, in addition to the computing resources consumed by your training job.
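To make the workflow above more concrete, here is a minimal sketch of the job description you might send to the training service once your package is in Cloud Storage. The field names follow the AI Platform Training job resource as we understand it; the bucket paths, module name, job ID, region, and runtime version are all hypothetical placeholders, so adjust them to your own project.

```python
from typing import Any, Dict, List, Optional


def build_training_job(
    job_id: str,
    package_uri: str,
    python_module: str,
    job_dir: str,
    region: str = "us-central1",          # assumed region
    args: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a request body describing a training job.

    This only assembles a dictionary; it does not call any Google API.
    """
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC",             # a single training instance
            "packageUris": [package_uri],     # packaged trainer in Cloud Storage
            "pythonModule": python_module,    # entry point inside the package
            "args": list(args or []),         # forwarded to your application
            "region": region,
            "jobDir": job_dir,                # where outputs are written
            "runtimeVersion": "2.1",          # assumed runtime version
            "pythonVersion": "3.7",           # assumed Python version
        },
    }


# Hypothetical bucket and module names, for illustration only.
job = build_training_job(
    job_id="my_first_job",
    package_uri="gs://my-bucket/packages/trainer-0.1.tar.gz",
    python_module="trainer.task",
    job_dir="gs://my-bucket/jobs/my_first_job",
    args=["--train-files", "gs://my-bucket/data/train.csv"],
)
```

Note that both the package and the training data live in Cloud Storage in this sketch, which is where the extra storage charges come from.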
Distributed Training Strategies
There are three basic strategies to train a model with multiple nodes:
- Data-parallel training with synchronous updates.
- Data-parallel training with asynchronous updates.
- Model-parallel training.
In data-parallel training, the whole model is shared with all worker nodes. Each node calculates gradient vectors independently from its own part of the training dataset, in the same manner as mini-batch processing. With synchronous updates, the calculated gradient vectors are collected on the parameter server node, and the model parameters are updated with the sum of all the gradient vectors. With asynchronous updates, the parameter server instead applies each gradient vector independently, right after receiving it from one of the worker nodes.
You can use the data-parallel strategy regardless of the model structure.
Model-parallel training, in contrast, applies only to deep TensorFlow models, where you implement distribution strategies that split the computation graph across multiple processors.
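The difference between synchronous and asynchronous data-parallel updates can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: it runs in a single process, fits a one-parameter model y = w * x, and the asynchronous version applies gradients one worker at a time with fresh parameters, whereas real asynchronous training may also compute gradients on stale parameters. All function names here are our own, not part of any AI Platform API.

```python
from typing import Callable, List, Sequence, Tuple

Shard = Sequence[Tuple[float, float]]  # (x, y) pairs held by one worker


def gradient(w: float, shard: Shard) -> float:
    """Gradient of the mean squared error of y = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def sync_step(w: float, shards: List[Shard], lr: float) -> float:
    """Synchronous update: all workers use the same w, the server sums the gradients."""
    grads = [gradient(w, shard) for shard in shards]
    return w - lr * sum(grads)


def async_step(w: float, shards: List[Shard], lr: float) -> float:
    """Asynchronous update: the server applies each gradient as soon as it arrives."""
    for shard in shards:
        w = w - lr * gradient(w, shard)
    return w


def train(step: Callable[[float, List[Shard], float], float],
          shards: List[Shard], steps: int = 100, lr: float = 0.01) -> float:
    w = 0.0
    for _ in range(steps):
        w = step(w, shards, lr)
    return w


# Two workers, each holding part of a dataset generated by y = 3 * x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w_sync = train(sync_step, shards)
w_async = train(async_step, shards)
```

Both variants recover a weight close to 3 on this toy data; they differ only in when the parameter server applies each worker's gradient.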
AI Platform Hyperparameter tuning optimizes a target variable that you specify. If you want to use hyperparameter tuning, you must include configuration details when you create your training job.
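As a rough illustration of those configuration details, the sketch below assembles a hyperparameter tuning section of the kind that goes into the training input of a job. The field names follow the hyperparameter specification as we understand it; the metric tag, parameter names, ranges, and trial counts are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def build_hyperparameter_spec(metric_tag: str, goal: str = "MAXIMIZE",
                              max_trials: int = 20,
                              max_parallel_trials: int = 4) -> Dict[str, Any]:
    """Assemble a tuning configuration; this does not call any Google API."""
    if goal not in ("MAXIMIZE", "MINIMIZE"):
        raise ValueError("goal must be MAXIMIZE or MINIMIZE")
    return {
        "goal": goal,                            # direction to optimize
        "hyperparameterMetricTag": metric_tag,   # the target variable you report
        "maxTrials": max_trials,
        "maxParallelTrials": max_parallel_trials,
        "params": [
            {   # hypothetical continuous parameter searched on a log scale
                "parameterName": "learning_rate",
                "type": "DOUBLE",
                "minValue": 0.0001,
                "maxValue": 0.1,
                "scaleType": "UNIT_LOG_SCALE",
            },
            {   # hypothetical parameter restricted to a few discrete values
                "parameterName": "batch_size",
                "type": "DISCRETE",
                "discreteValues": [32, 64, 128],
            },
        ],
    }


spec = build_hyperparameter_spec(metric_tag="accuracy")
```

Your training application is then expected to accept each tuned parameter as a command-line argument and to report the metric named by the tag.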
You can host your trained machine learning models in the cloud and use AI Platform Prediction to infer target values for new data.
The prediction service manages the infrastructure needed to run your model at scale and makes it available for online and batch prediction requests.
With AI Platform you can deploy models that were trained either on AI Platform or elsewhere. In both cases, you must first export the model as one or more artifacts that can be deployed to AI Platform Prediction.
If you export a Scikit-Learn pipeline or a custom prediction routine, you can include custom code to run at prediction time, beyond just the prediction routine that your machine learning framework provides. You can use this to preprocess prediction input, post-process prediction results, or add custom logging.
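A custom prediction routine is essentially a small class that wraps your model. The sketch below follows the predictor shape we understand AI Platform expects (a constructor, a `predict` method, and a `from_path` class method); the class names, the `model.pkl` file name, and the stand-in `SumModel` are hypothetical and exist only to make the example self-contained.

```python
import os
import pickle
from typing import Any, List


class SumModel:
    """Hypothetical stand-in for a trained model: predicts the sum of each row."""

    def predict(self, rows: List[List[float]]) -> List[float]:
        return [sum(row) for row in rows]


class MyPredictor:
    """Sketch of a custom prediction routine with pre- and post-processing."""

    def __init__(self, model: Any) -> None:
        self._model = model

    def predict(self, instances: List[List[Any]], **kwargs: Any) -> List[float]:
        # Preprocess: coerce every input value to float.
        inputs = [[float(value) for value in row] for row in instances]
        outputs = self._model.predict(inputs)
        # Post-process: round results so the response is compact.
        return [round(float(output), 3) for output in outputs]

    @classmethod
    def from_path(cls, model_dir: str) -> "MyPredictor":
        # Assumes the exported artifact is a pickle file named model.pkl.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as handle:
            return cls(pickle.load(handle))


predictor = MyPredictor(SumModel())
predictions = predictor.predict([["1", "2.5"], [3, 4]])
```

The `predict` method is the natural place for the custom logging mentioned above, since every request passes through it.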
AI Platform provides two ways to get predictions from trained models: online prediction, and batch prediction.
AI Platform serves predictions from your model by running a number of virtual machines (“nodes”).
Both online and batch prediction run your node with distributed processing, so a given request or job can use multiple nodes simultaneously.
Online and batch prediction allocate nodes differently, which can have a substantial effect on what you will be charged:
- The batch prediction service scales the number of nodes it uses, to minimize the amount of elapsed time your job takes.
- The online prediction service scales the number of nodes it uses, to maximize the number of requests it can handle without introducing too much latency.
Things to Consider when using Online Prediction:
After the service has scaled down to zero, or when there is a sudden spike in traffic, it can take time (seconds to minutes) to initialize nodes to serve requests. The initialization time depends on your model version size, so a client-side timeout may result in dropping requests until the new nodes have been initialized, and/or increased latencies during this period of time.
To ensure prompt serving at all times, you can specify a minimum number of nodes that the service should keep ready.
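As an illustration, the minimum node count is part of the model version you deploy. The sketch below builds such a version description; the field names follow the version resource as we understand it, while the version name, bucket path, framework, and runtime version are hypothetical placeholders.

```python
from typing import Any, Dict


def build_model_version(name: str, deployment_uri: str,
                        min_nodes: int = 1) -> Dict[str, Any]:
    """Describe a model version that keeps at least `min_nodes` nodes ready."""
    if min_nodes < 0:
        raise ValueError("min_nodes cannot be negative")
    return {
        "name": name,
        "deploymentUri": deployment_uri,       # exported model in Cloud Storage
        "runtimeVersion": "2.1",               # assumed runtime version
        "framework": "TENSORFLOW",             # assumed framework
        "pythonVersion": "3.7",                # assumed Python version
        # With minNodes > 0 the service does not scale down to zero,
        # so requests avoid the node initialization delay.
        "autoScaling": {"minNodes": min_nodes},
    }


version = build_model_version("v1", "gs://my-bucket/models/v1", min_nodes=2)
```

Keep in mind that nodes kept ready are billed even when they are idle, so a higher minimum trades cost for latency.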
On Google AI Platform, you pay for both training and deployment.
Google uses the concept of training units to calculate the training price. Training units measure the resource usage of your job; the price per hour of a machine configuration is the number of training units it uses multiplied by the region’s cost of training.
The cost differs according to the machine type you choose to train your model on.
Google uses the concept of node hour to calculate the deployment price. A node hour represents the time a virtual machine spends running your prediction job or waiting in a ready state to handle prediction requests.
The cost differs according to the machine type you choose to deploy your model on.
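The two pricing rules above reduce to simple multiplications, sketched below. The prices and unit counts used in the example are made-up numbers for illustration only, not Google's actual rates; look up the real values for your region and machine type.

```python
def training_cost(training_units: float, price_per_unit_hour: float,
                  hours: float) -> float:
    """Training price: units used by the machine configuration, times the
    region's price per training unit per hour, times the job duration."""
    return training_units * price_per_unit_hour * hours


def prediction_cost(node_hours: float, price_per_node_hour: float) -> float:
    """Deployment price: node hours (running or waiting in a ready state),
    times the price per node hour for the chosen machine type."""
    return node_hours * price_per_node_hour


# Hypothetical example: a configuration worth 2 training units at an
# assumed 0.5 per unit-hour for 3 hours, then 10 node hours at an assumed 0.25.
example_training = training_cost(2.0, 0.5, 3.0)
example_prediction = prediction_cost(10.0, 0.25)
```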
For detailed information about the pricing and the machine types, you can refer to this page.
You can also use their pricing calculator to estimate your training and deployment costs.