Amazon SageMaker — An Overview

As part of our series exploring machine learning cloud services, this post looks at Amazon SageMaker.

Amazon SageMaker Notebook Instances

A notebook instance is a fully managed ML compute instance running the Jupyter Notebook App. Amazon SageMaker creates the instance and its related resources for you.
You can use it to:

  • Write scripts to explore, analyze, and process data
  • Write code to create model training jobs
  • Deploy models to Amazon SageMaker hosting
  • Test or validate your models

Amazon SageMaker Processing

Amazon SageMaker Processing lets you run jobs that preprocess and post-process data, perform feature engineering, and evaluate models on Amazon SageMaker easily and at scale. You can submit these jobs directly from a notebook instance.

You have the flexibility to use the built-in data processing containers or to bring your own containers and submit custom jobs to run on managed infrastructure.
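As a rough illustration of submitting a custom job, the sketch below builds the request for the `create_processing_job` call of the boto3 SageMaker client. The job name, image URI, role ARN, S3 locations, and instance type are all placeholders we made up for the example, and the container is assumed to read from and write to the local paths shown.

```python
def build_processing_job_request(job_name, image_uri, role_arn,
                                 input_s3_uri, output_s3_uri,
                                 instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for the boto3 SageMaker create_processing_job call."""
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        # The container image: a built-in processing image or your own.
        "AppSpecification": {"ImageUri": image_uri},
        # Data copied from S3 into the container before the job starts.
        "ProcessingInputs": [{
            "InputName": "raw-data",
            "S3Input": {
                "S3Uri": input_s3_uri,
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }],
        # Files the container writes here are uploaded to S3 when the job ends.
        "ProcessingOutputConfig": {
            "Outputs": [{
                "OutputName": "processed-data",
                "S3Output": {
                    "S3Uri": output_s3_uri,
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
            }],
        },
        # The managed infrastructure the job runs on.
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": instance_count,
                "InstanceType": instance_type,
                "VolumeSizeInGB": 30,
            },
        },
    }


def submit_processing_job(request, client=None):
    """Submit the request; boto3 is imported lazily so the builder works offline."""
    if client is None:
        import boto3
        client = boto3.client("sagemaker")
    return client.create_processing_job(**request)
```

Keeping the request builder separate from the submission step makes it easy to inspect or log the job definition before anything is launched.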

An example notebook for data preprocessing and model evaluation using the built-in containers and a custom container can be found here.

Amazon SageMaker Model Training

To train a model, you need to rent compute resources, i.e. ML compute instances managed by Amazon SageMaker, to run your training process on.

Figure: Amazon SageMaker workflow

Training a model in Amazon SageMaker requires you to use Amazon S3, the AWS storage service, both for storing your dataset and for storing the output of the training process. You are therefore charged for storage in addition to the compute resources used for training.
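To make the role of S3 concrete, here is a minimal sketch of the request for the `create_training_job` call of the boto3 SageMaker client. It shows where the input dataset and the output location, both in S3, and the rented compute instances are declared. The job name, training image, role ARN, bucket paths, and instance type are placeholders chosen for illustration.

```python
def build_training_job_request(job_name, training_image, role_arn,
                               train_s3_uri, output_s3_path,
                               instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for the boto3 SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        # The container that holds the training algorithm.
        "AlgorithmSpecification": {
            "TrainingImage": training_image,
            "TrainingInputMode": "File",
        },
        # The dataset is read from Amazon S3.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": train_s3_uri,
                    "S3DataDistributionType": "FullyReplicated",
                },
            },
        }],
        # Model artifacts and other output are written back to Amazon S3.
        "OutputDataConfig": {"S3OutputPath": output_s3_path},
        # The ML compute instances you rent for the duration of the job.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        # Upper bound on the running time, and therefore on the compute bill.
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def start_training_job(request, client=None):
    """Submit the request; boto3 is imported lazily so the builder works offline."""
    if client is None:
        import boto3
        client = boto3.client("sagemaker")
    return client.create_training_job(**request)
```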

You have the following options for a training algorithm:

After training a model, SageMaker saves the resulting model artifacts and other output in the S3 bucket you specified for that purpose. To export the model and load it in memory for offline predictions, consider the answers to this question.
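One way to get hold of the artifacts for offline use is to download the archive from S3 and unpack it locally. The sketch below assumes the usual layout, in which the archive lands at `<output path>/<training job name>/output/model.tar.gz`; the authoritative location is the `ModelArtifacts.S3ModelArtifacts` field returned by `describe_training_job`. The helper names are ours, and what the archive contains (and how to load it) depends on the algorithm or framework you trained with.

```python
import tarfile
from urllib.parse import urlparse


def default_model_artifact_uri(output_path, training_job_name):
    """Assumed default location of the model archive for a training job."""
    return f"{output_path.rstrip('/')}/{training_job_name}/output/model.tar.gz"


def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(s3_uri)
    return parsed.netloc, parsed.path.lstrip("/")


def download_and_extract(s3_uri, target_dir, s3_client=None,
                         archive_path="model.tar.gz"):
    """Download the model archive from S3 and unpack it into target_dir."""
    if s3_client is None:
        import boto3  # imported lazily so the helpers above work offline
        s3_client = boto3.client("s3")
    bucket, key = split_s3_uri(s3_uri)
    s3_client.download_file(bucket, key, archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target_dir)
        return archive.getnames()
```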

Amazon SageMaker Model Deployment

After you train your model, you can deploy it to get predictions in one of two ways:

  • To set up a persistent endpoint to get one prediction at a time, use Amazon SageMaker hosting services.
  • To get predictions for an entire dataset, use Amazon SageMaker batch transform.

Amazon SageMaker Hosting Services

When deploying models using SageMaker, consider the following:

  • You can deploy multiple variants of a model to the same Amazon SageMaker HTTPS endpoint. This is useful for testing variations of a model in production. For example, suppose that you’ve deployed a model into production. You want to test a variation of the model by directing a small amount of traffic, say 5%, to the new model. To do this, create an endpoint configuration that describes both variants of the model.
  • When hosting models in production, you can configure the endpoint to elastically scale the deployed ML compute instances. For each production variant, you specify the number of ML compute instances that you want to deploy. When you specify two or more instances, Amazon SageMaker launches them in multiple Availability Zones. This ensures continuous availability.
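The 5% traffic split described above can be sketched as the `ProductionVariants` list of an endpoint configuration. Each variant receives a share of traffic equal to its weight divided by the sum of all weights. The variant names, model names, and instance type here are placeholders, and running two instances for the current variant is only our choice for the example.

```python
def build_production_variants(current_model, candidate_model,
                              candidate_percent=5, instance_type="ml.m5.large"):
    """Describe two variants of a model behind the same endpoint."""
    return [
        {
            "VariantName": "current",
            "ModelName": current_model,
            # Two or more instances are spread across Availability Zones.
            "InitialInstanceCount": 2,
            "InstanceType": instance_type,
            "InitialVariantWeight": 100 - candidate_percent,
        },
        {
            "VariantName": "candidate",
            "ModelName": candidate_model,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": candidate_percent,
        },
    ]


def traffic_shares(variants):
    """Fraction of requests each variant receives: weight / sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}
```

The resulting list is what you would pass as `ProductionVariants` to `create_endpoint_config`.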

Deploying a model using Amazon SageMaker hosting services is a three-step process:

  1. Create a model in Amazon SageMaker: This tells Amazon SageMaker where it can find the model components.
  2. Create an endpoint configuration for an HTTPS endpoint: You specify the name of one or more models in production variants and the ML compute instances that you want Amazon SageMaker to launch to host each production variant.
  3. Create an HTTPS endpoint: Provide the endpoint configuration to Amazon SageMaker. The service launches the ML compute instances and deploys the model or models as specified in the configuration.
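The three steps above map onto three calls of the boto3 SageMaker client: `create_model`, `create_endpoint_config`, and `create_endpoint`. The following is a minimal sketch rather than a complete deployment script: the naming scheme for the configuration and endpoint is our own, the instance type is a placeholder, and the client is passed in so the function can be tried without AWS access.

```python
def deploy_model(client, model_name, image_uri, model_data_url, role_arn,
                 instance_type="ml.m5.large", instance_count=1):
    """Deploy a trained model to a SageMaker HTTPS endpoint in three steps.

    `client` is expected to behave like boto3.client("sagemaker").
    """
    config_name = f"{model_name}-config"      # hypothetical naming scheme
    endpoint_name = f"{model_name}-endpoint"  # hypothetical naming scheme

    # Step 1: tell SageMaker where the model components are
    # (the inference container image and the artifacts in S3).
    client.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )

    # Step 2: describe the production variants and the instances hosting them.
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
            "InitialVariantWeight": 1.0,
        }],
    )

    # Step 3: create the endpoint; SageMaker launches the instances
    # and deploys the model as specified in the configuration.
    client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    return endpoint_name
```

Endpoint creation is asynchronous, so the endpoint is not ready to serve requests the moment this function returns.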

Two notes related to this deployment:

First, this kind of deployment is server-based. Alternatively, going completely serverless may suit you better. That would involve packaging the contents of your SageMaker endpoint inside a Lambda function (we are not sure this is actually possible; we could not find any useful resources on it).

Second, in this deployment, your model is deployed as a separate service, i.e. it is wrapped in a service that is deployed independently of the code that transforms your input data, at inference time, into the format your model expects.
A simple suggested serverless architecture for calling your model looks like this:

Starting from the client side, a client script calls an Amazon API Gateway API action and passes parameter values. API Gateway is a layer that provides an API to the client. It also shields the backend, so that AWS Lambda runs inside a protected private network. API Gateway passes the parameter values to the Lambda function. The Lambda function parses the values and sends them to the SageMaker model endpoint. The model performs the prediction and returns the predicted value to AWS Lambda. The Lambda function parses the returned value and sends it back to API Gateway. API Gateway responds to the client with that value.
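The Lambda part of this flow could look roughly like the handler below, which forwards the payload to the endpoint through the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client. Several details are assumptions made for the sketch: the request body is taken to be JSON with a `data` field holding a CSV row, the endpoint name is read from an `ENDPOINT_NAME` environment variable, and the model is assumed to accept `text/csv` input.

```python
import json
import os


def predict(runtime_client, endpoint_name, payload, content_type="text/csv"):
    """Send one payload to the SageMaker endpoint and return the raw prediction."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=payload,
    )
    # The response body is a stream; read and decode it.
    return response["Body"].read().decode("utf-8")


def lambda_handler(event, context, runtime_client=None):
    """Parse the API Gateway request, call the model, and return its answer."""
    if runtime_client is None:
        import boto3  # available by default in the AWS Lambda Python runtime
        runtime_client = boto3.client("sagemaker-runtime")

    # Assumed request shape: {"body": "{\"data\": \"<csv row>\"}"}
    payload = json.loads(event["body"])["data"]
    prediction = predict(runtime_client, os.environ["ENDPOINT_NAME"], payload)

    # Returned to API Gateway, which passes it on to the client.
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

Any transformation of the input into the format the model expects would also live in this function, before the call to `predict`.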

Finally, it’s possible to deploy a model trained outside Amazon SageMaker using Amazon SageMaker. However, this is not straightforward: the main constraint is that you must export the model into a format that Amazon SageMaker can consume. For more information, you can refer to this blog post.

Amazon SageMaker Batch Transform

Figure: The workflow of a batch transform job

To get inferences for an entire dataset, use batch transform. With batch transform, you create a batch transform job using a trained model and the dataset, which must be stored in Amazon S3. Amazon SageMaker saves the inferences in an S3 bucket that you specify when you create the batch transform job. Batch transform manages all of the compute resources required to get inferences. This includes launching instances and deleting them after the batch transform job has completed.
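As a sketch of what such a job definition looks like, the function below builds the request for the `create_transform_job` call of the boto3 SageMaker client. The job name, model name, S3 locations, and instance type are placeholders, and the input is assumed to be CSV with one record per line.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_path,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for the boto3 SageMaker create_transform_job call."""
    return {
        "TransformJobName": job_name,
        # A model previously created in SageMaker from the trained artifacts.
        "ModelName": model_name,
        # The dataset must be stored in Amazon S3.
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri},
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line as one record
        },
        # The inferences are saved in the S3 location given here.
        "TransformOutput": {"S3OutputPath": output_s3_path},
        # Instances are launched for the job and removed once it completes.
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def start_transform_job(request, client=None):
    """Submit the request; boto3 is imported lazily so the builder works offline."""
    if client is None:
        import boto3
        client = boto3.client("sagemaker")
    return client.create_transform_job(**request)
```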

Use batch transform when you:

  • Want to get inferences for an entire dataset and index them to serve inferences in real-time.
  • Don’t need a persistent endpoint that applications (for example, web or mobile apps) can call to get inferences.
  • Don’t need the subsecond latency that Amazon SageMaker hosted endpoints provide.


On Amazon SageMaker you have to pay mainly for:

  • Building your model using on-demand ML notebook instances. Here the price differs according to the compute instance you choose and its region.
  • Running processing jobs. The price depends on the compute instance you choose to run your jobs on.
  • Training your model on on-demand ML training instances. The price depends on the compute instance you choose to train on.
  • Deploying your model on on-demand hosting instances. The price depends on the compute instance you choose to host your model on.
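Since each of these items is billed by instance and running time, a back-of-the-envelope estimate is simply hourly rate × number of instances × hours, summed per stage. The sketch below does that arithmetic; the hourly rates in the usage example are invented for illustration and are not actual AWS prices, and S3 storage charges are not included.

```python
def estimate_cost(usage, hourly_rates):
    """Estimate instance costs per stage.

    usage: list of (stage, instance_type, instance_count, hours) tuples.
    hourly_rates: mapping from instance type to price per instance-hour.
    """
    breakdown = {}
    for stage, instance_type, count, hours in usage:
        cost = hourly_rates[instance_type] * count * hours
        breakdown[stage] = breakdown.get(stage, 0.0) + cost
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical rates, NOT real AWS prices; check the pricing page for your region.
example_rates = {"ml.t3.medium": 0.05, "ml.m5.xlarge": 0.25}
example_usage = [
    ("notebook", "ml.t3.medium", 1, 40),   # one notebook instance for 40 hours
    ("training", "ml.m5.xlarge", 2, 3),    # two training instances for 3 hours
    ("hosting", "ml.m5.xlarge", 1, 720),   # one hosting instance for a 30-day month
]
example_costs = estimate_cost(example_usage, example_rates)
```

Even with made-up rates, the example shows a pattern worth keeping in mind: an always-on hosting instance tends to dominate the bill because it accumulates hours around the clock.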

For the pricing details, you can refer to this page.


Further Reading

  1. Amazon SageMaker Documentation
  2. Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda
  3. Bring your own pre-trained MXNet or TensorFlow models into Amazon SageMaker
