Azure Machine Learning

Here at Catapult, we are getting increasingly excited about the additional features coming to v2 of Azure Machine Learning (currently in public preview). One of the most significant features lets us deploy ML models for real-time inference to infrastructure that is managed by Azure ML itself, without having to maintain an Azure Kubernetes Service (AKS) cluster. Any workspace contributor can build these endpoints using v2 of the Azure ML Command Line Interface (CLI).

Architecture of an Endpoint

Figure: Azure ML endpoint architecture for AML v2 managed endpoints (image courtesy of Microsoft)

There are two key entities that users should be aware of within a real-time endpoint:

1. The Endpoint – there is precisely one endpoint, which is the first entity to define. It consists of the endpoint URI and the expected Swagger schema.

2. The Deployments – there can be many deployments under a single endpoint, each corresponding to a different version of the model. Traffic can be varied between the deployments over time to accomplish blue-green or canary deployment strategies.

Important note: the virtual machine sizes are defined at the deployment level. This means that you need at least one virtual compute node per deployment within the endpoint.


Azure ML v2 is currently available only through the Command Line Interface (CLI), so the following steps are bash commands to be run in a Linux shell. We recommend configuring a new virtual environment beforehand and installing the latest version of the Azure ML SDK before setting up your first endpoints. You should have at least v1.37.0 installed:
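A minimal sketch of the SDK install, assuming the SDK is pulled from PyPI as the azureml-core package and that python resolves to the interpreter of your virtual environment. Both are assumptions rather than details confirmed above, and the function name is hypothetical:

```shell
# Assumption: the Azure ML SDK is installed from PyPI as "azureml-core".
# Wrapped in a function so nothing is installed until you call it.
install_azureml_sdk() {
  # The version constraint matches the minimum version recommended above.
  python -m pip install --upgrade "azureml-core>=1.37.0"
}
```

Run install_azureml_sdk inside the activated virtual environment.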

Next, you need to install the v2 Azure ML CLI extension for the Azure CLI. This involves installing or updating the existing CLI, and then installing the new ML extension.
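As a rough sketch, assuming the Azure CLI itself is already installed (installing it is platform-specific and not shown here), the update and extension steps might look like this. Removing the older azure-cli-ml extension first is an assumption about your machine, and the function name is hypothetical:

```shell
# Sketch: update the Azure CLI and add the v2 "ml" extension.
setup_ml_cli() {
  # Update an existing Azure CLI installation to the latest version.
  az upgrade --yes
  # Remove the v1 ML extension if it is present; ignore the error if it is not.
  az extension remove --name azure-cli-ml 2>/dev/null || true
  # Install the v2 ML extension.
  az extension add --name ml --yes
}
```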

Finally, log in to the CLI:
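A possible login sequence is sketched below. The subscription, resource group, and workspace values are placeholders, and setting CLI defaults is an optional convenience (an assumption, not a step from the walkthrough above) that avoids repeating the resource group and workspace on every later command:

```shell
# Placeholder values; replace with your own.
SUBSCRIPTION_ID="<your-subscription-id>"
RESOURCE_GROUP="my-resource-group"
WORKSPACE_NAME="my-workspace"

login_and_set_defaults() {
  # Opens the interactive login flow.
  az login
  # Select the subscription that contains the workspace.
  az account set --subscription "$SUBSCRIPTION_ID"
  # Store defaults so later "az ml" commands can omit these arguments.
  az configure --defaults group="$RESOURCE_GROUP" workspace="$WORKSPACE_NAME"
}
```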

Creating the Endpoint

The endpoint creation process is configuration-driven. This means the first step is to create an endpoint.yaml configuration file to describe the endpoint. An example configuration file is shown below:
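A minimal sketch of such a file, written from the shell. The endpoint name my-endpoint is a hypothetical placeholder, and the field names follow the current managed online endpoint schema, which may differ from the preview schema:

```shell
# Write a minimal endpoint configuration to endpoint.yaml.
# The quoted 'EOF' prevents the shell from expanding anything inside the file.
cat > endpoint.yaml <<'EOF'
# Hypothetical endpoint name; it must be unique within the Azure region.
name: my-endpoint
# Key-based authentication, as used in this walkthrough.
auth_mode: key
EOF
```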

Once you have created this file, you can deploy an endpoint into your workspace through the command line as shown below:
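With the current CLI v2 syntax, the create step might look like the sketch below (the preview syntax may differ). The endpoint name is a hypothetical placeholder, and the command assumes a default resource group and workspace are already configured; otherwise add --resource-group and --workspace-name:

```shell
# Hypothetical endpoint name; keep it consistent with endpoint.yaml.
ENDPOINT_NAME="my-endpoint"

create_endpoint() {
  # Create the endpoint described by endpoint.yaml in the default workspace.
  az ml online-endpoint create --name "$ENDPOINT_NAME" --file endpoint.yaml
}
```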

This will create the URI, which will be visible in the “Endpoints” tab of your Azure ML workspace. In this example, key-based authentication means that a permanent API key will be created, which must be passed in the header for authentication.
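To illustrate how the key is passed in the header, here is a hedged sketch that looks up the scoring URI and primary key through the CLI and sends a request with curl. The endpoint name and sample-request.json are hypothetical, and the call only returns predictions once a deployment behind the endpoint is receiving traffic:

```shell
# Hypothetical endpoint name.
ENDPOINT_NAME="my-endpoint"

call_endpoint() {
  # Look up the scoring URI and the primary API key for the endpoint.
  scoring_uri=$(az ml online-endpoint show --name "$ENDPOINT_NAME" \
    --query scoring_uri --output tsv)
  api_key=$(az ml online-endpoint get-credentials --name "$ENDPOINT_NAME" \
    --query primaryKey --output tsv)
  # The key is sent as a bearer token in the Authorization header.
  curl --request POST "$scoring_uri" \
    --header "Authorization: Bearer $api_key" \
    --header "Content-Type: application/json" \
    --data @sample-request.json
}
```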

Creating a Deployment

Now that an endpoint has been created, you can deploy a model within it. You can deploy many versions of the same model behind an endpoint, each of which will be a separate deployment. You can then vary the traffic that passes to each deployment over time.

To create a deployment, you need to predefine the following:

· A registered model in Azure ML

· A registered environment in Azure ML, containing the required Python packages

· A scoring script file, in line with the usual Azure ML format for real-time endpoints

As with endpoints, deployments are configuration-driven, so you will need a deployment.yaml file for each deployment. For example:
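A sketch of what a deployment file could contain, written from the shell. Every name and version here (blue, my-endpoint, my-model, my-env, ./src, score.py) and the VM size are hypothetical placeholders, and the field layout follows the current managed online deployment schema, which may differ from the preview schema:

```shell
# Write a minimal deployment configuration to deployment.yaml.
cat > deployment.yaml <<'EOF'
# Hypothetical deployment name and parent endpoint.
name: blue
endpoint_name: my-endpoint
# References to a registered model and environment, as name:version.
model: azureml:my-model:1
environment: azureml:my-env:1
# Folder containing the scoring script, and the script's file name.
code_configuration:
  code: ./src
  scoring_script: score.py
# VM size and the fixed number of nodes for this deployment.
instance_type: Standard_DS3_v2
instance_count: 1
EOF
```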

Note: when the deployment is initially created, it runs on a fixed number of nodes, using the VM size specified in instance_type. We will cover how to enable autoscaling in a future post.

You can then create the deployment through the CLI:
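Using the current CLI v2 syntax, this step might be sketched as follows. The deployment and endpoint names are hypothetical, and a default resource group and workspace are assumed to be configured:

```shell
# Hypothetical names; keep them consistent with deployment.yaml.
ENDPOINT_NAME="my-endpoint"
DEPLOYMENT_NAME="blue"

create_deployment() {
  # Create the deployment under the existing endpoint.
  # No traffic flag is passed, so the new deployment receives no traffic yet.
  az ml online-deployment create --name "$DEPLOYMENT_NAME" \
    --endpoint-name "$ENDPOINT_NAME" --file deployment.yaml
}
```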

When this initial deployment is completed, no traffic is routed to it by default. At this point, you may want to test the endpoint and deployment by passing it an example .json file:
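One way to run such a test is sketched below. It targets the deployment by name, which is how a deployment with 0% traffic can still be reached. The names and the sample-request.json file are hypothetical:

```shell
# Hypothetical names and request file.
ENDPOINT_NAME="my-endpoint"
DEPLOYMENT_NAME="blue"

test_deployment() {
  # Send the sample payload directly to the named deployment,
  # bypassing the endpoint's traffic split.
  az ml online-endpoint invoke --name "$ENDPOINT_NAME" \
    --deployment-name "$DEPLOYMENT_NAME" \
    --request-file sample-request.json
}
```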

The endpoint is not yet live. To set it live, route 100% of the traffic to the deployment:
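With the current CLI v2 syntax, the traffic update might be sketched like this. The endpoint and deployment names are hypothetical:

```shell
# Hypothetical names.
ENDPOINT_NAME="my-endpoint"
DEPLOYMENT_NAME="blue"

route_all_traffic() {
  # Send 100% of the endpoint's traffic to the single deployment.
  az ml online-endpoint update --name "$ENDPOINT_NAME" \
    --traffic "$DEPLOYMENT_NAME=100"
}
```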

This update command can be used to alter the traffic if there are multiple deployments.
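For example, a canary-style split across two deployments could be sketched as follows. The deployment names blue and green and the 90/10 split are hypothetical, and the percentages are expected to sum to 100:

```shell
# Hypothetical endpoint name.
ENDPOINT_NAME="my-endpoint"

split_traffic() {
  # Keep 90% of traffic on the existing deployment and send 10% to the new one.
  az ml online-endpoint update --name "$ENDPOINT_NAME" \
    --traffic "blue=90 green=10"
}
```

Shifting the split gradually, for example to 50/50 and then 0/100, would complete the rollout to the new deployment.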