Canary deployment is a pattern that rolls out a release to a subset of users or servers first. Deploying the changes to a small set of servers lets you test and monitor how the new release behaves before rolling it out to the rest of the servers.

Virtual machine scale sets (VMSS) are an Azure compute resource that you can use to deploy and manage a set of identical VMs. With all VMs configured the same, scale sets are designed to support true autoscale, and no pre-provisioning of VMs is required. This makes it easier to build large-scale services that target big compute, large data, and containerized workloads.

A VMSS lets you manage a large number of identical VMs with simple instructions, while still allowing you to update specific VMs. You can build your VMSS with a customized image or a publicly available OS image, along with VM extension scripts to set up the required environment. To update an existing VMSS, you update its configuration with a new image or new extension scripts, and then manually trigger the update of the VMSS instances, either all in one instruction, or by selectively picking some VMs to update.

The ability to update individual VMs in a VMSS lets us control how many VMs are updated to the new release, which is exactly what canary deployment requires:

  1. (Existing) Create the initial VMSS which host your services.
  2. Update the VMSS configuration, either pointing to a new customized image or updating the extension scripts, to reference the new release of your services.
  3. Selectively update individual instances to the new release according to the configuration changes.
  4. Verify the new release works.
  5. Update the rest of the instances to the new release.

Nginx Canary Deployment Example

Here we demonstrate the canary deployment for VMSS using the Nginx binary release.

Prepare the VMSS

We use the public Ubuntu Server 16.04 LTS image together with an extension script which installs the Nginx service to set up the VMSS. In your project, you can customize the extension script to install the service on demand, or create a customized image (reference: Packer / Azure Resource Manager Builder).

Prepare variable configurations

First we set up some variables that will be used in the following preparation steps. You can update the variables based on your needs.
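One possible set of variables; all of these names are hypothetical placeholders, so substitute your own:

```shell
# Hypothetical names used throughout this walkthrough; adjust as needed.
RESOURCE_GROUP=canary-demo-rg
LOCATION=westus
VMSS_NAME=nginx-vmss
ADMIN_USER=azureuser
STORAGE_ACCOUNT=canarydemoscripts   # storage account names must be globally unique
CONTAINER_NAME=scripts
```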

Create VMSS from public Ubuntu image

We create a VMSS with 3 instances, using the public image “UbuntuLTS”.
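A sketch using the variables prepared earlier; the Manual upgrade policy is what later lets us update individual instances selectively:

```shell
# Create the resource group, then a 3-instance scale set from the
# public "UbuntuLTS" image alias. "az vmss create" also provisions a
# load balancer in front of the instances by default.
az group create --name $RESOURCE_GROUP --location $LOCATION

az vmss create \
  --resource-group $RESOURCE_GROUP \
  --name $VMSS_NAME \
  --image UbuntuLTS \
  --instance-count 3 \
  --admin-username $ADMIN_USER \
  --generate-ssh-keys \
  --upgrade-policy-mode Manual
```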

Prepare the init scripts

In order to use the custom script extension to configure the VMSS, we need to store the script at some location that's accessible via HTTP(S). Here we create a storage account for the script storage, and expose the scripts publicly so the script extension can pick them up.
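For example (reusing the hypothetical storage account and container names from earlier):

```shell
# Storage account and container for the install script; blobs are made
# publicly readable so the extension can fetch them over HTTPS.
az storage account create \
  --resource-group $RESOURCE_GROUP \
  --name $STORAGE_ACCOUNT \
  --location $LOCATION \
  --sku Standard_LRS

az storage container create \
  --account-name $STORAGE_ACCOUNT \
  --name $CONTAINER_NAME \
  --public-access blob
```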
The custom script is fairly simple in this case. It installs the Nginx package from the Ubuntu Apt source. In your project, you may update the script to fetch dependencies, install, configure and start services, etc.
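A minimal version of such a script (the file name install_nginx_v1.sh is an assumption for this walkthrough):

```shell
# install_nginx_v1.sh -- installs Nginx from the Ubuntu apt repository.
cat > install_nginx_v1.sh <<'EOF'
#!/bin/bash
set -e
apt-get update
apt-get install -y nginx
EOF
```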

Install the custom script extension
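A sketch with the Linux CustomScript extension; the script URL assumes the hypothetical storage account and file name used earlier:

```shell
# Upload the script, then attach the CustomScript extension to the VMSS.
az storage blob upload \
  --account-name $STORAGE_ACCOUNT \
  --container-name $CONTAINER_NAME \
  --name install_nginx_v1.sh \
  --file install_nginx_v1.sh

SCRIPT_URL="https://$STORAGE_ACCOUNT.blob.core.windows.net/$CONTAINER_NAME/install_nginx_v1.sh"

az vmss extension set \
  --resource-group $RESOURCE_GROUP \
  --vmss-name $VMSS_NAME \
  --publisher Microsoft.Azure.Extensions \
  --name CustomScript \
  --version 2.0 \
  --settings "{\"fileUris\": [\"$SCRIPT_URL\"], \"commandToExecute\": \"bash install_nginx_v1.sh\"}"

# With a Manual upgrade policy, existing instances only pick up the new
# model when explicitly updated.
az vmss update-instances \
  --resource-group $RESOURCE_GROUP \
  --name $VMSS_NAME \
  --instance-ids "*"
```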

Update load balancer endpoint

We need to create a load balancer rule to route the public traffic to the Nginx services running in the VMSS backend.
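For example; the probe and rule names are assumptions, while the load balancer, frontend, and backend pool names below follow the defaults that "az vmss create" generates:

```shell
# Health probe on port 80 so the LB can detect down backends.
az network lb probe create \
  --resource-group $RESOURCE_GROUP \
  --lb-name ${VMSS_NAME}LB \
  --name http-probe \
  --protocol tcp \
  --port 80

# Route public port 80 to port 80 on the backend instances.
az network lb rule create \
  --resource-group $RESOURCE_GROUP \
  --lb-name ${VMSS_NAME}LB \
  --name http-rule \
  --protocol tcp \
  --frontend-port 80 \
  --backend-port 80 \
  --frontend-ip-name loadBalancerFrontEnd \
  --backend-pool-name ${VMSS_NAME}LBBEPool \
  --probe-name http-probe
```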

Verify everything works

Check that we can access the Nginx service from the public endpoint of the load balancer.
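For example (the public IP name follows the default "az vmss create" naming convention):

```shell
# Look up the load balancer's public IP, then request the landing page.
PUBLIC_IP=$(az network public-ip show \
  --resource-group $RESOURCE_GROUP \
  --name ${VMSS_NAME}LBPublicIP \
  --query ipAddress --output tsv)

curl -s "http://$PUBLIC_IP" | grep -i "welcome to nginx"
```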

Deploy New Release in Canary Deployment Pattern

In the new release, we make a simple update to the Nginx landing page, and deploy it to 1 instance in the early stage. So after the deployment, we should have 1 instance serving the updated landing page, and 2 instances serving the original page.

First, we need to update and upload the new custom script. Some points to call out here:

  • The custom script will be executed on a fresh VM after it is created from the given OS image. It is not an incremental update process based on the existing VM. So we need to install all the dependencies and services again, with the changes in the new release included.
  • Any updates you make to your application are not exposed to the Custom Script Extension unless that install script changes. To force VMSS to pick up the custom script changes, you need to change the script name so that it results in a different file URI.
  • This will not affect the existing instances until we manually update those instances.
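Under these conventions, the updated script (with a new file name so the extension picks up the change) and the configuration update might look like this sketch, reusing the hypothetical names from earlier; the landing page path is Ubuntu's default for the Nginx package:

```shell
# install_nginx_v2.sh -- same install, plus an updated landing page.
cat > install_nginx_v2.sh <<'EOF'
#!/bin/bash
set -e
apt-get update
apt-get install -y nginx
echo "<h1>Nginx - canary release v2</h1>" > /var/www/html/index.nginx-debian.html
EOF

az storage blob upload \
  --account-name $STORAGE_ACCOUNT \
  --container-name $CONTAINER_NAME \
  --name install_nginx_v2.sh \
  --file install_nginx_v2.sh

# Point the VMSS model at the new script; existing instances are not
# touched until we update them explicitly.
SCRIPT_URL="https://$STORAGE_ACCOUNT.blob.core.windows.net/$CONTAINER_NAME/install_nginx_v2.sh"
az vmss extension set \
  --resource-group $RESOURCE_GROUP \
  --vmss-name $VMSS_NAME \
  --publisher Microsoft.Azure.Extensions \
  --name CustomScript \
  --version 2.0 \
  --settings "{\"fileUris\": [\"$SCRIPT_URL\"], \"commandToExecute\": \"bash install_nginx_v2.sh\"}"
```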

Now that the custom script configuration is updated for the VMSS, we can update 1 instance to pick up the new custom script.
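For example (instance ID 1 below is just an example; pick any ID reported by the list command):

```shell
# Check which instances are on the latest model, then update one of them.
az vmss list-instances \
  --resource-group $RESOURCE_GROUP \
  --name $VMSS_NAME \
  --query '[].{id:instanceId, latestModel:latestModelApplied}' \
  --output table

az vmss update-instances \
  --resource-group $RESOURCE_GROUP \
  --name $VMSS_NAME \
  --instance-ids 1
```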

Check the load balancer public endpoint and we should see the old version and new version interleaved.
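A quick way to observe this, assuming the default public IP naming from earlier:

```shell
PUBLIC_IP=$(az network public-ip show \
  --resource-group $RESOURCE_GROUP \
  --name ${VMSS_NAME}LBPublicIP \
  --query ipAddress --output tsv)

# Roughly one in three responses should show the updated page.
for i in $(seq 1 6); do
  curl -s "http://$PUBLIC_IP" | grep -o '<h1>.*</h1>'
done
```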

Now you can do more checks to verify that the new version works as expected. Because the VMSS sits behind the frontend load balancer, you cannot tell the service status of an individual node through the public access point. If you need to check a specific instance, you can SSH into that instance through the NAT mapping defined in the load balancer and check the service in the SSH session; or you can open a tunnel to the remote service through the SSH channel, and check the service in detail through the local port.
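A sketch of the tunnel approach; port 50001 is an example, since the default load balancer created by "az vmss create" maps NAT ports starting at 50000 to SSH (port 22) on each instance:

```shell
# Inspect the NAT pool to find the port mapped to the instance you want.
az network lb inbound-nat-pool list \
  --resource-group $RESOURCE_GROUP \
  --lb-name ${VMSS_NAME}LB \
  --output table

# Tunnel local port 8080 to port 80 on that instance via SSH.
PUBLIC_IP=$(az network public-ip show \
  --resource-group $RESOURCE_GROUP \
  --name ${VMSS_NAME}LBPublicIP \
  --query ipAddress --output tsv)
ssh -L 8080:localhost:80 -p 50001 $ADMIN_USER@$PUBLIC_IP
```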

After this, you can visit the web page through http://localhost:8080 and it will show you the page served by the updated instance.

Complete the Release of the New Version

At this point you have 1 instance serving the updated web page and 2 instances serving the original page in the VMSS. When you have verified that the new version works, you can update the rest of the instances to the new version.
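With the Azure CLI this is a single command:

```shell
# "*" applies the latest model to every instance that doesn't have it yet.
az vmss update-instances \
  --resource-group $RESOURCE_GROUP \
  --name $VMSS_NAME \
  --instance-ids "*"
```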

Note that this will update, in parallel, all the instances whose models are not aligned with the latest state. So all the outdated instances will be brought down, updated, and brought up again. This will not cause service downtime, because the load balancer notices that some of the backends are down and routes around them, and we have at least 1 instance already updated in the previous steps. However, during the update window of the outdated instances, all the client traffic will be routed to the up-to-date instances, which will increase the load and latency on those instances.

A better approach may be to query the list of outdated instances first, and then update them at a smaller granularity:
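For example, updating one instance at a time:

```shell
# Collect the instances still on the old model...
OUTDATED=$(az vmss list-instances \
  --resource-group $RESOURCE_GROUP \
  --name $VMSS_NAME \
  --query '[?latestModelApplied==`false`].instanceId' \
  --output tsv)

# ...and update them one by one, so most instances keep serving traffic.
for id in $OUTDATED; do
  az vmss update-instances \
    --resource-group $RESOURCE_GROUP \
    --name $VMSS_NAME \
    --instance-ids $id
done
```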

In this way, only a small number of instances are being updated at a given point in time. The rest of the instances are not touched and will continue serving traffic as normal.

Work with Image Based Canary Deployment

The above steps demonstrate how we can do canary deployments for a VMSS using the custom script extension. VMSS also supports custom images: if one is specified, all the VMs are created from the given image. Compared to a custom script extension based VMSS, an image based VMSS:

  • Provisions faster: the service creation and configuration is done when the image is built, so when the VMSS needs to provision a new instance, it creates the VM directly from that image. It doesn't need to execute extra scripts after the VM is provisioned. (Although you can still add a custom script extension if needed.)
  • Has more stable service and dependency versions: the service and its dependencies are fetched when the image is created, and all the VMs created from the image get the same binaries. With the custom script extension, you need to be careful about the service or dependencies being upgraded while the VMSS is scaling.

Packer is widely used to create OS images on different cloud platforms. If we want to transform the above custom script extension based deployment into an image based one, we can create the base image using the following Packer configuration (filename: packer-nginx.json):
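A minimal sketch of such a configuration; the service principal environment variables, resource group, location, and VM size are assumptions for illustration, and the final waagent deprovision line is the standard last step for Azure Linux images:

```json
{
  "builders": [{
    "type": "azure-arm",
    "subscription_id": "{{env `AZURE_SUBSCRIPTION_ID`}}",
    "client_id": "{{env `AZURE_CLIENT_ID`}}",
    "client_secret": "{{env `AZURE_CLIENT_SECRET`}}",
    "tenant_id": "{{env `AZURE_TENANT_ID`}}",
    "managed_image_resource_group_name": "canary-demo-rg",
    "managed_image_name": "nginx-base-image",
    "os_type": "Linux",
    "image_publisher": "Canonical",
    "image_offer": "UbuntuServer",
    "image_sku": "16.04-LTS",
    "location": "westus",
    "vm_size": "Standard_DS1_v2"
  }],
  "provisioners": [{
    "type": "shell",
    "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
    "inline": [
      "apt-get update",
      "apt-get install -y nginx",
      "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
    ]
  }]
}
```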

and build it with:
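Assuming the service principal credentials are exported in the environment:

```shell
packer build packer-nginx.json
```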

When this completes, we will get an image nginx-base-image in the specified resource group. Similarly, we can create the updated image (nginx-updated-image) by adding the following line to the provisioners script, updating the managed_image_name to nginx-updated-image, and building the image again.
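One plausible version of that added line, writing the new landing page to Ubuntu's default Nginx page path (the exact page content is an assumption; the deprovision step must stay last):

```json
"inline": [
  "apt-get update",
  "apt-get install -y nginx",
  "echo '<h1>Nginx - updated release</h1>' > /var/www/html/index.nginx-debian.html",
  "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
]
```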

After that we can get the VMSS image ID for the base image and the updated one:
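For example:

```shell
BASE_IMAGE_ID=$(az image show \
  --resource-group $RESOURCE_GROUP \
  --name nginx-base-image \
  --query id --output tsv)

UPDATED_IMAGE_ID=$(az image show \
  --resource-group $RESOURCE_GROUP \
  --name nginx-updated-image \
  --query id --output tsv)
```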

Now that we have two images, we can do the canary deployment as follows:

  1. Initially, we specify the base image ID when we create the VMSS.
  2. When we deploy a new release, we update the image in the VMSS configuration.
  3. Now we can selectively update certain instances to use the latest image with the command az vmss update-instances, or upgrade all instances by setting --instance-ids to *.
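The three steps above can be sketched as follows, reusing the variables and image IDs prepared earlier:

```shell
# 1. Create the VMSS from the base image.
az vmss create \
  --resource-group $RESOURCE_GROUP \
  --name $VMSS_NAME \
  --image $BASE_IMAGE_ID \
  --instance-count 3 \
  --admin-username $ADMIN_USER \
  --generate-ssh-keys \
  --upgrade-policy-mode Manual

# 2. Point the VMSS model at the updated image.
az vmss update \
  --resource-group $RESOURCE_GROUP \
  --name $VMSS_NAME \
  --set virtualMachineProfile.storageProfile.imageReference.id=$UPDATED_IMAGE_ID

# 3. Roll one instance first, then (after verification) the rest.
az vmss update-instances --resource-group $RESOURCE_GROUP --name $VMSS_NAME --instance-ids 1
az vmss update-instances --resource-group $RESOURCE_GROUP --name $VMSS_NAME --instance-ids "*"
```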

Canary Deployment with Jenkins

In canary deployment we may roll out a new release to the servers gradually, which may involve multiple deployments that adjust the ratio of servers on the old and new releases. This can be hard to automate in a limited number of Jenkins jobs.

However, if we simplify the process, we can model it with parameterized Jenkins jobs. We have published the Azure Virtual Machine Scale Set Jenkins plugin, which helps deploy new images to a VMSS.

The above image based canary deployment can be modeled as two Jenkins Pipeline jobs:

  • Deploy to a subset of instances

  • Upgrade all the remaining instances to the latest image
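As a sketch, the core step each job runs could look like the following Azure CLI calls (shown here instead of the plugin's own configuration; IMAGE_ID and INSTANCE_ID are hypothetical job parameters):

```shell
# Job 1 (parameters: IMAGE_ID, INSTANCE_ID) -- deploy the new image to
# a subset of instances.
az vmss update --resource-group $RESOURCE_GROUP --name $VMSS_NAME \
  --set virtualMachineProfile.storageProfile.imageReference.id=$IMAGE_ID
az vmss update-instances --resource-group $RESOURCE_GROUP --name $VMSS_NAME \
  --instance-ids $INSTANCE_ID

# Job 2 -- upgrade all remaining instances to the latest image.
az vmss update-instances --resource-group $RESOURCE_GROUP --name $VMSS_NAME \
  --instance-ids "*"
```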

As mentioned in the previous example, you need to implement extra logic to test and validate if the new image is working properly.

Further Reading: Blue-green Deployment

We can also do blue-green deployment on VMSS, in which you maintain two nearly identical backends: you upgrade one of them and switch the routing to the upgraded backend without interrupting user traffic. We have prepared a quick start template, and you can find more details at Jenkins Blue-green Deployment to VMSS.