Progressive Delivery: How To Implement Flagger with Istio

What will be discussed:

  • Progressive Delivery
  • What is the added value of Flagger
  • Flagger’s Deployment Strategies
  • Canary Release Demo: how to implement Flagger with Istio

Versions:

  • Istio: v1.12.*
  • Flagger: v1.16.0

Progressive Delivery

Progressive delivery is a modern software development practice for gradually rolling out new features in order to limit the potential negative impact of a new product feature.

Flagger helps us to manage traffic routing between our current release and our new release. Flagger uses a service mesh (App Mesh, Istio, Linkerd, Open Service Mesh) or an ingress controller (Contour, Gloo, NGINX, Skipper, Traefik) for traffic routing.

To sum up:

Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests.
https://docs.flagger.app/

What is the added value of Flagger

Flagger monitors the traffic routed to the canary release and “decides” whether or not to route more traffic to it. We also get fine-grained control over how much traffic is shifted at each step. For example, we can configure Flagger to query all the response codes returned by the canary release service and to route an additional 5% of the traffic to the new release every 30 minutes. Flagger evaluates the metrics at each step; as long as the checks pass, it keeps routing more traffic as configured in the canary release.

Example: if more than 1% of the response codes returned by the canary release are from the 5XX family, we can configure Flagger to halt the advance of traffic to the new release. Otherwise, Flagger keeps routing more traffic to the canary release, as configured in the canary.
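To make that example concrete, such a policy would live in the analysis section of a Canary resource, roughly as sketched below. This is an illustrative sketch only: the maxWeight value is an arbitrary choice, and the error-rate metric refers to the success-rate style template defined later in the demo; the exact resource used in the demo appears in STEP 9.

analysis:
  # shift another 5% of the traffic every 30 minutes
  interval: 30m
  stepWeight: 5
  # stop shifting once the canary receives 50% of the traffic
  maxWeight: 50
  # roll back after 5 failed metric checks
  threshold: 5
  metrics:
    # success rate must stay at or above 99%,
    # i.e. no more than 1% of responses may be 5XX
    - name: error-rate
      templateRef:
        name: error-rate
        namespace: istio-system
      thresholdRange:
        min: 99
      interval: 1m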

Flagger can query Prometheus, Datadog, New Relic, CloudWatch or Graphite, and can send alerts via Slack, MS Teams, Discord and Rocket.
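For alerting, Flagger provides an AlertProvider custom resource that a canary’s analysis can reference. The sketch below follows Flagger’s documentation; the namespace, channel name and webhook URL are placeholders.

apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: on-call
  namespace: flagger
spec:
  type: slack
  channel: on-call-alerts
  username: flagger
  # webhook address (ignored if secretRef is specified)
  address: https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK

The provider is then referenced from the canary’s analysis section:

analysis:
  alerts:
    - name: "on-call Slack"
      severity: error
      providerRef:
        name: on-call
        namespace: flagger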

Flagger’s Deployment Strategies:

  • Canary Release (progressive traffic shifting): Istio, Linkerd, App Mesh, NGINX, Skipper, Contour, Gloo Edge, Traefik, Open Service Mesh
  • A/B Testing (HTTP headers and cookies traffic routing): Istio, App Mesh, NGINX, Contour, Gloo Edge
  • Blue/Green (traffic switching): Kubernetes CNI, Istio, Linkerd, App Mesh, NGINX, Contour, Gloo Edge, Open Service Mesh
  • Blue/Green Mirroring (traffic shadowing): Istio

Demo Canary Release: How To Implement Flagger with Istio

STEP 1: Add Flagger Helm repository

$ helm repo add flagger https://flagger.app

STEP 2: Deploy Flagger’s Canary CRD

$ kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
customresourcedefinition.apiextensions.k8s.io "canaries.flagger.app" created
customresourcedefinition.apiextensions.k8s.io "metrictemplates.flagger.app" created
customresourcedefinition.apiextensions.k8s.io "alertproviders.flagger.app" created

STEP 3: Deploy Istio 1.12.0

https://istio.io/latest/docs/setup/additional-setup/gateway/
As a security best practice, it is recommended to deploy the gateway in a different namespace from the control plane.
$ helm repo add istio https://istio-release.storage.googleapis.com/charts
$ helm repo update
$ kubectl create namespace istio-system
$ helm install istio-base istio/base -n istio-system
$ helm install istiod istio/istiod -n istio-system --wait
$ kubectl create namespace istio-ingress
$ kubectl label namespace istio-ingress istio-injection=enabled
$ helm install istio-ingress istio/gateway -n istio-ingress --wait
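Before moving on, it is worth verifying that the control plane and the ingress gateway pods are up and running:

$ kubectl get pods -n istio-system
$ kubectl get pods -n istio-ingress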

STEP 4: Deploy Istio Prometheus add-on

  • Istio provides a basic sample installation to quickly get Prometheus up and running
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.12/samples/addons/prometheus.yaml

STEP 5: Deploy Flagger

  • We will use Istio as our service mesh and Prometheus as our metric server
https://github.com/fluxcd/flagger/tree/main/charts/flagger
Note: the chart value crd.create creates Flagger's CRDs when set to true (needed for Helm v2 only); we set it to false because we already applied the CRDs in STEP 2.
$ helm upgrade -i flagger flagger/flagger \
    --namespace=istio-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://prometheus.istio-system:9090
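A quick way to confirm that Flagger is up before continuing (with the release name above, the deployment is named flagger):

$ kubectl -n istio-system rollout status deployment/flagger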

STEP 6: Deploy test application and add Istio sidecar

$ kubectl create ns test
$ kubectl label namespace test istio-injection=enabled
$ kubectl apply -k https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main
$ kubectl apply -k https://github.com/fluxcd/flagger//kustomize/tester?ref=main
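Since the test namespace is labeled for sidecar injection, each podinfo pod should end up with two containers (the application plus the istio-proxy sidecar). A quick check:

$ kubectl -n test get pods
# READY should show 2/2 once the Istio sidecar is injected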

STEP 7: Create Metric Template

  • Flagger will use the following metric template. The query looks at all the responses the podinfo-canary service received over the last 30 seconds and returns 100 minus the percentage of responses with a 5XX code, i.e. the success-rate percentage (which is why the canary analysis below uses a minimum threshold of 99).
Note: most documentation uses the metrics “request-success-rate” or “request-duration”, but Istio has since renamed its metrics; the new names are “istio_requests_total” and “istio_request_duration_milliseconds_bucket”, so we define our own MetricTemplate against them.
For more about Istio metrics, see: https://istio.io/latest/docs/reference/config/metrics/
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: error-rate
  namespace: istio-system
spec:
  provider:
    address: http://prometheus.istio-system.svc.cluster.local:9090
    type: prometheus
  query: |
    100 -
    (sum(rate(istio_requests_total{destination_service="podinfo-canary.test.svc.cluster.local", response_code=~"5.*"}[30s]))
    /
    sum(rate(istio_requests_total{destination_service="podinfo-canary.test.svc.cluster.local"}[30s]))
    * 100
    )
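Save the template to a file and apply it (the filename here is just an example):

$ kubectl apply -f metric-template.yaml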

STEP 8: Deploy Horizontal Pod Autoscaler

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale up if usage is above
          # 99% of the requested CPU (100m)
          averageUtilization: 99

STEP 9: Deploy the Canary for podinfo application

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    # service port number
    port: 9898
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
  analysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 100
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    metrics:
      - name: "500 percentage"
        templateRef:
          name: error-rate
          namespace: istio-system
        thresholdRange:
          min: 99
        interval: 15s
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"

A few things to note regarding the canary release:

  1. The deployment and the service that Flagger will take control of:
# deployment reference
targetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: podinfo
service:
  # service port number
  port: 9898

2. How Flagger will route the traffic at each step:

analysis:
  # schedule interval (default 60s)
  interval: 10s
  # max number of failed metric checks before rollback
  threshold: 5
  # max traffic percentage routed to canary
  # percentage (0-100)
  maxWeight: 100
  # canary increment step
  # percentage (0-100)
  stepWeight: 5

3. The metric template that Flagger will use for its analysis:

metrics:
  - name: "500 percentage"
    templateRef:
      name: error-rate
      namespace: istio-system
    thresholdRange:
      min: 99
    interval: 15s
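After saving the HPA and Canary manifests to files (the filenames below are examples) and applying them, Flagger should detect the Canary resource and bootstrap the release: it creates the podinfo-primary deployment and the podinfo-primary/podinfo-canary services, and the canary status should eventually show Initialized:

$ kubectl apply -f podinfo-hpa.yaml
$ kubectl apply -f podinfo-canary.yaml
$ kubectl -n test get canary podinfo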

STEP 10: Trigger a canary deployment by updating the container image

$ kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:3.1.1
  • Describe the canary release to check for errors or the advance status.
  • In a successful release, the canary weight advances step by step until the new version is promoted and the canary status shows Succeeded.
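For example (the exact events and status values will vary per cluster):

$ kubectl -n test describe canary/podinfo
$ kubectl -n test get canaries --watch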

STEP 11: Check that the canary release halts when HTTP 500 errors are generated

  • Upgrade your image again
$ kubectl -n test set image deployment/podinfo \
podinfod=stefanprodan/podinfo:3.1.2
  • Generate HTTP 500 errors by running the following from the flagger-loadtester pod
$ kubectl -n test exec -it deployment/flagger-loadtester bash
$ watch -n 1 curl http://podinfo-canary:9898/status/500
  • If everything was configured correctly, Flagger will halt the advance of the traffic routed to the canary release.
  • In this demo, Flagger reached the number of failed checks we configured and performed a rollback.
Note: Flagger might fail, say, 3 analysis runs out of the 5 allowed over the whole release cycle; in that case it still considers the new release successful and fully promotes the new version. In other words, each failed analysis only halts the advance of the traffic routing; only when the failure threshold is reached does Flagger perform a rollback. If the next analysis passes, Flagger keeps routing traffic as configured in the canary release.
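To watch the halt and the eventual rollback while the 500s are being generated, for example:

$ kubectl -n test get canary podinfo --watch
$ kubectl -n test describe canary/podinfo
# once the threshold of failed checks is reached, the canary phase should move to Failed
# and all traffic should be routed back to the primary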
