Automated Canary Rollback

Now we will generate failed and slow requests to trigger an automated rollback.

During the canary analysis, you can generate HTTP 500 errors and high latency to verify that Flagger pauses the rollout and rolls back the faulty version.

Trigger another canary release by updating the podinfo image to 3.1.2:

cat << EOF | tee overlays/podinfo.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: demo
spec:
  template:
    spec:
      containers:
        - name: podinfod
          image: stefanprodan/podinfo:3.1.2
          env:
            - name: PODINFO_UI_LOGO
              value: https://eks.handson.flagger.dev/cuddle_bunny.gif
EOF

Push your changes and use fluxctl to sync:

git add -A && \
git commit -m "update podinfo" && \
git push origin master && \
fluxctl sync --k8s-fwd-ns flux

Watch the canaries:

kubectl -n demo get canaries --watch

View Flagger logs with:

kubectl -n appmesh-system logs deployment/flagger -f | jq .msg

Open a shell in the load tester pod:

kubectl -n demo exec -it $(kubectl -n demo get pods -o name | grep -m1 flagger-loadtester | cut -d'/' -f 2) -- bash

Generate HTTP 500 errors for one minute, followed by one minute of high-latency requests:

hey -z 1m -c 5 -q 5 http://podinfo-canary.demo:9898/status/500 && \
hey -z 1m -c 5 -q 5 http://podinfo-canary.demo:9898/delay/1
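To get a feel for how much traffic these two runs produce, here is a rough sketch of the arithmetic. In hey, -z is the run duration, -c the number of concurrent workers, and -q the rate limit in queries per second per worker. The helper name max_requests is hypothetical, and the figure for the /delay/1 run assumes each response takes about one second, so each worker completes roughly one request per second.

```shell
# Hypothetical helper: upper bound on requests sent by one hey run.
# $1 = duration in seconds (-z), $2 = workers (-c), $3 = requests per second per worker
max_requests() {
  echo $(( $1 * $2 * $3 ))
}

# /status/500 run: 60 s, 5 workers, 5 QPS per worker
max_requests 60 5 5   # prints 1500

# /delay/1 run: each worker is blocked for about 1 s per request (assumption),
# so the effective rate is about 1 request per second per worker
max_requests 60 5 1   # prints 300
```

These are upper bounds only; the actual counts depend on response times and on how the mesh handles the requests.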

When the number of failed checks reaches the canary analysis threshold, Flagger routes all traffic back to the primary and scales the canary to zero.

Follow the Flagger logs again to watch the rollback:

kubectl -n appmesh-system logs deployment/flagger -f | jq .msg

You should see output similar to the following:

 Starting canary analysis for podinfo.demo
 Advance podinfo.demo canary weight 5
 Advance podinfo.demo canary weight 10
 Advance podinfo.demo canary weight 15
 Halt podinfo.demo advancement success rate 69.17% < 99%
 Halt podinfo.demo advancement success rate 61.39% < 99%
 Halt podinfo.demo advancement success rate 55.06% < 99%
 Halt podinfo.demo advancement request duration 1.20s > 0.5s
 Halt podinfo.demo advancement request duration 1.45s > 0.5s
 Rolling back podinfo.demo failed checks threshold reached 5
 Canary failed! Scaling down podinfo.demo
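To relate the log lines above to the failed checks threshold, the following sketch counts Halt messages and compares the count with a threshold. It assumes each Halt message corresponds to one failed check, as the sample output suggests, and that the messages arrive one per line. The helper names count_halts and threshold_reached are hypothetical, and the threshold value has to be supplied by you.

```shell
# Hypothetical helper: count "Halt ... advancement" messages read from stdin.
# Assumes one log message per line, for example as produced by `jq -r .msg`.
count_halts() {
  # grep -c prints 0 but exits non-zero when nothing matches; keep the exit status clean
  grep -c 'Halt .* advancement' || true
}

# Hypothetical helper: succeed when the halt count has reached the threshold.
# $1 = halt count, $2 = threshold
threshold_reached() {
  [ "$1" -ge "$2" ]
}
```

For example, piping the output of kubectl -n appmesh-system logs deployment/flagger through jq -r .msg and then into count_halts gives the number of halts logged so far. For the sample output, threshold_reached 5 5 succeeds, which matches the rollback message.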

You’ll see that your podinfo-primary pods are still up and that all of them remain on version 3.1.1, because the failed 3.1.2 canary was never promoted.
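One way to confirm the version is to read the image from the podinfo-primary deployment and extract its tag. This is a minimal sketch: the helper names image_tag and primary_image are hypothetical, it assumes podinfod is the first container in the deployment spec, and it assumes the image reference ends in a tag.

```shell
# Hypothetical helper: print the tag of an image reference (the text after the last colon).
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Hypothetical helper: print the image of the first container in podinfo-primary.
# Requires kubectl access to the cluster; assumes podinfod is the first container.
primary_image() {
  kubectl -n demo get deployment podinfo-primary \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}
```

Running image_tag "$(primary_image)" against the cluster should print 3.1.1 if the rollback left the primary untouched.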