Canary Deployment
What is Canary Deployment?
Canary Deployment gradually rolls out changes to a small subset of users before deploying to everyone, reducing risk.
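At its simplest, the routing decision is a weighted coin flip per request. A minimal sketch of the idea (illustrative only; in practice the load balancer or service mesh makes this decision):

```javascript
// Illustrative sketch: pick a track for one request based on the
// configured canary traffic percentage. `random` is injectable for testing.
function chooseTrack(canaryPercentage, random = Math.random) {
  // random() returns a value in [0, 1); with a 10% canary,
  // roughly 1 in 10 requests lands on the canary track.
  return random() * 100 < canaryPercentage ? 'canary' : 'stable';
}
```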
Architecture
         ┌─────────────┐
         │Load Balancer│
         └──────┬──────┘
                │
        ┌───────┴───────┐
        │               │
   ┌────▼────┐     ┌────▼────┐
   │ Stable  │     │ Canary  │
   │  (90%)  │     │  (10%)  │
   │  v1.0   │     │  v2.0   │
   └─────────┘     └─────────┘
Gradually increase canary traffic over time.

Kubernetes Implementation
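With plain Deployments, the traffic split falls out of the replica counts: the canary share is `canaryReplicas / totalReplicas`, so only certain percentages are achievable. A hypothetical helper to compute the split for a target percentage:

```javascript
// Hypothetical helper (not part of any tool): compute how many replicas
// each track needs to approximate a target canary percentage.
function replicaSplit(totalReplicas, targetCanaryPercent) {
  // Replica counts are integers, so the achievable split is quantized:
  // with 10 total replicas the step size is 10%.
  const canary = Math.round((targetCanaryPercent / 100) * totalReplicas);
  return {
    canary,
    stable: totalReplicas - canary,
    actualPercent: (canary / totalReplicas) * 100,
  };
}
```

This is why the manifests below use 9 stable + 1 canary replicas for a 10% canary.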
# Stable version (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
  labels:
    app: myapp
    track: stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
        - name: myapp
          image: myapp:v1.0.0
          ports:
            - containerPort: 3000
---
# Canary version (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
  labels:
    app: myapp
    track: canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
        - name: myapp
          image: myapp:v2.0.0
          ports:
            - containerPort: 3000
---
# Service (selects only app: myapp, so it load balances across both tracks)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 3000
  type: LoadBalancer

Gradual Rollout Script
#!/bin/bash
set -euo pipefail

# Phase 1: 10% canary
kubectl scale deployment myapp-stable --replicas=9
kubectl scale deployment myapp-canary --replicas=1
echo "Canary at 10%"
sleep 300  # Monitor for 5 minutes

# Check metrics. Prometheus returns JSON, so extract the numeric value;
# [ -gt ] only compares integers, so do the float comparison in awk.
ERROR_RATE=$(curl -s 'http://prometheus/api/v1/query?query=error_rate' \
  | jq -r '.data.result[0].value[1]')
if awk -v r="$ERROR_RATE" 'BEGIN { exit !(r > 0.01) }'; then
  echo "High error rate detected. Rolling back."
  kubectl scale deployment myapp-canary --replicas=0
  exit 1
fi

# Phase 2: 25% canary (2 of 8 pods)
kubectl scale deployment myapp-stable --replicas=6
kubectl scale deployment myapp-canary --replicas=2
echo "Canary at 25%"
sleep 300

# Phase 3: 50% canary
kubectl scale deployment myapp-stable --replicas=5
kubectl scale deployment myapp-canary --replicas=5
echo "Canary at 50%"
sleep 300

# Phase 4: 100% canary
kubectl scale deployment myapp-stable --replicas=0
kubectl scale deployment myapp-canary --replicas=10
echo "Canary at 100%"

# Promote canary to stable
kubectl delete deployment myapp-stable
kubectl label deployment myapp-canary track=stable --overwrite

Istio Service Mesh
# VirtualService for traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    # Beta users are always routed to the canary
    - match:
        - headers:
            user-type:
              exact: beta
      route:
        - destination:
            host: myapp
            subset: canary
    # Everyone else is split 90/10
    - route:
        - destination:
            host: myapp
            subset: stable
          weight: 90
        - destination:
            host: myapp
            subset: canary
          weight: 10
---
# DestinationRule defining the subsets
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: stable
      labels:
        track: stable
    - name: canary
      labels:
        track: canary

GitHub Actions Canary
name: Canary Deployment

on:
  push:
    branches: [main]

jobs:
  deploy-canary:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy canary (10%)
        run: |
          kubectl set image deployment/myapp-canary myapp=myapp:${{ github.sha }}
          kubectl scale deployment myapp-canary --replicas=1
      - name: Wait and monitor
        run: |
          sleep 300
          ./check-metrics.sh
      - name: Increase to 25%
        if: success()
        run: |
          kubectl scale deployment myapp-stable --replicas=6
          kubectl scale deployment myapp-canary --replicas=2
          sleep 300
          ./check-metrics.sh
      - name: Increase to 50%
        if: success()
        run: |
          kubectl scale deployment myapp-stable --replicas=5
          kubectl scale deployment myapp-canary --replicas=5
          sleep 300
          ./check-metrics.sh
      - name: Full rollout
        if: success()
        run: |
          kubectl set image deployment/myapp-stable myapp=myapp:${{ github.sha }}
          kubectl scale deployment myapp-canary --replicas=0
      - name: Rollback
        if: failure()
        run: |
          kubectl scale deployment myapp-canary --replicas=0

Monitoring
// Monitor canary metrics (assumes a Prometheus client whose query()
// resolves to a single numeric value)
async function monitorCanary() {
  const requestRate = await prometheus.query({
    query: 'rate(http_requests_total{track="canary"}[5m])'
  });
  const errorRate = await prometheus.query({
    query: 'rate(http_requests_total{track="canary",status=~"5.."}[5m])'
  });
  // histogram_quantile needs the _bucket series (with its le label) rated over a window
  const p95LatencySeconds = await prometheus.query({
    query: 'histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{track="canary"}[5m]))'
  });

  // Compare with stable rather than relying on absolute thresholds alone
  const stableErrorRate = await prometheus.query({
    query: 'rate(http_requests_total{track="stable",status=~"5.."}[5m])'
  });

  if (errorRate > stableErrorRate * 1.5) {
    console.log('Canary error rate too high. Rolling back.');
    return false;
  }
  if (p95LatencySeconds > 1) {  // metric is in seconds: p95 above 1s
    console.log('Canary latency too high. Rolling back.');
    return false;
  }
  return true;
}

Feature Flags
// Combine with feature flags
class CanaryFeatureFlag {
  constructor(canaryPercentage, betaUsers = []) {
    this.canaryPercentage = canaryPercentage;  // 0-100
    this.betaUsers = betaUsers;
  }

  isEnabled(userId) {
    // Beta users are always routed to the canary
    if (this.isBetaUser(userId)) {
      return true;
    }
    // Deterministic percentage rollout: the same user always gets
    // the same answer for a given percentage
    const hash = this.hash(userId);
    return (hash % 100) < this.canaryPercentage;
  }

  isBetaUser(userId) {
    return this.betaUsers.includes(userId);
  }

  hash(str) {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      hash = ((hash << 5) - hash) + str.charCodeAt(i);
      hash |= 0;  // keep it a 32-bit integer
    }
    return Math.abs(hash);
  }
}

Automated Rollback
# Flagger automates the canary analysis, promotion, and rollback
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 80
  analysis:
    interval: 1m      # check metrics every minute
    threshold: 5      # roll back after 5 failed checks
    maxWeight: 50     # canary traffic tops out at 50%
    stepWeight: 10    # increase in steps of 10%
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99     # success rate must stay above 99%
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500    # latency must stay under 500ms
        interval: 1m

Benefits
- Reduced risk: Gradual rollout
- Early detection: Find issues quickly
- Easy rollback: Affect fewer users
- A/B testing: Compare versions
Challenges
- Complexity: More infrastructure
- Monitoring: Need good metrics
- Duration: Slower than blue-green
- Stateful apps: Session handling
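A common mitigation for the stateful-app challenge is sticky routing: pin each user to a track for the duration of the rollout so a session never bounces between versions. A hypothetical sketch using a deterministic hash (the same user always maps to the same track at a given percentage):

```javascript
// Sticky track assignment: deterministic, so a user's session stays on
// one version for as long as the canary percentage is unchanged.
function stickyTrack(userId, canaryPercentage) {
  // Simple 32-bit string hash; any stable hash works here
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    hash = ((hash << 5) - hash + userId.charCodeAt(i)) | 0;
  }
  const bucket = Math.abs(hash) % 100;  // bucket in [0, 99]
  return bucket < canaryPercentage ? 'canary' : 'stable';
}
```

In practice the same effect is often achieved with consistent-hash load balancing or a session-affinity cookie at the ingress.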
Interview Tips
- Explain pattern: Gradual rollout to subset
- Show implementation: Kubernetes scaling
- Demonstrate monitoring: Metrics comparison
- Discuss Istio: Traffic splitting
- Mention rollback: Automated based on metrics
- Show benefits: Risk reduction
Summary
Canary Deployment gradually rolls out changes to a small percentage of users. Start at 10%, monitor metrics, then step up to 25%, 50%, and finally 100%. Use Kubernetes replica scaling or Istio traffic splitting for the split, monitor error rates and latency throughout, and roll back automatically when metrics regress. This reduces risk compared to big-bang deployments.
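The whole procedure condenses into a small driver loop. A sketch, where `scaleTo` and `metricsHealthy` are hypothetical callbacks (wrapping, say, kubectl and a Prometheus query):

```javascript
// Phased rollout driver: advance through traffic percentages, gating each
// phase on a metrics check; any failure drains the canary and stops.
async function runCanaryRollout(phases, scaleTo, metricsHealthy) {
  for (const percent of phases) {
    await scaleTo(percent);           // e.g. adjust replica counts or weights
    if (!(await metricsHealthy())) {  // e.g. compare canary vs stable metrics
      await scaleTo(0);               // roll back: drain the canary
      return 'rolled-back';
    }
  }
  return 'promoted';
}
```

A typical invocation would be `runCanaryRollout([10, 25, 50, 100], ...)`, mirroring the bash script above.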