This script walks through every feature in a logical order for a ~10 minute screen recording.
Recommended tools: Loom, QuickTime, or OBS
Suggested layout: split terminal (left) + browser (right)
Run these commands silently before you hit record:
# 1. Start cluster
minikube start --driver=docker --memory=4096 --cpus=2
# 2. Build and deploy
eval $(minikube docker-env)
docker build -t self-healing-app:latest ./app
helm install app-a ./helm-chart \
--set image.repository=self-healing-app \
--set image.tag=latest \
--set image.pullPolicy=Never
# 3. Install observability
./observability/install.sh
# 4. Build and deploy controller
docker build -t self-healing-controller:latest ./controller
kubectl apply -f controller/deploy.yaml
# 5. Pre-forward ports in background
kubectl port-forward svc/app-a 8080:80 &
kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring &
kubectl port-forward svc/jaeger-query 16686:16686 -n monitoring &
# 6. Build v2 image for canary demo later
docker tag self-healing-app:latest self-healing-app:v2

Open these tabs in your browser before recording:
- http://localhost:3000 — Grafana (admin / admin123)
- http://localhost:16686 — Jaeger
- https://github.com/abhay1999/self-healing-k8s-platform — GitHub repo
Show: GitHub repo README Say: "This is a self-healing Kubernetes platform I built from scratch over 30 days. The core idea: in production, things break — pods crash, bad deployments happen, traffic spikes hit. Instead of waking up at 3am, this platform detects and recovers from failures automatically."
Show: The architecture Mermaid diagram in the README Highlight: The 7 recovery layers listed in the table
Show: Terminal

Type and explain each command:
# Show the cluster is running
kubectl get nodes
# Show 2 pods running
kubectl get pods -o wide
# Show the app responding
curl http://localhost:8080/
curl http://localhost:8080/health
curl http://localhost:8080/metrics | head -20

Say: "The app is a Go HTTP server. It exposes a /health endpoint for liveness probes, /ready for readiness probes, and /metrics for Prometheus scraping."
Show: Split view — kubectl get pods -w on left, terminal on right
# Left terminal
kubectl get pods -w
# Right terminal — trigger the crash endpoint
curl http://localhost:8080/crash

Point out: Pod goes Running → CrashLoopBackOff → Running within seconds
Say: "The liveness probe hits /health every 5 seconds. After 3 failures, Kubernetes automatically restarts the container. No human involved."
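The restart behavior described above can be sketched as a small shell loop (a simplified model for narration, not the actual kubelet; the 5-second period and 3-failure threshold are the values quoted in the script, and `probe_health` is a stand-in for `curl -fsS http://<pod-ip>:8080/health`):

```shell
PERIOD=5              # probe interval in seconds (livenessProbe.periodSeconds)
FAILURE_THRESHOLD=3   # consecutive failures before restart (failureThreshold)

# Mocked probe: fails the first three calls, then recovers, to
# demonstrate the restart path end to end.
PROBE_CALLS=0
probe_health() {
  PROBE_CALLS=$((PROBE_CALLS + 1))
  [ "$PROBE_CALLS" -gt 3 ]
}

failures=0
restarts=0
for tick in 1 2 3 4 5; do
  if probe_health; then
    failures=0
  else
    failures=$((failures + 1))
    if [ "$failures" -ge "$FAILURE_THRESHOLD" ]; then
      restarts=$((restarts + 1))
      failures=0
      echo "tick $tick: threshold hit, restarting container (restart #$restarts)"
    fi
  fi
  # A real kubelet would `sleep $PERIOD` between probes here.
done
echo "total restarts: $restarts"
```

Three consecutive failures trigger exactly one restart, after which the failure counter resets and healthy probes keep it at zero.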
# Show restart count
kubectl get pods
# RESTARTS column shows 1

Show: Controller logs in one pane, trigger more crashes in another
# Watch controller
kubectl logs -f deployment/self-healing-controller
# In another pane — crash the pod multiple times (repeat 4x)
curl http://localhost:8080/crash; sleep 3
curl http://localhost:8080/crash; sleep 3
curl http://localhost:8080/crash; sleep 3
curl http://localhost:8080/crash

Point out: When restarts exceed 3, the controller log shows:
[SELF-HEAL] Pod default/app-a-xxx container app has restarted 4 times — deleting pod
[SELF-HEAL] Deleted pod default/app-a-xxx — K8s will schedule a fresh replacement
Say: "This custom Go controller I wrote watches every pod in the cluster. When a pod is stuck in a restart loop beyond a threshold, it deletes it to force a completely fresh start — different from a simple container restart."
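The controller's decision rule can be sketched in shell (a mock for explanation only: the real controller is Go and uses the Kubernetes watch API, and the pod names and restart counts below are invented sample data; `list_pods` stands in for something like `kubectl get pods -A -o custom-columns=...`, `delete_pod` for `kubectl delete pod`):

```shell
RESTART_THRESHOLD=3

# Mocked cluster state: namespace, pod name, restart count.
list_pods() {
  cat <<'EOF'
default app-a-7d4f9-abcde 4
default app-a-7d4f9-fghij 0
monitoring prometheus-0 1
EOF
}

# Stand-in for `kubectl delete pod -n "$1" "$2"`.
delete_pod() {
  echo "[SELF-HEAL] Deleted pod $1/$2 — K8s will schedule a fresh replacement"
}

deleted=0
while read -r ns pod restarts; do
  if [ "$restarts" -gt "$RESTART_THRESHOLD" ]; then
    echo "[SELF-HEAL] Pod $ns/$pod has restarted $restarts times — deleting pod"
    delete_pod "$ns" "$pod"
    deleted=$((deleted + 1))
  fi
done <<EOF
$(list_pods)
EOF
```

Only the pod past the threshold is deleted; deleting (rather than restarting in place) forces the scheduler to create a brand-new pod, which is the distinction the narration draws.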
Show: Switch to Grafana browser tab
- Go to Dashboards → Kubernetes / Pods
- Point out: CPU, memory, restart count, pod count
- Open Explore, type the query: http_requests_total
- Show the counter incrementing with each curl
Show: Switch to Jaeger tab
# Generate some traffic first
for i in $(seq 1 10); do curl -s http://localhost:8080/ > /dev/null; done

- In Jaeger: Service = self-healing-app → Find Traces
- Click a trace → show the span timeline
- Say: "Every single HTTP request is traced end-to-end with OpenTelemetry. In production you'd use this to debug latency spikes across microservices."
Show: Terminal side-by-side with kubectl get pods -w
# Left: watch pods
kubectl get pods -w
# Right: deploy a broken image
helm upgrade app-a ./helm-chart \
--set image.repository=self-healing-app \
--set image.tag=does-not-exist \
--set image.pullPolicy=Never

Point out: Pods enter ErrImageNeverPull or CrashLoopBackOff
# Run the rollback script
./self-healing/auto-rollback.sh app-a 30

Show:
helm history app-a
# REVISION 1: deployed (good)
# REVISION 2: failed
# REVISION 3: deployed (rollback)

Say: "The auto-rollback script is also baked into the GitHub Actions pipeline. Every deployment is automatically health-checked, and if it finds crashed pods, it rolls back without any human action."
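The check-then-rollback flow can be sketched like this (mocked for illustration: `crashed_pod_count` stands in for something like `kubectl get pods -l app=$RELEASE --no-headers | grep -cE 'CrashLoopBackOff|ErrImage'`, and `helm_rollback` for `helm rollback $RELEASE`; the hard-coded count of 2 failing pods is invented to exercise the rollback branch):

```shell
RELEASE=app-a

# Mocked health check: pretend 2 pods are in a failed state.
crashed_pod_count() { echo 2; }

# Stand-in for `helm rollback "$1"`.
helm_rollback() { echo "Rolling back release $1 to previous revision"; }

ACTION=none
if [ "$(crashed_pod_count)" -gt 0 ]; then
  helm_rollback "$RELEASE"
  ACTION=rollback
fi
echo "action taken: $ACTION"
```

With zero crashed pods the script takes no action, which is why a healthy deploy passes through the same pipeline untouched.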
Show: Grafana dashboard open, two terminals side by side
# Terminal 1: continuous traffic (should never stop)
watch -n 0.5 'curl -s http://localhost:8080/ | jq .status'
# Terminal 2: kill a pod every 10 seconds
./chaos-testing/kill-pods.sh 10 app-a

Point out:
- Terminal 1 always shows "ok" — zero dropped requests
- Grafana shows restart count rising but request rate stays flat
Say: "This simulates a real failure scenario. Kubernetes reschedules replacement pods fast enough that the service never goes down. The PodDisruptionBudget also ensures this works safely during planned maintenance like node drains."
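The kill loop can be sketched in the spirit of kill-pods.sh (a mock, not the script itself: pod names are generated at random for illustration, the real `kubectl delete` is left commented out, and the loop is capped at 3 rounds instead of running until Ctrl+C):

```shell
INTERVAL=10   # seconds between kills (first script argument)
APP=app-a     # label selector target (second script argument)
ROUNDS=3      # the real script loops until interrupted

# Stand-in for `kubectl get pods -l app=$APP -o name | shuf -n1`.
random_pod() {
  echo "pod/$APP-$(head -c4 /dev/urandom | od -An -tx1 | tr -d ' \n')"
}

killed=0
i=1
while [ "$i" -le "$ROUNDS" ]; do
  victim=$(random_pod)
  echo "killing $victim (round $i); the Deployment reschedules a replacement"
  # kubectl delete "$victim" --wait=false   # real command
  killed=$((killed + 1))
  i=$((i + 1))
  # sleep "$INTERVAL"   # real pacing; skipped in this sketch
done
echo "killed $killed pods"
```

The service stays up because the Deployment controller replaces each victim while the remaining replica keeps serving, and the PodDisruptionBudget bounds how many pods may be down at once.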
# Stop chaos
Ctrl+C

Show: Terminal
# Deploy 10% canary traffic
./canary/deploy-canary.sh v2
# Show both stable and canary pods
kubectl get pods -l app=app-a -o wide

# Prove the traffic split — run 20 requests
for i in $(seq 1 20); do curl -s http://localhost:8080/ | jq -r .service; done | sort | uniq -c
# 18 app-a ← stable
# 2 app-a-canary ← ~10% canary

Say: "This is a real production pattern. You deploy the new version to 1 pod, verify it works on real traffic, then promote it. If something looks wrong in Grafana, you roll back with a single command."
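The ~10% figure falls out of plain replica arithmetic: if both versions sit behind the same Service selector, round-robin sends each pod an equal share, so the canary's share is canary ÷ total replicas. A quick sanity check, assuming the canary script scales stable to 9 replicas alongside the 1 canary pod (those counts are an assumption here, not confirmed by the script above):

```shell
STABLE_REPLICAS=9   # assumed stable replica count after canary deploy
CANARY_REPLICAS=1   # the single canary pod
TOTAL=$((STABLE_REPLICAS + CANARY_REPLICAS))
CANARY_PCT=$((100 * CANARY_REPLICAS / TOTAL))
echo "canary share: ${CANARY_PCT}%"
```

Over 20 requests that share predicts about 2 canary hits, matching the 18/2 split shown in the demo; the Nginx exact-weight variant mentioned in the README achieves the same ratio without scaling replicas.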
# Show rollback
./canary/rollback-canary.sh
kubectl get pods -l app=app-a
# Only stable pods remain

Show: Terminal
cd app && go test ./... -v
cd ../controller && go test ./... -v

Point out: All tests pass — green output

Say: "The app handlers and the controller reconciler both have full unit test coverage. The controller tests use a fake Kubernetes client so they run without a real cluster."
Show: GitHub repo README, scroll through it
Say: "Everything you just saw — all the code, Helm charts, scripts, tests, and this README — is on GitHub. The platform covers: CI/CD, observability, 7 layers of self-healing, chaos engineering, and canary deployments. Built in 30 days using Go, Kubernetes, Helm, Prometheus, Grafana, Jaeger, and GitHub Actions."
- Use font size 16+ in terminal for readability
- Run clear between scenes
- Pause 1 second after typing a command before pressing Enter — gives viewers time to read
- If a command has long output, pipe to | head -20 or | tail -20
- Record at 1920×1080, export at 1080p
- Add chapter markers in the video description matching the scenes above
Self-Healing Kubernetes Platform | Go + Prometheus + Grafana + Jaeger + Canary Deployments
A production-grade self-healing Kubernetes platform I built from scratch in 30 days.
Features:
✅ Go microservice with Prometheus metrics + Jaeger distributed tracing
✅ Custom Go Kubernetes controller that auto-heals crash-looping pods
✅ GitHub Actions CI/CD with automated health checks + Helm rollback
✅ Chaos engineering — random pod killing with zero downtime
✅ Canary deployments with replica-ratio and Nginx exact-weight splitting
✅ PodDisruptionBudget for safe node maintenance
✅ Full unit test coverage with fake Kubernetes client
Code: https://github.com/abhay1999/self-healing-k8s-platform
Timestamps:
0:00 - Introduction
1:00 - Running the app
2:00 - Pod crash & self-healing
3:00 - Custom Go controller
4:00 - Observability (Grafana + Jaeger)
5:30 - Bad deploy + auto-rollback
7:00 - Chaos testing
8:00 - Canary deployment
9:30 - Unit tests
10:00 - Wrap-up