Commit 0a4414c (parent e9726be)
abhay1999 and Claude committed

Polish project presentation for portfolio

- Rewrite README: real-world problem statement, Mermaid architecture diagram, 7-layer self-healing table, full deployment guide, demo scenarios, correct GitHub badge URLs (abhay1999)
- Add DEMO.md: 10-scene video recording script with exact commands, timing, narration cues, suggested title/description/timestamps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2 files changed: +559, −244 lines

DEMO.md (303 additions, 0 deletions)
# Demo Recording Script

This script walks through every feature in a logical order for a ~10-minute screen recording.

**Recommended tool:** [Loom](https://loom.com), QuickTime, or OBS
**Suggested layout:** Split terminal (left) + browser (right)

---

## Setup Before Recording

Run these commands silently before you hit record:

```bash
# 1. Start cluster
minikube start --driver=docker --memory=4096 --cpus=2

# 2. Build and deploy
eval $(minikube docker-env)
docker build -t self-healing-app:latest ./app
helm install app-a ./helm-chart \
  --set image.repository=self-healing-app \
  --set image.tag=latest \
  --set image.pullPolicy=Never

# 3. Install observability
./observability/install.sh

# 4. Build and deploy controller
docker build -t self-healing-controller:latest ./controller
kubectl apply -f controller/deploy.yaml

# 5. Pre-forward ports in background
kubectl port-forward svc/app-a 8080:80 &
kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring &
kubectl port-forward svc/jaeger-query 16686:16686 -n monitoring &

# 6. Build v2 image for canary demo later
docker tag self-healing-app:latest self-healing-app:v2
```

Open these tabs in your browser before recording:

- `http://localhost:3000` — Grafana (admin / admin123)
- `http://localhost:16686` — Jaeger
- `https://github.com/abhay1999/self-healing-k8s-platform` — GitHub repo

---

## Scene 1 — Introduction (1 min)

**Show:** GitHub repo README
**Say:** "This is a self-healing Kubernetes platform I built from scratch over 30 days. The core idea: in production, things break — pods crash, bad deployments happen, traffic spikes hit. Instead of waking up at 3am, this platform detects and recovers from failures automatically."

**Show:** The architecture Mermaid diagram in the README
**Highlight:** The 7 recovery layers listed in the table

---

## Scene 2 — The Running App (1 min)

**Show:** Terminal
**Type and explain each command:**

```bash
# Show the cluster is running
kubectl get nodes

# Show 2 pods running
kubectl get pods -o wide

# Show the app responding
curl http://localhost:8080/
curl http://localhost:8080/health
curl http://localhost:8080/metrics | head -20
```

**Say:** "The app is a Go HTTP server. It exposes a `/health` endpoint for liveness probes, `/ready` for readiness probes, and `/metrics` for Prometheus scraping."

---

## Scene 3 — Self-Healing: Pod Crash (2 min)

**Show:** Split view — `kubectl get pods -w` on left, terminal on right

```bash
# Left terminal
kubectl get pods -w

# Right terminal — trigger the crash endpoint
curl http://localhost:8080/crash
```

**Point out:** Pod goes `Running → CrashLoopBackOff → Running` within seconds
**Say:** "The liveness probe hits `/health` every 5 seconds. After 3 failures, Kubernetes automatically restarts the container. No human involved."

```bash
# Show restart count
kubectl get pods
# RESTARTS column shows 1
```
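The probe-and-restart decision just narrated (probe every 5 seconds, restart after 3 consecutive failures) boils down to a counter that any success resets. Here is a pure-shell sketch of that logic, with `FAILURE_THRESHOLD=3` assumed to match the narration (the names are illustrative, not taken from the Helm chart):

```bash
# record_probe simulates one liveness probe observation.
# $1 = probe exit status: 0 means /health answered OK, non-zero means it failed.
# Returns success (exit 0) exactly when the container should be restarted.
FAILURE_THRESHOLD=3
consecutive_failures=0

record_probe() {
  if [ "$1" -ne 0 ]; then
    consecutive_failures=$((consecutive_failures + 1))
  else
    consecutive_failures=0   # any successful probe resets the streak
  fi
  [ "$consecutive_failures" -ge "$FAILURE_THRESHOLD" ]
}
```

Three failed probes in a row trip the threshold; a single healthy response in between starts the count over, which is why one slow request alone never restarts the pod.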

---

## Scene 4 — Custom Controller Healing (1 min)

**Show:** Controller logs in one pane, trigger more crashes in another

```bash
# Watch controller
kubectl logs -f deployment/self-healing-controller

# In another pane — crash the pod multiple times (repeat 4x)
curl http://localhost:8080/crash; sleep 3
curl http://localhost:8080/crash; sleep 3
curl http://localhost:8080/crash; sleep 3
curl http://localhost:8080/crash
```

**Point out:** When restarts exceed 3, the controller log shows:

```
[SELF-HEAL] Pod default/app-a-xxx container app has restarted 4 times — deleting pod
[SELF-HEAL] Deleted pod default/app-a-xxx — K8s will schedule a fresh replacement
```

**Say:** "This custom Go controller I wrote watches every pod in the cluster. When a pod is stuck in a restart loop beyond a threshold, it deletes it to force a completely fresh start — different from a simple container restart."
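The controller itself is Go, but its core check is simple enough to mirror in shell for illustration: scan a pod listing and flag anything past the restart threshold. A hypothetical offline sketch (the function name is mine; the threshold value of 3 comes from the scene):

```bash
# heal_candidates reads `kubectl get pods` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and prints the names of
# pods whose RESTARTS column exceeds the threshold.
RESTART_THRESHOLD=3

heal_candidates() {
  awk -v t="$RESTART_THRESHOLD" 'NR > 1 && $4 + 0 > t { print $1 }'
}

# Hypothetical wiring (needs a cluster):
#   kubectl get pods | heal_candidates | xargs -r kubectl delete pod
```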

---

## Scene 5 — Observability (1.5 min)

**Show:** Switch to Grafana browser tab

- Go to **Dashboards → Kubernetes / Pods**
- Point out: CPU, memory, restart count, pod count
- Open **Explore**, type query: `http_requests_total`
- Show the counter incrementing with each curl

**Show:** Switch to Jaeger tab

```bash
# Generate some traffic first
for i in $(seq 1 10); do curl -s http://localhost:8080/ > /dev/null; done
```

- In Jaeger: Service = `self-healing-app` → Find Traces
- Click a trace → show the span timeline
- **Say:** "Every single HTTP request is traced end-to-end with OpenTelemetry. In production you'd use this to debug latency spikes across microservices."
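If you want one more Explore query, `rate(http_requests_total[1m])` turns the raw counter into requests per second. Under the hood that is just counter delta divided by elapsed time, which this illustrative helper (not a Prometheus API) reproduces:

```bash
# counter_rate <earlier-sample> <later-sample> <seconds-elapsed>
# Prints the per-second rate the way PromQL's rate() would for a
# monotonically increasing counter (ignoring counter resets).
counter_rate() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.2f\n", (b - a) / s }'
}

counter_rate 100 160 60   # 60 new requests in 60s -> 1.00
```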

---

## Scene 6 — Bad Deploy + Auto-Rollback (1.5 min)

**Show:** Terminal side-by-side with `kubectl get pods -w`

```bash
# Left: watch pods
kubectl get pods -w

# Right: deploy a broken image
helm upgrade app-a ./helm-chart \
  --set image.repository=self-healing-app \
  --set image.tag=does-not-exist \
  --set image.pullPolicy=Never
```

**Point out:** Pods enter `ErrImageNeverPull` or `CrashLoopBackOff`

```bash
# Run the rollback script
./self-healing/auto-rollback.sh app-a 30
```

**Show:**

```bash
helm history app-a
# REVISION 1: deployed (good)
# REVISION 2: failed
# REVISION 3: deployed (rollback)
```

**Say:** "The auto-rollback script is also baked into the GitHub Actions pipeline. Every deployment is automatically health-checked, and if it finds crashed pods, it rolls back without any human action."
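The check inside a script like `auto-rollback.sh` can be isolated as a pure function over `kubectl get pods --no-headers` output. This is a sketch under assumptions (the real script may check differently); only the commented gate at the end would need a cluster:

```bash
# count_unhealthy reads pod listing lines on stdin and prints how many
# pods are in a known-bad state. The list of states is an assumption.
count_unhealthy() {
  grep -cE 'CrashLoopBackOff|ErrImageNeverPull|ImagePullBackOff' || true
}

# Hypothetical gate the script could apply after its wait period:
#   bad=$(kubectl get pods --no-headers | count_unhealthy)
#   [ "$bad" -gt 0 ] && helm rollback app-a
```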
182+
183+
---
184+
185+
## Scene 7 — Chaos Testing (1 min)
186+
187+
**Show:** Grafana dashboard open, two terminals side by side
188+
189+
```bash
190+
# Terminal 1: continuous traffic (should never stop)
191+
watch -n 0.5 'curl -s http://localhost:8080/ | jq .status'
192+
193+
# Terminal 2: kill a pod every 10 seconds
194+
./chaos-testing/kill-pods.sh 10 app-a
195+
```
196+
197+
**Point out:**
198+
- Terminal 1 always shows `"ok"` — zero dropped requests
199+
- Grafana shows restart count rising but request rate stays flat
200+
201+
**Say:** "This simulates a real failure scenario. Kubernetes reschedules replacement pods fast enough that the service never goes down. The PodDisruptionBudget also ensures this works safely during planned maintenance like node drains."
202+
203+
```bash
204+
# Stop chaos
205+
Ctrl+C
206+
```
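For context, a kill loop like `./chaos-testing/kill-pods.sh` plausibly looks like the sketch below, inferred from the invocation `kill-pods.sh 10 app-a` (the repo's actual script may differ). The random pick is a separate function so the selection logic runs without a cluster:

```bash
# pick_victim: stdin = candidate pod names, one per line; stdout = one at random
pick_victim() {
  shuf -n 1
}

# chaos_loop <interval-seconds> <app-label>: delete one random pod per interval
chaos_loop() {
  interval="$1" app="$2"
  while true; do
    kubectl get pods -l "app=$app" -o name | pick_victim \
      | xargs -r kubectl delete --wait=false
    sleep "$interval"
  done
}
# chaos_loop 10 app-a   # matches the scene's invocation
```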

---

## Scene 8 — Canary Deployment (1.5 min)

**Show:** Terminal

```bash
# Deploy 10% canary traffic
./canary/deploy-canary.sh v2

# Show both stable and canary pods
kubectl get pods -l app=app-a -o wide
```

```bash
# Prove the traffic split — run 20 requests
for i in $(seq 1 20); do curl -s http://localhost:8080/ | jq -r .service; done | sort | uniq -c
#   18 app-a          ← stable
#    2 app-a-canary   ← ~10% canary
```
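The ~10% figure falls out of the replica ratio: with both Deployments behind one Service selector, each pod receives roughly an equal share of requests. A quick sanity check (the helper name is illustrative):

```bash
# canary_share <stable-replicas> <canary-replicas>
# Prints the percentage of traffic the canary receives under an even
# per-pod split behind a shared Service.
canary_share() {
  awk -v s="$1" -v c="$2" 'BEGIN { printf "%.0f\n", 100 * c / (s + c) }'
}

canary_share 9 1   # e.g. 9 stable pods + 1 canary pod -> 10
```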

**Say:** "This is a real production pattern. You deploy the new version to 1 pod, verify it works on real traffic, then promote it. If something looks wrong in Grafana, you roll back with a single command."

```bash
# Show rollback
./canary/rollback-canary.sh

kubectl get pods -l app=app-a
# Only stable pods remain
```

---

## Scene 9 — Unit Tests (30 sec)

**Show:** Terminal

```bash
cd app && go test ./... -v
cd ../controller && go test ./... -v
```

**Point out:** All tests pass — green output
**Say:** "The app handlers and the controller reconciler both have full unit test coverage. The controller tests use a fake Kubernetes client so they run without a real cluster."

---

## Scene 10 — Wrap-up (30 sec)

**Show:** GitHub repo README, scroll through it

**Say:** "Everything you just saw — all the code, Helm charts, scripts, tests, and this README — is on GitHub. The platform covers: CI/CD, observability, 7 layers of self-healing, chaos engineering, and canary deployments. Built in 30 days using Go, Kubernetes, Helm, Prometheus, Grafana, Jaeger, and GitHub Actions."

---

## Recording Tips

- Use font size 16+ in the terminal for readability
- Run `clear` between scenes
- Pause 1 second after typing a command before pressing Enter — gives viewers time to read
- If a command has long output, pipe it through `head -20` or `tail -20`
- Record at 1920×1080, export at 1080p
- Add chapter markers in the video description matching the scenes above

## Suggested Video Title

`Self-Healing Kubernetes Platform | Go + Prometheus + Grafana + Jaeger + Canary Deployments`

## Suggested Description

```
A production-grade self-healing Kubernetes platform I built from scratch in 30 days.

Features:
✅ Go microservice with Prometheus metrics + Jaeger distributed tracing
✅ Custom Go Kubernetes controller that auto-heals crash-looping pods
✅ GitHub Actions CI/CD with automated health checks + Helm rollback
✅ Chaos engineering — random pod killing with zero downtime
✅ Canary deployments with replica-ratio and Nginx exact-weight splitting
✅ PodDisruptionBudget for safe node maintenance
✅ Full unit test coverage with fake Kubernetes client

Code: https://github.com/abhay1999/self-healing-k8s-platform

Timestamps:
0:00 - Introduction
1:00 - Running the app
2:00 - Pod crash & self-healing
3:00 - Custom Go controller
4:00 - Observability (Grafana + Jaeger)
5:30 - Bad deploy + auto-rollback
7:00 - Chaos testing
8:00 - Canary deployment
9:30 - Unit tests
10:00 - Wrap-up
```
