watch-node-cpu.py - Context & Design Documentation

Overview

A real-time TUI monitoring tool for Kubernetes node and pod CPU/memory metrics, with special support for vcluster environments. Built with Rich library for terminal UI rendering.

Key Features

1. Dual-Context Support (Host & vCluster)

Automatic vCluster Detection: Checks kubectl config current-context for "vcluster" string
Graceful Degradation: In vcluster environments where node metrics aren't available, displays "N/A" placeholders
Context Switching:
- Connect to vcluster: Select a vcluster pod → Press ENTER → Choose "Connect to vCluster"
- Disconnect from vcluster: Press ESC twice (once to focus nodes, once to disconnect)
- All state is cleared on context switch to provide clean view

2. Interactive Navigation

Two-Panel View: Nodes (top) and Pods (bottom)
Focus Switching: Press ENTER to toggle between nodes and pods
Node Selection: Arrow keys or h/j/k/l in nodes panel
Pod Selection: Arrow keys or j/k in pods panel
Scrolling Window: Shows 20 pods at a time with automatic scrolling

3. Pod Filtering

Trigger: Press / in pods view to enter filter mode
Live Search: Results update as you type (non-blocking, async)
Filter Behavior:
- ESC: Clear filter and exit input mode (fresh start)
- ENTER: Keep filter and exit input mode (results stay filtered)
- Ctrl+U: Clear entire filter text
- Backspace: Remove last character
Search Scope: Searches both pod name and namespace

4. Pod Actions Modal

Press ENTER on a pod to open action modal
Standard Actions: Logs, Describe, Delete
vCluster-Specific: "Connect to vCluster" appears if pod is a vcluster
vCluster Detection Logic: Checks pod labels (release, app.kubernetes.io/instance, vcluster.loft.sh/name) or parses pod name

5. Metrics Display

CPU History Graphs: Sparklines showing last 60 data points per node
Pod Sorting: CPU, Memory, Status, Namespace, Restarts (cycle with ←/→ or h/l)
Color Coding:
- Green: Running pods
- Yellow: Non-running states
- Red: Failed/Error states
- Dim yellow: vCluster limited metrics warning

Architecture & State Management

Core State Variables

# Selection & Focus
self.focused_panel: str           # "nodes" or "pods"
self.selected_node_index: int     # Current node selection (0 = "all")
self.selected_pod_index: int      # Current pod selection
self.pod_scroll_offset: int       # Scrolling position in pod list

# Filtering
self.pod_filter_text: str         # Current filter text
self.show_filter_input: bool      # Whether filter input is active

# Modal
self.show_pod_modal: bool         # Whether action modal is shown
self.selected_action_index: int   # Current action selection

# vCluster
self.is_vcluster: bool            # Detected vcluster context

Important: State Reset on Context Switch

When connecting/disconnecting from vcluster, ALL of the following must be cleared:

Metrics history (cpu_history, pod_metrics, node_status, node_roles)
Render signatures (_last_nodes_sig, _last_pods_sig, _last_controls_sig)
Selection state (selected_node_index=0, selected_pod_index=0, pod_scroll_offset=0)
Filter state (pod_filter_text="", show_filter_input=False)
Modal state (show_pod_modal=False, selected_action_index=0)
Focus (focused_panel="nodes")

Why? Different contexts have different nodes/pods. Keeping old state causes confusion like:

Filter text from old context showing results from new context
Selected node index pointing to non-existent node
Scroll offset out of bounds

Performance Optimizations

1. Signature-Based Caching

Avoid unnecessary re-renders by computing signatures of table data:

nodes_sig = (self.update_count, selected_node, self.is_vcluster)
if nodes_sig == self._last_nodes_sig:
    return cached_table  # Don't rebuild

2. Async Filter Input

Problem: Blocking live.update() on each keystroke makes typing sluggish.

Solution: Set filter state and break to let main loop handle rendering:

if key in filter_input:
    self.pod_filter_text += key
    filter_changed = True

if filter_changed:
    break  # Main loop handles refresh async

This decouples input from rendering for responsive typing.

3. Pod Filtering in Layout Build

Filtering happens during table construction, not as a separate pre-processing step:

filtered = [
    (p, cpu, ns, nd, status, restart, mem)
    for (p, cpu, ns, nd, status, restart, mem) in self.pod_metrics
    if (selected_node == "all" or nd == selected_node)
    and (not self.pod_filter_text 
         or self.pod_filter_text.lower() in p.lower()
         or self.pod_filter_text.lower() in ns.lower())
]

vCluster-Specific Implementation Details

Why No Node Metrics in vCluster?

vClusters use virtual nodes without real kubelet endpoints. The metrics-server inside a vcluster cannot scrape actual node metrics, so:

kubectl top nodes fails with "Metrics API not available"
Attempting to install metrics-server in vcluster doesn't help (nodes are virtual)

Solution: Graceful Degradation

Detection: _detect_vcluster() checks if context name contains "vcluster"
Metrics Fallback: When kubectl top nodes fails in vcluster:
- Get node names from kubectl get nodes -o json
- Populate metrics with 0.0 placeholders
- Display "N/A" in UI for CPU/Memory columns
User Feedback:
- Title shows [dim yellow](vcluster - limited metrics)[/dim yellow]
- Warning in controls: "⚠ vCluster: Metrics API not available • Press ESC to disconnect"

vCluster Name Extraction

Getting the correct vcluster name for vcluster connect is non-trivial:

Check pod labels (in order of preference):
- release
- app.kubernetes.io/instance
- vcluster.loft.sh/name
Parse pod name: Extract from pattern like rancher-vcluster-0
Fallback: Use namespace name

Why complex? Different vcluster deployments use different naming conventions (Rancher, Loft, vanilla vcluster).

Key User Experience Patterns

Pattern: "Fresh Start" on Context Switch

Philosophy: Context switches should feel like restarting the tool in a new cluster.

Implementation: Clear ALL state (not just metrics) to avoid carrying over irrelevant selections/filters from previous context.

Pattern: Non-Blocking Input

Philosophy: User typing should never be delayed by rendering.

Implementation: Input handlers set state variables and break, letting the main refresh loop handle UI updates asynchronously.

Pattern: "Escape to Safety"

Philosophy: ESC should always move you to a "safer" or "broader" state.

Implementation:

In filter mode: Clear filter (fresh start)
In pod modal: Close modal
In pods focus: Switch to nodes focus
In nodes focus (vcluster): Disconnect to host cluster

Known Limitations

No Pod Metrics in vCluster: Pod metrics may also be limited in some vcluster configurations (depends on metrics-server setup)
Terminal Size Dependency: Very small terminals may cause UI clipping
Context Switch Delay: 1-second sleep after connect/disconnect to show user feedback before reload
Large Pod Lists: Only first 200 pods used in signature calculation for performance

Future Enhancement Ideas

Support for filtering by namespace (separate from pod name filter)
History/sparklines for pod CPU usage (currently only nodes have graphs)
Multi-cluster switching (not just vcluster, but other contexts)
Resource limits/requests comparison (show pod requests vs actual usage)
Export/snapshot feature (save current metrics to file)
Custom refresh interval (currently hardcoded at 1 second)
Color themes or configuration file

Debugging Tips

If Filter Feels "Sticky"

Check that show_filter_input flag is properly reset when switching contexts or pressing ESC.

If Context Switch Shows Wrong Data

Verify all state variables are cleared in the context switch handler. Grep for the variable name to find all usages.

If Typing is Slow in Filter Mode

Ensure no live.update() calls in the key handling loop for filter input - should only set state and break.

If vCluster Not Detected

Check output of kubectl config current-context
Update _detect_vcluster() method if your vcluster uses different naming

Code Structure

watch-node-cpu.py
├── NodeCPUWatcher (main class)
│   ├── __init__              # State initialization
│   ├── _detect_vcluster      # vCluster detection
│   ├── _get_vcluster_name    # Extract vcluster name from pod
│   ├── _is_vcluster_pod      # Check if pod is a vcluster
│   ├── get_node_metrics      # Fetch node CPU (with vcluster fallback)
│   ├── get_pod_metrics       # Fetch pod CPU/memory
│   ├── format_table          # Build nodes table UI
│   ├── format_pods_table     # Build pods table UI
│   ├── create_pod_action_modal  # Build action modal UI
│   └── run                   # Main loop (key handling + refresh)

Dependencies

rich: Terminal UI (Table, Panel, Layout, Live)
sparklines: CPU history graphs
kubectl: All k8s API calls via subprocess
vcluster CLI: For context switching

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

watch-node-cpu.py - Context & Design Documentation

Overview

Key Features

1. Dual-Context Support (Host & vCluster)

2. Interactive Navigation

3. Pod Filtering

4. Pod Actions Modal

5. Metrics Display

Architecture & State Management

Core State Variables

Important: State Reset on Context Switch

Performance Optimizations

1. Signature-Based Caching

2. Async Filter Input

3. Pod Filtering in Layout Build

vCluster-Specific Implementation Details

Why No Node Metrics in vCluster?

Solution: Graceful Degradation

vCluster Name Extraction

Key User Experience Patterns

Pattern: "Fresh Start" on Context Switch

Pattern: Non-Blocking Input

Pattern: "Escape to Safety"

Known Limitations

Future Enhancement Ideas

Debugging Tips

If Filter Feels "Sticky"

If Context Switch Shows Wrong Data

If Typing is Slow in Filter Mode

If vCluster Not Detected

Code Structure

Dependencies

Testing Checklist for Changes

FilesExpand file tree

watch-node-cpu-CONTEXT.md

Latest commit

History

watch-node-cpu-CONTEXT.md

File metadata and controls

watch-node-cpu.py - Context & Design Documentation

Overview

Key Features

1. Dual-Context Support (Host & vCluster)

2. Interactive Navigation

3. Pod Filtering

4. Pod Actions Modal

5. Metrics Display

Architecture & State Management

Core State Variables

Important: State Reset on Context Switch

Performance Optimizations

1. Signature-Based Caching

2. Async Filter Input

3. Pod Filtering in Layout Build

vCluster-Specific Implementation Details

Why No Node Metrics in vCluster?

Solution: Graceful Degradation

vCluster Name Extraction

Key User Experience Patterns

Pattern: "Fresh Start" on Context Switch

Pattern: Non-Blocking Input

Pattern: "Escape to Safety"

Known Limitations

Future Enhancement Ideas

Debugging Tips

If Filter Feels "Sticky"

If Context Switch Shows Wrong Data

If Typing is Slow in Filter Mode

If vCluster Not Detected

Code Structure

Dependencies

Testing Checklist for Changes