A real-time TUI monitoring tool for Kubernetes node and pod CPU/memory metrics, with special support for vcluster environments. Built with Rich library for terminal UI rendering.
- Automatic vCluster Detection: Checks
kubectl config current-contextfor "vcluster" string - Graceful Degradation: In vcluster environments where node metrics aren't available, displays "N/A" placeholders
- Context Switching:
- Connect to vcluster: Select a vcluster pod → Press ENTER → Choose "Connect to vCluster"
- Disconnect from vcluster: Press ESC twice (once to focus nodes, once to disconnect)
- All state is cleared on context switch to provide clean view
- Two-Panel View: Nodes (top) and Pods (bottom)
- Focus Switching: Press ENTER to toggle between nodes and pods
- Node Selection: Arrow keys or h/j/k/l in nodes panel
- Pod Selection: Arrow keys or j/k in pods panel
- Scrolling Window: Shows 20 pods at a time with automatic scrolling
- Trigger: Press
/in pods view to enter filter mode - Live Search: Results update as you type (non-blocking, async)
- Filter Behavior:
- ESC: Clear filter and exit input mode (fresh start)
- ENTER: Keep filter and exit input mode (results stay filtered)
- Ctrl+U: Clear entire filter text
- Backspace: Remove last character
- Search Scope: Searches both pod name and namespace
- Press ENTER on a pod to open action modal
- Standard Actions: Logs, Describe, Delete
- vCluster-Specific: "Connect to vCluster" appears if pod is a vcluster
- vCluster Detection Logic: Checks pod labels (
release,app.kubernetes.io/instance,vcluster.loft.sh/name) or parses pod name
- CPU History Graphs: Sparklines showing last 60 data points per node
- Pod Sorting: CPU, Memory, Status, Namespace, Restarts (cycle with ←/→ or h/l)
- Color Coding:
- Green: Running pods
- Yellow: Non-running states
- Red: Failed/Error states
- Dim yellow: vCluster limited metrics warning
# Selection & Focus
self.focused_panel: str # "nodes" or "pods"
self.selected_node_index: int # Current node selection (0 = "all")
self.selected_pod_index: int # Current pod selection
self.pod_scroll_offset: int # Scrolling position in pod list
# Filtering
self.pod_filter_text: str # Current filter text
self.show_filter_input: bool # Whether filter input is active
# Modal
self.show_pod_modal: bool # Whether action modal is shown
self.selected_action_index: int # Current action selection
# vCluster
self.is_vcluster: bool # Detected vcluster contextWhen connecting/disconnecting from vcluster, ALL of the following must be cleared:
- Metrics history (
cpu_history,pod_metrics,node_status,node_roles) - Render signatures (
_last_nodes_sig,_last_pods_sig,_last_controls_sig) - Selection state (
selected_node_index=0,selected_pod_index=0,pod_scroll_offset=0) - Filter state (
pod_filter_text="",show_filter_input=False) - Modal state (
show_pod_modal=False,selected_action_index=0) - Focus (
focused_panel="nodes")
Why? Different contexts have different nodes/pods. Keeping old state causes confusion like:
- Filter text from old context showing results from new context
- Selected node index pointing to non-existent node
- Scroll offset out of bounds
Avoid unnecessary re-renders by computing signatures of table data:
nodes_sig = (self.update_count, selected_node, self.is_vcluster)
if nodes_sig == self._last_nodes_sig:
return cached_table # Don't rebuildProblem: Blocking live.update() on each keystroke makes typing sluggish.
Solution: Set filter state and break to let main loop handle rendering:
if key in filter_input:
self.pod_filter_text += key
filter_changed = True
if filter_changed:
break # Main loop handles refresh asyncThis decouples input from rendering for responsive typing.
Filtering happens during table construction, not as a separate pre-processing step:
filtered = [
(p, cpu, ns, nd, status, restart, mem)
for (p, cpu, ns, nd, status, restart, mem) in self.pod_metrics
if (selected_node == "all" or nd == selected_node)
and (not self.pod_filter_text
or self.pod_filter_text.lower() in p.lower()
or self.pod_filter_text.lower() in ns.lower())
]vClusters use virtual nodes without real kubelet endpoints. The metrics-server inside a vcluster cannot scrape actual node metrics, so:
kubectl top nodesfails with "Metrics API not available"- Attempting to install metrics-server in vcluster doesn't help (nodes are virtual)
- Detection:
_detect_vcluster()checks if context name contains "vcluster" - Metrics Fallback: When
kubectl top nodesfails in vcluster:- Get node names from
kubectl get nodes -o json - Populate metrics with
0.0placeholders - Display "N/A" in UI for CPU/Memory columns
- Get node names from
- User Feedback:
- Title shows
[dim yellow](vcluster - limited metrics)[/dim yellow] - Warning in controls: "⚠ vCluster: Metrics API not available • Press ESC to disconnect"
- Title shows
Getting the correct vcluster name for vcluster connect is non-trivial:
- Check pod labels (in order of preference):
releaseapp.kubernetes.io/instancevcluster.loft.sh/name
- Parse pod name: Extract from pattern like
rancher-vcluster-0 - Fallback: Use namespace name
Why complex? Different vcluster deployments use different naming conventions (Rancher, Loft, vanilla vcluster).
Philosophy: Context switches should feel like restarting the tool in a new cluster.
Implementation: Clear ALL state (not just metrics) to avoid carrying over irrelevant selections/filters from previous context.
Philosophy: User typing should never be delayed by rendering.
Implementation: Input handlers set state variables and break, letting the main refresh loop handle UI updates asynchronously.
Philosophy: ESC should always move you to a "safer" or "broader" state.
Implementation:
- In filter mode: Clear filter (fresh start)
- In pod modal: Close modal
- In pods focus: Switch to nodes focus
- In nodes focus (vcluster): Disconnect to host cluster
- No Pod Metrics in vCluster: Pod metrics may also be limited in some vcluster configurations (depends on metrics-server setup)
- Terminal Size Dependency: Very small terminals may cause UI clipping
- Context Switch Delay: 1-second sleep after connect/disconnect to show user feedback before reload
- Large Pod Lists: Only first 200 pods used in signature calculation for performance
- Support for filtering by namespace (separate from pod name filter)
- History/sparklines for pod CPU usage (currently only nodes have graphs)
- Multi-cluster switching (not just vcluster, but other contexts)
- Resource limits/requests comparison (show pod requests vs actual usage)
- Export/snapshot feature (save current metrics to file)
- Custom refresh interval (currently hardcoded at 1 second)
- Color themes or configuration file
Check that show_filter_input flag is properly reset when switching contexts or pressing ESC.
Verify all state variables are cleared in the context switch handler. Grep for the variable name to find all usages.
Ensure no live.update() calls in the key handling loop for filter input - should only set state and break.
- Check output of
kubectl config current-context - Update
_detect_vcluster()method if your vcluster uses different naming
watch-node-cpu.py
├── NodeCPUWatcher (main class)
│ ├── __init__ # State initialization
│ ├── _detect_vcluster # vCluster detection
│ ├── _get_vcluster_name # Extract vcluster name from pod
│ ├── _is_vcluster_pod # Check if pod is a vcluster
│ ├── get_node_metrics # Fetch node CPU (with vcluster fallback)
│ ├── get_pod_metrics # Fetch pod CPU/memory
│ ├── format_table # Build nodes table UI
│ ├── format_pods_table # Build pods table UI
│ ├── create_pod_action_modal # Build action modal UI
│ └── run # Main loop (key handling + refresh)
rich: Terminal UI (Table, Panel, Layout, Live)sparklines: CPU history graphskubectl: All k8s API calls via subprocessvclusterCLI: For context switching
- Test in host cluster context
- Test in vcluster context
- Test context switching (host → vcluster → host)
- Test filter input (type, backspace, ESC, ENTER)
- Test pod selection and scrolling
- Test node selection
- Test sort mode cycling
- Test action modal (logs, describe, delete)
- Test vcluster detection with different vcluster types
- Test with very large pod lists (100+)
- Test with small terminal size