Description
When Volcano is installed into a Kubernetes cluster that is already running and whose nodes already carry topology labels (e.g., volcano.sh/hypernode), the Volcano controller tries to create HyperNode CRs from those labels during its initial sync. If the Volcano admission webhook is not ready at that point (e.g., its pod is still starting up or has not been scheduled), the API server rejects the HyperNode creation request, and the controller does not retry it. As a result, the HyperNode resources stay missing permanently, which breaks the topology-aware scheduling features that depend on them.
Steps to reproduce the issue
- Set up a Kubernetes cluster with worker nodes already labeled:
kubectl label node <node-name> volcano.sh/hypernode=hypernode-a
- Install Volcano (e.g., via Helm or YAML manifests) after labeling.
- Observe controller logs:
"Failed to create HyperNode" err="Internal error occurred: failed calling webhook \"validatehypernodes.volcano.sh\": failed to call webhook: Post \"https://volcano-admission-service.kube-system.svc:443/hypernodes/validate?timeout=10s\": dial tcp xxx:443: connect: connection refused" name="hypernode-testtopology-tier2-pm4z2"
- Wait several minutes and check HyperNode resources:
kubectl get hypernodes
Describe the results you received and expected
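Received: HyperNode CRs whose creation was rejected during the initial sync are never created, even after the admission webhook becomes ready.
Expected: after Volcano is fully installed, HyperNode CRs should be created for all topology domains present on nodes.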
What version of Volcano are you using?
v1.13
Any other relevant information
No response