
Hierarchical Queue Reclaim logic lacks recursive parent-level deserved check in capacity plugin #5107

@wangyang0616

Description


Currently, the capacity plugin in Volcano supports hierarchical queues, but the resource reclamation (cross-queue preemption) logic seems to operate on a "flattened" view of leaf queues.

When reclaimFn is triggered, it primarily checks if a leaf queue's allocated resources exceed its deserved value. However, it fails to consider the status of the parent queue in the hierarchy during the reclamation process.

Code Reference
In pkg/scheduler/plugins/capacity/capacity.go (specifically within the reclaimFn logic), the current implementation only evaluates the immediate queue's resource usage against its pre-calculated deserved value:

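Go
// Current simplified logic flow observed:

// Check deserved: the reclaimee is skipped unless its own queue (attr)
// has allocated more than that queue's deserved value. No ancestor
// queue is consulted here.
if exceeds, dims, reason := cp.checkDeservedExceedance(
	allocated, attr.deserved, reclaimee, reclaimer, attr.name); !exceeds {
	klog.V(5).Infof("%s", reason)
	continue
}
// ... rest of the loop body omitted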

The Problem / Use Case
In a true hierarchical resource sharing model, a child queue should be allowed to "borrow" unused capacity from its parent.

Steps to reproduce the issue

Scenario:

Root Queue
  Parent Queue A (Deserved: 100)
    Child Queue A1 (Deserved: 20) -> currently using 40 (over-used, but Parent A is still at 40/100)
    Child Queue A2 (Deserved: 80) -> currently using 0
  Parent Queue B (Deserved: 10) -> currently using 10, wants to reclaim more

Current Behavior: If Queue B triggers a reclaim, Volcano might evict tasks from Child Queue A1 because A1.Allocated (40) > A1.Deserved (20).

Describe the results you received and expected

Expected Behavior: The scheduler should also check whether Parent Queue A is over-used. Since Parent A.Allocated (40) < Parent A.Deserved (100), the 20 units that A1 uses beyond its own deserved value should be treated as "legally borrowed" from the idle quota of its sibling A2 under the same parent. They should not be reclaimable by Queue B until Parent A as a whole exceeds its deserved value.
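To make the numbers concrete, here is a minimal sketch of the scenario in Go. It is not Volcano code: the `Queue` type, its fields, and the `scenario` helper are hypothetical, and a single scalar stands in for the multi-dimensional resources the capacity plugin actually compares.

```go
package main

import "fmt"

// Queue is a hypothetical, simplified model of a hierarchical queue.
// A single float64 replaces Volcano's multi-dimensional resource type.
type Queue struct {
	Name      string
	Deserved  float64
	Children  []*Queue
	allocated float64 // own usage; only meaningful for leaf queues
}

// Allocated returns a leaf's own usage, or the sum over all children
// for a non-leaf queue.
func (q *Queue) Allocated() float64 {
	if len(q.Children) == 0 {
		return q.allocated
	}
	total := 0.0
	for _, c := range q.Children {
		total += c.Allocated()
	}
	return total
}

// OverDeserved is the "flat" check: it looks at this queue only.
func (q *Queue) OverDeserved() bool {
	return q.Allocated() > q.Deserved
}

// scenario builds Parent Queue A with children A1 and A2 from the issue.
func scenario() (a, a1 *Queue) {
	a1 = &Queue{Name: "A1", Deserved: 20, allocated: 40}
	a2 := &Queue{Name: "A2", Deserved: 80, allocated: 0}
	a = &Queue{Name: "A", Deserved: 100, Children: []*Queue{a1, a2}}
	return a, a1
}

func main() {
	a, a1 := scenario()
	// The leaf-only view flags A1 even though its parent is under quota.
	fmt.Printf("A1: allocated=%v deserved=%v over=%v\n", a1.Allocated(), a1.Deserved, a1.OverDeserved())
	fmt.Printf("A:  allocated=%v deserved=%v over=%v\n", a.Allocated(), a.Deserved, a.OverDeserved())
}
```

In this model A1 is over its deserved value (40 > 20) while Parent A is not (40 < 100), which is exactly the gap between the current and the expected behavior.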

What version of Volcano are you using?

master

Any other relevant information

Suggested Solution
Recursive Check: Modify the reclaimFn or the task candidate selection logic to recursively check the Allocated vs Deserved status up to the root (or until the common ancestor).

Hierarchical Awareness: Ensure that a task is only considered a candidate for cross-queue reclamation if both the leaf queue and all its ancestors (up to the point of contention) are over-limit.
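The two points above could be sketched roughly as follows. This is an illustration under stated assumptions, not a patch against capacity.go: `queueNode`, `ancestorSet`, and `reclaimable` are hypothetical names, resources are reduced to one scalar, and each non-leaf queue's `allocated` is assumed to already hold the aggregated usage of its subtree. The walk is written as a loop rather than literal recursion, and it stops at the lowest common ancestor of the two queues.

```go
package main

import "fmt"

// queueNode is a hypothetical stand-in for the plugin's per-queue attributes.
type queueNode struct {
	name      string
	parent    *queueNode
	allocated float64 // assumed to be aggregated over the subtree for non-leaf queues
	deserved  float64
}

// ancestorSet returns q and all of its ancestors up to the root.
func ancestorSet(q *queueNode) map[*queueNode]bool {
	set := make(map[*queueNode]bool)
	for ; q != nil; q = q.parent {
		set[q] = true
	}
	return set
}

// reclaimable reports whether tasks in reclaimee may be reclaimed on behalf
// of reclaimer. Every queue from reclaimee up to, but excluding, the lowest
// common ancestor with reclaimer must exceed its deserved value.
// Callers are assumed to pass two distinct leaf queues.
func reclaimable(reclaimee, reclaimer *queueNode) bool {
	stop := ancestorSet(reclaimer)
	for q := reclaimee; q != nil && !stop[q]; q = q.parent {
		if q.allocated <= q.deserved {
			return false // this level is within quota, so the usage is borrowed inside the subtree
		}
	}
	return true
}

func main() {
	root := &queueNode{name: "root"}
	a := &queueNode{name: "A", parent: root, allocated: 40, deserved: 100}
	a1 := &queueNode{name: "A1", parent: a, allocated: 40, deserved: 20}
	b := &queueNode{name: "B", parent: root, allocated: 10, deserved: 10}

	fmt.Println(reclaimable(a1, b)) // false: A is at 40/100

	a.allocated = 120 // suppose A2 now uses 80, so A as a whole is over quota
	fmt.Println(reclaimable(a1, b)) // true: both A1 and A exceed deserved
}
```

Stopping at the common ancestor keeps the check local to the point of contention: queues above it are shared by reclaimer and reclaimee, so their status does not distinguish the two.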

Labels

kind/bug: Categorizes issue or PR as related to a bug.
