fix(reclaim): isolate evictions per node to prevent ghost evictions#5083
fix(reclaim): isolate evictions per node to prevent ghost evictions#5083Aman-Cool wants to merge 1 commit intovolcano-sh:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical bug in the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
ef3cf45 to
fbad6ec
Compare
There was a problem hiding this comment.
Code Review
The pull request successfully addresses the 'ghost-eviction' bug in reclaimForTask by introducing a temporary nodeStmt for each node attempt. This ensures that eviction operations are isolated and only committed if the task is successfully pipelined to the respective node, correctly managing their lifecycle using temporary Statement objects. The implementation is consistent with the pattern used in normalPreempt and is thoroughly validated by a new, detailed regression test case. No critical, high, or medium severity vulnerabilities were detected in the modified code. These changes enhance the correctness and reliability of the reclaim logic.
fbad6ec to
e568ccf
Compare
There was a problem hiding this comment.
Pull request overview
Fixes a reclaim-path bug where eviction operations could “leak” across node attempts, causing pods on earlier (failed) nodes to be evicted when reclaim ultimately succeeds on a later node.
Changes:
- Introduce a per-node temporary
Statement(nodeStmt) inreclaimForTask, discarding it on failure and merging it into the outer statement only on success. - Add a regression test intended to ensure only victims from the successful node are evicted.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/scheduler/actions/reclaim/reclaim.go | Isolates evictions/pipeline ops per node attempt via nodeStmt, merging only on successful pipeline. |
| pkg/scheduler/actions/reclaim/reclaim_test.go | Adds a regression test case for preventing “ghost evictions” in reclaim. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // n1: 1 CPU total, fully used by victim-n1 (1 CPU, q1, preemptable). | ||
| // FutureIdle=0; 0+1=1 < 2 → ValidateVictims fails → n1 skipped entirely. | ||
| // n2: 3 CPU total, 1 CPU idle + victim-n2 (2 CPU, q1, preemptable). | ||
| // FutureIdle=1; 1+2=3 ≥ 2 → ValidateVictims passes → reclaim succeeds. | ||
| // | ||
| // Expected: only victim-n2 is evicted. victim-n1 must NOT be evicted. | ||
| // | ||
| // Before the nodeStmt fix, evictions went directly into the outer Statement | ||
| // without per-node isolation, so evictions from nodes that are later skipped | ||
| // or fail could leak into stmt.Commit(). The nodeStmt pattern prevents this. |
There was a problem hiding this comment.
This regression test’s setup has n1 failing ValidateVictims before any eviction is attempted, so it won’t reproduce the historical “ghost eviction” leak (which requires evictions to be recorded for a node attempt that later fails/gets skipped). To actually validate the per-node statement isolation, make the first node attempt perform at least one successful Evict and then fail to pipeline (e.g., via a small test plugin/event handler that injects an AllocateFunc error for n1) before succeeding on n2, and assert only n2’s victim is committed.
53d6343 to
16576a4
Compare
16576a4 to
4491921
Compare
d59c822 to
204b632
Compare
|
@hajnalmt @hzxuzhonghu @JesseStutler,I noticed reclaimForTask had the same ghost-eviction issue previously fixed in normalPreempt, so this PR applies the same per-node eviction pattern here. The change is fairly small, would really appreciate a quick review when you get a moment. Thanks! |
Use a temporary nodeStmt per node attempt in reclaimForTask so that evictions are only committed when reclaim succeeds on that node. On failure the nodeStmt is discarded, preventing ghost evictions of victims from nodes that were never used. Add regression test verifying only the successful node's victims are evicted when multiple nodes are attempted. Signed-off-by: Aman-Cool <aman017102007@gmail.com>
9ed48c2 to
43fdb0d
Compare
|
Was working on this in parallel🥲, #5067 got merged while I had this in progress. Closing. |
What type of PR is this?
Bug fix
What this PR does / why we need it:
reclaimForTaskhad the same ghost-eviction bug already fixed innormalPreempt.When reclaim iterates over multiple nodes, evictions were written directly into the
shared statement, so if node A's victims weren't sufficient and we fell through to
node B, both A's and B's evictions would be committed on success, killing pods
unrelated to the final scheduling decision.
The fix mirrors the pattern from
normalPreempt: allocate a temporarynodeStmtper node attempt, discard it if that node doesn't pan out, and merge it into the
outer statement only when the task is actually pipelined to that node.
Which issue(s) this PR fixes:
NONE
Special notes for your reviewer:
The fix is intentionally the same pattern used in
normalPreempt— worthcross-referencing that code path when reviewing. A regression test is included
that sets up two nodes where only one can satisfy the preemptor, asserting only
that node's victim gets evicted.
-->