@Jeffwan (Collaborator) commented Jul 6, 2025

Pull Request Description

https://github.com/Jeffwan/aibrix/pull/new/jiaxin/update-stormservice-stability

The 1-minute reconcile interval is too long: sometimes we have to wait a long time for a deletion to finish. This PR reduces the reconcile (requeue) interval for now. The change mainly affects the deletion experience.

Related Issues

Resolves: part of #1252

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @Jeffwan, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the operational stability and responsiveness of the system's controllers. It achieves this by standardizing and reducing the default reconciliation requeue interval across key controllers and by adjusting a specific resource burst limit, aiming for more consistent and controlled resource management.

Highlights

  • Reconciliation Timing Standardization: The DefaultRequeueAfter constant has been standardized to 10 * time.Second across both the RoleSet and StormService controllers. This change replaces a previous 1 * time.Minute setting in StormService and hardcoded 30 * time.Second values in RoleSet, promoting consistent and quicker re-reconciliation cycles.
  • Resource Burst Limit Adjustment: The PodBurst constant within the RoleSet controller has been reduced from 500 to 100. This adjustment may affect the rate at which the controller processes or creates pods, potentially improving stability under certain load conditions.
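
The summary does not say exactly how `PodBurst` is used. Assuming it caps how many Pods the RoleSet controller creates in a single reconcile pass (a common pattern in Kubernetes workload controllers), this Go sketch shows the trade-off of lowering it from 500 to 100: fewer create calls per pass, but more passes to reach a large replica count. `podsToCreate` and `passesNeeded` are hypothetical helpers, and the sketch assumes every create succeeds.

```go
package main

import "fmt"

// Hypothetical constants mirroring the before/after values described in this PR.
const (
	oldPodBurst = 500
	newPodBurst = 100
)

// podsToCreate returns how many Pods one reconcile pass would create when
// `desired` Pods are wanted, `current` exist, and at most `burst` may be
// created per pass.
func podsToCreate(desired, current, burst int) int {
	missing := desired - current
	if missing <= 0 || burst <= 0 {
		return 0
	}
	if missing > burst {
		return burst
	}
	return missing
}

// passesNeeded counts reconcile passes needed to go from `current` to
// `desired` Pods, assuming every create call succeeds.
func passesNeeded(desired, current, burst int) int {
	passes := 0
	for current < desired {
		n := podsToCreate(desired, current, burst)
		if n == 0 {
			break // no progress possible (burst <= 0)
		}
		current += n
		passes++
	}
	return passes
}

func main() {
	for _, burst := range []int{oldPodBurst, newPodBurst} {
		fmt.Printf("burst=%d passes for 1000 pods=%d\n", burst, passesNeeded(1000, 0, burst))
	}
}
```

With 1000 missing Pods, the sketch needs 2 passes at a burst of 500 and 10 passes at a burst of 100.
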

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to improve controller stability by reducing the DefaultRequeueAfter duration to 10 seconds for both the roleset and stormservice controllers. The changes correctly implement this by introducing and updating constants and applying them in the reconciliation logic.

My main feedback is regarding an undocumented change to the PodBurst value in the roleset controller. This is a significant modification with potential performance implications that should be explained in the pull request description or moved to a separate PR for clarity.

@Jeffwan force-pushed the jiaxin/update-stormservice-stability branch from 62de486 to ee15322 on July 6, 2025 03:21
@Jeffwan changed the title from "Change DefaultRequeueAfter to 10s" to "Change DefaultRequeueAfter to 15s" on Jul 6, 2025
@Jeffwan force-pushed the jiaxin/update-stormservice-stability branch from ee15322 to 7154659 on July 6, 2025 09:25
@Jeffwan changed the title from "Change DefaultRequeueAfter to 15s" to "Change stormservice controller DefaultRequeueAfter to 15s" on Jul 6, 2025
@Jeffwan changed the title from "Change stormservice controller DefaultRequeueAfter to 15s" to "Update stormservice controller DefaultRequeueAfter to 15s" on Jul 6, 2025
@Jeffwan merged commit 31a3a4e into vllm-project:main on Jul 6, 2025 (14 checks passed)
@Jeffwan deleted the jiaxin/update-stormservice-stability branch on July 6, 2025 09:41
Yaegaki1Erika pushed a commit to Yaegaki1Erika/aibrix that referenced this pull request Jul 23, 2025
…ct#1253)

Change DefaultRequeueAfter to 15s

Signed-off-by: Jiaxin Shan <[email protected]>