Conversation

@Jeffwan
Collaborator

@Jeffwan Jeffwan commented Oct 2, 2025

Pull Request Description

At this moment, the user should be able to create two scaling rules: one for the prefill role and one for the decode role.
The tricky part is that a StormService role is not a CR-level resource, so it cannot expose the /scale subresource today. We therefore cannot use the scale interface to update it; instead, the autoscaler updates the StormService object directly.

In the future, we will support a single autoscaler object that updates the entire StormService, and proportional scaling will be supported as well.

    apiVersion: autoscaling.aibrix.ai/v1alpha1
    kind: PodAutoscaler
    metadata:
      name: llm-prefill-autoscaler
    spec:
      scaleTargetRef:
        apiVersion: orchestration.aibrix.ai/v1alpha1
        kind: StormService
        name: llm-xpyd
      # Select the prefill role
      subTargetSelector: # newly added field
        roleName: prefill # newly added field
      minReplicas: 2
      maxReplicas: 20
      scalingStrategy: APA
      metricsSources:
        - metricSourceType: pod
          protocolType: http
          port: "8000"
          path: /metrics
          targetMetric: "prefill_queue_length"
          targetValue: "10"
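
For completeness, here is a hedged sketch of the companion rule for the decode role, mirroring the prefill example above; the metric name decode_queue_length is an illustrative assumption rather than a value taken from this PR, and the other fields simply mirror the prefill manifest.

    apiVersion: autoscaling.aibrix.ai/v1alpha1
    kind: PodAutoscaler
    metadata:
      name: llm-decode-autoscaler
    spec:
      scaleTargetRef:
        apiVersion: orchestration.aibrix.ai/v1alpha1
        kind: StormService
        name: llm-xpyd
      # Select the decode role
      subTargetSelector:
        roleName: decode
      minReplicas: 2
      maxReplicas: 20
      scalingStrategy: APA
      metricsSources:
        - metricSourceType: pod
          protocolType: http
          port: "8000"
          path: /metrics
          targetMetric: "decode_queue_length" # assumed metric name, for illustration only
          targetValue: "10"

With both objects applied, each role scales independently on its own metric while both target the same StormService.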

Related Issues

Resolves: part of #1260

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Jeffwan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the autoscaling capabilities within the system by introducing support for fine-grained, role-level autoscaling for StormService resources. Previously, individual roles within a StormService could not be scaled independently using standard Kubernetes scaling mechanisms. This change addresses that limitation by extending the PodAutoscaler API with a subTargetSelector field, allowing users to define distinct autoscaling policies for specific roles like 'prefill' or 'decode'. The core autoscaler controller has been refactored to accommodate this new functionality, abstracting scaling logic into a dedicated WorkloadScale component that can interact with both standard Kubernetes /scale subresources and directly manipulate StormService objects for role-specific replica adjustments. This provides greater flexibility and efficiency for managing resource allocation in complex, multi-role services.

Highlights

  • Role-Level Autoscaling for StormService: Introduced the "subTargetSelector" field in the PodAutoscaler API, allowing independent autoscaling of specific roles (e.g., prefill, decode) within a StormService resource, particularly in pooled mode.
  • API and CRD Extension: Extended the PodAutoscalerSpec with a new SubTargetSelector type, which includes a RoleName field, and updated the corresponding Custom Resource Definition (CRD) to support this new API.
  • Refactored Scaling Logic: Centralized and abstracted the scaling operations for different workload types (including standard Kubernetes resources and StormService roles) into a new WorkloadScale interface and its implementation, improving modularity and maintainability of the autoscaler controller.
  • Enhanced RBAC Permissions: Added necessary Role-Based Access Control (RBAC) rules to the controller to enable it to get, patch, and update /scale subresources for various apps workloads (Deployments, StatefulSets, ReplicaSets) and directly patch StormService objects for role-level scaling; a rough sketch follows this list.
  • Comprehensive Documentation: Included detailed documentation explaining how to configure and utilize StormService role-level autoscaling, complete with YAML examples, key features, and use cases.
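
To make the RBAC highlight above concrete, the following is a rough sketch of the kind of ClusterRole rules involved; the API group comes from the manifests in this PR, but the exact resource/verb combinations are assumptions rather than a copy of the generated RBAC.

    # Hypothetical ClusterRole rules; verbs and resource names are assumed
    # from the highlight above, not taken from the PR diff.
    rules:
      - apiGroups: ["apps"]
        resources: ["deployments/scale", "statefulsets/scale", "replicasets/scale"]
        verbs: ["get", "patch", "update"]
      - apiGroups: ["orchestration.aibrix.ai"]
        resources: ["stormservices"]
        verbs: ["get", "patch"]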

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for role-level autoscaling in StormService, which is a great enhancement for fine-grained resource management. The implementation introduces a WorkloadScale interface to abstract scaling logic, which is a solid design choice that improves modularity and maintainability. The changes to the PodAutoscaler API, CRD, and controller logic are well-aligned with the goal. I've found one issue related to an inconsistent implementation in the new WorkloadScale abstraction that should be addressed.

@Jeffwan Jeffwan force-pushed the jiaxin/autoscaler-stormservice-pool-mode branch from 41b71f7 to b13b5f6 on October 2, 2025 08:05
@Jeffwan
Collaborator Author

Jeffwan commented Oct 2, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for role-level autoscaling in StormService, which is a valuable feature. The implementation is well-structured, particularly with the introduction of the WorkloadScale interface that cleanly abstracts the scaling mechanism for different resource types. This makes the controller more modular and extensible. The changes to the API, CRD, and documentation are clear and consistent. I've identified a few potential issues regarding the handling of pod selectors when they are missing, which could lead to incorrect behavior. Addressing these points will enhance the robustness of the autoscaler.

@Jeffwan Jeffwan force-pushed the jiaxin/autoscaler-stormservice-pool-mode branch from b13b5f6 to 9697151 on October 2, 2025 16:52
@Jeffwan
Collaborator Author

Jeffwan commented Oct 2, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for role-level autoscaling for StormService by adding a subTargetSelector to the PodAutoscaler API. The implementation involves a significant and well-designed refactoring of the scaling logic, introducing a WorkloadScale interface to abstract away the details of scaling different resource types. This is a great improvement for maintainability and extensibility. The changes also include necessary updates to CRDs, RBAC, and documentation. My review focuses on further improving the new WorkloadScale abstraction by centralizing logic and removing dead code.

@Jeffwan Jeffwan force-pushed the jiaxin/autoscaler-stormservice-pool-mode branch from 9697151 to a0f6faa on October 3, 2025 17:54
@Jeffwan
Collaborator Author

Jeffwan commented Oct 3, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant enhancement to the PodAutoscaler by adding support for role-level autoscaling within a StormService. This is achieved through a new subTargetSelector field in the API, allowing users to target specific roles like 'prefill' or 'decode' for independent scaling. The implementation correctly avoids the /scale subresource and instead patches the StormService object directly. A new WorkloadScale abstraction has been introduced to cleanly separate the scaling logic for different workload types, which is a great refactoring. The changes are well-supported with new documentation and sample YAML files.

My review focuses on a few key areas:

  • A potential performance regression due to changes in autoscaling window durations.
  • The verbosity of new log messages.
  • A minor issue in the documentation.
  • A small code improvement for logging.

Overall, this is a solid contribution that adds valuable functionality. Addressing the feedback will help ensure the changes are robust and maintainable.

Comment on lines +29 to +30
    stableWindowDuration = 180 * time.Second
    panicWindowDuration = 60 * time.Second
Contributor


Severity: high

The stableWindowDuration and panicWindowDuration have been significantly increased (from 60s to 180s and 6s to 60s, respectively). This will make the autoscaler react much more slowly to metric changes. This is a major behavioral change that isn't mentioned in the pull request description. If this is intentional, please provide a justification in the PR description. If not, it should be reverted to the previous values to avoid unexpected performance degradation.


**Complete example:**

.. literalinclude:: ../../../../samples/autoscaler/stormservice-role-autoscaler.yaml
Contributor


Severity: medium

The literalinclude directive points to stormservice-role-autoscaler.yaml, but this file does not appear to be part of the pull request. Based on the added sample files, it seems you intended to include stormservice-pool.yaml. Please correct the path to ensure the documentation renders correctly.

Suggested change:

    - .. literalinclude:: ../../../../samples/autoscaler/stormservice-role-autoscaler.yaml
    + .. literalinclude:: ../../../../samples/autoscaling/stormservice-pool.yaml

Comment on lines +201 to +202
klog.InfoS("Metrics window aggregation", "metricKey", metricKeyStr,
"stableAvg", stableValue, "panicAvg", panicValue, "stableWindowValues", stableWindow.Values(), "panicWindowValues", panicWindow.Values())
Contributor


Severity: medium

This log message has been promoted from a verbose level (V(4)) to the default info level. Additionally, it now logs the entire contents of stableWindow.Values() and panicWindow.Values(), which can be very large arrays. This will likely lead to excessive log spam in a production environment, making it difficult to monitor the system and potentially incurring high logging costs. It's recommended to move this log back to a verbose level (e.g., V(4)) to avoid this.

Suggested change:

    - klog.InfoS("Metrics window aggregation", "metricKey", metricKeyStr,
    -     "stableAvg", stableValue, "panicAvg", panicValue, "stableWindowValues", stableWindow.Values(), "panicWindowValues", panicWindow.Values())
    + klog.V(4).InfoS("Metrics window aggregation", "metricKey", metricKeyStr,
    +     "stableAvg", stableValue, "panicAvg", panicValue, "stableWindowValues", stableWindow.Values(), "panicWindowValues", panicWindow.Values())

    if err != nil {
    -     klog.ErrorS(err, "Failed to add ray.io/node-type requirement to label selector")
    -     return nil, fmt.Errorf("failed to add ray requirement: %w", err)
    +     return nil, fmt.Errorf("failed to add ray.io/node-type requirement to label selector: %w", err)
Contributor


Severity: medium

The error log klog.ErrorS was removed here. While the error is still wrapped and returned, logging the error at the point of failure provides valuable context for debugging without needing to trace the error back up the call stack. Please consider re-adding the error log before this line.

klog.ErrorS(err, "Failed to add ray.io/node-type requirement to label selector")
			return nil, fmt.Errorf("failed to add ray.io/node-type requirement to label selector: %w", err)

@Jeffwan Jeffwan force-pushed the jiaxin/autoscaler-stormservice-pool-mode branch from a0f6faa to 5ed2c33 on October 7, 2025 17:02
@Jeffwan Jeffwan merged commit da47edd into vllm-project:main Oct 7, 2025
14 checks passed