Conversation

@googs1025
Collaborator

@googs1025 googs1025 commented Oct 13, 2025

Pull Request Description

This PR adds validation so that a scaling target is controlled by at most one PodAutoscaler. When a second PodAutoscaler is created for a workload that is already managed, it is marked with a conflict condition and does not take effect, as the output below shows:

➜  ~ kubectl get podautoscalers -A
NAMESPACE   NAME                                 MINPODS   MAXPODS   REPLICAS   STRATEGY   AGE
default     deepseek-r1-distill-llama-8b-hpa     1         10                   HPA        13s
default     deepseek-r1-distill-llama-8b-hpa-1   1         10                   HPA        3s
➜  ~ kubectl get podautoscalers -A -oyaml
apiVersion: v1
items:
- apiVersion: autoscaling.aibrix.ai/v1alpha1
  kind: PodAutoscaler
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"autoscaling.aibrix.ai/v1alpha1","kind":"PodAutoscaler","metadata":{"annotations":{},"labels":{"app.kubernetes.io/managed-by":"kustomize","app.kubernetes.io/name":"aibrix"},"name":"deepseek-r1-distill-llama-8b-hpa","namespace":"default"},"spec":{"maxReplicas":10,"metricsSources":[{"metricSourceType":"pod","path":"/metrics","port":"8000","protocolType":"http","targetMetric":"gpu_cache_usage_perc","targetValue":"50"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"deepseek-r1-distill-llama-8b"},"scalingStrategy":"HPA"}}
    creationTimestamp: "2025-10-13T13:14:40Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: kustomize
      app.kubernetes.io/name: aibrix
    name: deepseek-r1-distill-llama-8b-hpa
    namespace: default
    resourceVersion: "9526115"
    uid: b6e089bc-322f-408e-b7b1-66a389ffed99
  spec:
    maxReplicas: 10
    metricsSources:
    - metricSourceType: pod
      path: /metrics
      port: "8000"
      protocolType: http
      targetMetric: gpu_cache_usage_perc
      targetValue: "50"
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: deepseek-r1-distill-llama-8b
    scalingStrategy: HPA
  status: {}
- apiVersion: autoscaling.aibrix.ai/v1alpha1
  kind: PodAutoscaler
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"autoscaling.aibrix.ai/v1alpha1","kind":"PodAutoscaler","metadata":{"annotations":{},"labels":{"app.kubernetes.io/managed-by":"kustomize","app.kubernetes.io/name":"aibrix"},"name":"deepseek-r1-distill-llama-8b-hpa-1","namespace":"default"},"spec":{"maxReplicas":10,"metricsSources":[{"metricSourceType":"pod","path":"/metrics","port":"8000","protocolType":"http","targetMetric":"gpu_cache_usage_perc","targetValue":"50"}],"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"deepseek-r1-distill-llama-8b"},"scalingStrategy":"HPA"}}
    creationTimestamp: "2025-10-13T13:14:50Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: kustomize
      app.kubernetes.io/name: aibrix
    name: deepseek-r1-distill-llama-8b-hpa-1
    namespace: default
    resourceVersion: "9526138"
    uid: 127bd894-59b7-40ca-9dee-53a6a5311e2b
  spec:
    maxReplicas: 10
    metricsSources:
    - metricSourceType: pod
      path: /metrics
      port: "8000"
      protocolType: http
      targetMetric: gpu_cache_usage_perc
      targetValue: "50"
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: deepseek-r1-distill-llama-8b
    scalingStrategy: HPA
  status:
    conditions:
    - lastTransitionTime: "2025-10-13T13:14:50Z"
      message: ""
      reason: AsExpected
      status: "True"
      type: ValidSpec
    - lastTransitionTime: "2025-10-13T13:14:50Z"
      message: Scaling target apps/v1.Deployment/default/deepseek-r1-distill-llama-8b
        is already controlled by PodAutoscaler default/deepseek-r1-distill-llama-8b-hpa,
        it will not take effect
      reason: MutilPodAutoscalerConflict
      status: "False"
      type: MutilPodAutoscalerConflict
    - lastTransitionTime: "2025-10-13T13:14:50Z"
      message: desired=0, actual=0
      reason: Stable
      status: "False"
      type: ScalingActive
    - lastTransitionTime: "2025-10-13T13:14:50Z"
      message: ""
      reason: InvalidSpec
      status: "False"
      type: AbleToScale
    - lastTransitionTime: "2025-10-13T13:14:50Z"
      message: ""
      reason: ReconcilingScaleDiff
      status: "False"
      type: Ready
kind: List
metadata:
  resourceVersion: ""
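
For context, here is a minimal Go sketch of how the conditions shown above could be derived from the two validation results; all function and variable names are illustrative assumptions, not the controller's actual code:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setConflictConditions sketches how the status above could be produced. The
// condition type string matches the output; everything else is illustrative.
func setConflictConditions(conds *[]metav1.Condition, specOK, noConflict bool, conflictMsg string) {
	conflict := metav1.Condition{Type: "MutilPodAutoscalerConflict", Status: metav1.ConditionTrue, Reason: "AsExpected"}
	if !noConflict {
		// Mirrors the output above: the losing PodAutoscaler reports the
		// conflict and will not take effect.
		conflict.Status = metav1.ConditionFalse
		conflict.Reason = "MutilPodAutoscalerConflict"
		conflict.Message = conflictMsg
	}
	meta.SetStatusCondition(conds, conflict)

	// AbleToScale is gated on both spec validity and the absence of a conflict.
	able := metav1.Condition{Type: "AbleToScale", Status: metav1.ConditionTrue, Reason: "AsExpected"}
	if !specOK || !noConflict {
		able.Status = metav1.ConditionFalse
		able.Reason = "InvalidSpec"
	}
	meta.SetStatusCondition(conds, able)
}

func main() {
	var conds []metav1.Condition
	setConflictConditions(&conds, true, false,
		"Scaling target is already controlled by another PodAutoscaler, it will not take effect")
	for _, c := range conds {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}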

Related Issues

Resolves: #1661

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@gemini-code-assist
Contributor

Summary of Changes

Hello @googs1025, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug where the system could allow multiple "PodAutoscaler" resources to attempt to control the same underlying workload, leading to unpredictable and potentially erroneous scaling actions. The changes introduce robust validation to ensure that each scaling target is managed by only one "PodAutoscaler" at a time. This significantly improves the stability and reliability of the autoscaling functionality by preventing conflicting instructions and providing clear status feedback when such conflicts arise.

Highlights

  • Conflict Prevention for PodAutoscalers: Introduced a new validation mechanism to prevent multiple "PodAutoscaler" resources from simultaneously targeting the same workload, ensuring predictable autoscaling behavior.
  • Internal State Management: Implemented internal maps ("scalingTargetToPA", "paToScalingKey") within the "PodAutoscalerReconciler" to track and manage which "PodAutoscaler" controls which scaling target, along with a mutex for thread safety (see the sketch after this list).
  • Enhanced Status Conditions: Added a new "MutilPodAutoscalerConflict" status condition to clearly indicate when a "PodAutoscaler" is in a conflicting state, and updated the "AbleToScale" and "Ready" conditions to reflect this conflict status.
  • Resource Deletion Cleanup: Added logic to automatically clean up the internal conflict tracking state when a "PodAutoscaler" resource is deleted, preventing stale entries.
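
A minimal sketch of what this internal state could look like, assuming hypothetical field and method names (the real PodAutoscalerReconciler may differ):

package main

import (
	"fmt"
	"sync"
)

// conflictTracker illustrates the internal state described above:
// two maps plus a mutex guarding them.
type conflictTracker struct {
	mu                sync.Mutex
	scalingTargetToPA map[string]string // target key -> owning PodAutoscaler key
	paToScalingKey    map[string]string // PodAutoscaler key -> target key
}

// tryClaim registers paKey as the owner of targetKey. The check and the write
// happen under one lock, so two reconciles cannot both claim the same target.
func (t *conflictTracker) tryClaim(targetKey, paKey string) (owner string, ok bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if existing, found := t.scalingTargetToPA[targetKey]; found && existing != paKey {
		return existing, false // conflict: another PodAutoscaler already controls this target
	}
	t.scalingTargetToPA[targetKey] = paKey
	t.paToScalingKey[paKey] = targetKey
	return paKey, true
}

// release cleans up both maps when a PodAutoscaler is deleted, preventing
// stale entries (the "Resource Deletion Cleanup" highlight).
func (t *conflictTracker) release(paKey string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if targetKey, found := t.paToScalingKey[paKey]; found {
		delete(t.scalingTargetToPA, targetKey)
		delete(t.paToScalingKey, paKey)
	}
}

func main() {
	t := &conflictTracker{
		scalingTargetToPA: map[string]string{},
		paToScalingKey:    map[string]string{},
	}
	t.tryClaim("apps/v1.Deployment/default/deepseek-r1-distill-llama-8b", "default/deepseek-r1-distill-llama-8b-hpa")
	owner, ok := t.tryClaim("apps/v1.Deployment/default/deepseek-r1-distill-llama-8b", "default/deepseek-r1-distill-llama-8b-hpa-1")
	fmt.Println(ok, owner) // false default/deepseek-r1-distill-llama-8b-hpa
}

Doing the lookup and the write under a single lock is what makes the claim atomic, and releasing on deletion keeps the maps from accumulating stale entries.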
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command              Description
Code Review           /gemini review       Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary      Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist  Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help         Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@googs1025 googs1025 force-pushed the podautoscaler_conflict branch from 902ed28 to 1f782be on October 13, 2025 13:17
@googs1025 googs1025 requested a review from Jeffwan October 13, 2025 13:17

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces validation to prevent multiple PodAutoscalers from targeting the same workload, enhancing the stability and predictability of autoscaling behavior. The changes include adding a new condition type, implementing conflict checks, and updating the status computation to reflect potential conflicts. I have identified a critical issue regarding potential race conditions in the conflict checking logic that needs to be addressed.
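
The race in question is the classic check-then-act window: two concurrent reconciles could both observe a target as unclaimed and both register themselves. A hedged alternative sketch that closes the window with an atomic claim (illustrative only, not this PR's code):

package main

import (
	"fmt"
	"sync"
)

// scalingTargetToPA maps target key -> owning PodAutoscaler key. LoadOrStore
// performs the lookup and the insert as one atomic operation, so there is no
// gap between the conflict check and the registration.
var scalingTargetToPA sync.Map

func claim(targetKey, paKey string) (owner string, ok bool) {
	actual, loaded := scalingTargetToPA.LoadOrStore(targetKey, paKey)
	owner = actual.(string)
	return owner, !loaded || owner == paKey // ok if we claimed it, or already own it
}

func main() {
	claim("default/deepseek-r1-distill-llama-8b", "default/hpa-a")
	owner, ok := claim("default/deepseek-r1-distill-llama-8b", "default/hpa-b")
	fmt.Println(ok, owner) // false default/hpa-a
}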

Comment on lines +611 to +612
specOK := specValidationResult.Valid
noConflict := conflictValidationResult.Valid

medium

Consider combining these two lines into a single line for better readability.

Suggested change
-specOK := specValidationResult.Valid
-noConflict := conflictValidationResult.Valid
+specOK, noConflict := specValidationResult.Valid, conflictValidationResult.Valid

@googs1025 googs1025 force-pushed the podautoscaler_conflict branch from 1f782be to 9bf7258 on October 13, 2025 13:23
@Jeffwan
Copy link
Collaborator

Jeffwan commented Oct 13, 2025

For Deployment targets, this is good. For StormService, however, we expect multiple HPA rules to be created for the same workload: https://aibrix.readthedocs.io/latest/features/autoscaling/metric-based-autoscaling.html#stormservice-role-level-autoscaling

In that case, we do not want the validation. Does this PR go against the pool autoscaling pattern?
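
If the check does need to be scoped as suggested here, one illustrative option is to key the validation on the target kind; the kind strings below are assumptions, not the merged behavior:

package main

import "fmt"

// Hypothetical scoping: enforce single ownership only where multiple
// autoscalers are a misconfiguration, and exempt StormService, whose
// role-level autoscaling expects several PodAutoscalers per workload.
func conflictCheckApplies(targetKind string) bool {
	switch targetKind {
	case "StormService":
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(conflictCheckApplies("Deployment"))   // true
	fmt.Println(conflictCheckApplies("StormService")) // false
}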

@Jeffwan Jeffwan force-pushed the podautoscaler_conflict branch from 9bf7258 to 2e231bd on October 14, 2025 17:15
@Jeffwan Jeffwan merged commit f9c1a74 into vllm-project:main Oct 14, 2025
14 checks passed