-
Notifications
You must be signed in to change notification settings - Fork 501
[API]: feat: add Topology Policy for StormService #1846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @googs1025, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a TopologyPolicy to control pod co-location for StormServices. The changes include adding the new API types, updating CRDs, and implementing the logic to inject pod affinity rules based on the policy.
The implementation is mostly solid, but I've identified a few areas for improvement:
- There is a typo in a new API constant that should be corrected.
- The validation for the new
TopologyPolicycould be strengthened to provide better feedback to users on misconfiguration. - There is some code duplication in the new utility functions that can be refactored for better maintainability.
Overall, this is a good addition. Please see my detailed comments for suggestions.
| // Key is the Kubernetes topology label key to enforce co-location on. | ||
| // Common values include: | ||
| // - "kubernetes.io/hostname" (node-level) | ||
| // - "topology.kubernetes.io/zone" (zone-level) | ||
| // Required when Scope is set. | ||
| // +optional | ||
| Key string `json:"key,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment for Key on line 156 states that it's "Required when Scope is set", but this is not enforced by any validation, and the field is marked as +optional. While the controller logic handles this by ignoring the policy if either Key or Scope is empty, it would be better for user experience to enforce this relationship via validation. Consider adding a validating webhook to enforce that Key is present if Scope is set. Without it, a user might specify a Scope without a Key and the policy would be silently ignored, which can be hard to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add TODO to solve in next pr
2183d34 to
0681d7b
Compare
|
local test: case 1:root@iZ6we78ovpjfbxlipufqliZ:~# kubectl get node -owide --show-labels
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME LABELS
cluster1-control-plane Ready control-plane 22h v1.32.2 172.18.0.3 <none> Debian GNU/Linux 12 (bookworm) 6.8.0-88-generic containerd://2.0.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=cluster1-control-plane,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
cluster1-worker Ready <none> 22h v1.32.2 172.18.0.5 <none> Debian GNU/Linux 12 (bookworm) 6.8.0-88-generic containerd://2.0.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gateway=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=cluster1-worker,kubernetes.io/os=linux,topology.kubernetes.io/zone=zone-a
cluster1-worker2 Ready <none> 22h v1.32.2 172.18.0.4 <none> Debian GNU/Linux 12 (bookworm) 6.8.0-88-generic containerd://2.0.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=cluster1-worker2,kubernetes.io/os=linux,topology.kubernetes.io/zone=zone-a
cluster1-worker3 Ready <none> 22h v1.32.2 172.18.0.2 <none> Debian GNU/Linux 12 (bookworm) 6.8.0-88-generic containerd://2.0.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=cluster1-worker3,kubernetes.io/os=linux,topology.kubernetes.io/zone=zone-b
cluster1-worker4 Ready <none> 22h v1.32.2 172.18.0.6 <none> Debian GNU/Linux 12 (bookworm) 6.8.0-88-generic containerd://2.0.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=cluster1-worker4,kubernetes.io/os=linux,topology.kubernetes.io/zone=zone-b
root@iZ6we78ovpjfbxlipufqliZ:~# vi aa.yaml
root@iZ6we78ovpjfbxlipufqliZ:~# vi aa.yaml
root@iZ6we78ovpjfbxlipufqliZ:~# kubectl apply -f aa.yaml
stormservice.orchestration.aibrix.ai/pd-inference created
root@iZ6we78ovpjfbxlipufqliZ:~# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pd-inference-roleset-7zrrb-decode-6794f4-0 1/1 Running 0 8s 10.6.2.48 cluster1-worker <none> <none>
pd-inference-roleset-7zrrb-decode-6794f4-1 1/1 Running 0 8s 10.6.4.84 cluster1-worker2 <none> <none>
pd-inference-roleset-7zrrb-decode-6794f4-2 1/1 Running 0 8s 10.6.4.85 cluster1-worker2 <none> <none>
pd-inference-roleset-7zrrb-decode-6794f4-3 1/1 Running 0 8s 10.6.2.47 cluster1-worker <none> <none>
pd-inference-roleset-7zrrb-prefill-648798-0-0 1/1 Running 0 8s 10.6.3.52 cluster1-worker3 <none> <none>
pd-inference-roleset-7zrrb-prefill-648798-0-1 1/1 Running 0 7s 10.6.1.29 cluster1-worker4 <none> <none>
pd-inference-roleset-7zrrb-prefill-648798-1-0 1/1 Running 0 7s 10.6.3.53 cluster1-worker3 <none> <none>
pd-inference-roleset-7zrrb-prefill-648798-1-1 1/1 Running 0 7s 10.6.1.30 cluster1-worker4 <none> <none>
pd-inference-roleset-fjf5b-decode-6794f4-0 1/1 Running 0 8s 10.6.2.49 cluster1-worker <none>
... apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
name: pd-inference
spec:
replicas: 3
stateful: true
selector:
matchLabels:
app: pd-inference
template:
metadata:
labels:
app: pd-inference
spec:
topologyPolicy:
scope: Role
key: topology.kubernetes.io/zone
roles:
- name: prefill
replicas: 2
podGroupSize: 2
stateful: true
template:
metadata:
labels:
role: prefill
spec:
containers:
- name: prefill
image: nginx:alpine
- name: decode
replicas: 4
stateful: true
template:
metadata:
labels:
role: decode
spec:
containers:
- name: decode
image: nginx:alpinecase 2:root@iZ6we78ovpjfbxlipufqliZ:~# cat aa.yaml
apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
name: pd-inference
spec:
replicas: 3
stateful: true
selector:
matchLabels:
app: pd-inference
template:
metadata:
labels:
app: pd-inference
spec:
topologyPolicy:
scope: Role
key: topology.kubernetes.io/zone
roles:
- name: prefill
replicas: 2
podGroupSize: 2
stateful: true
template:
metadata:
labels:
role: prefill
spec:
containers:
- name: prefill
image: nginx:alpine
- name: decode
replicas: 4
stateful: true
template:
metadata:
labels:
role: decode
spec:
containers:
- name: decode
image: nginx:alpine
root@iZ6we78ovpjfbxlipufqliZ:~#
root@iZ6we78ovpjfbxlipufqliZ:~# kubectl apply -f aa.yaml
stormservice.orchestration.aibrix.ai/pd-inference created
root@iZ6we78ovpjfbxlipufqliZ:~# kubectl get pods -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pd-inference-roleset-8qw9t-decode-6794f4-0 1/1 Running 0 8s 10.6.3.56 cluster1-worker3 <none> <none>
pd-inference-roleset-8qw9t-decode-6794f4-1 1/1 Running 0 8s 10.6.3.59 cluster1-worker3 <none> <none>
pd-inference-roleset-8qw9t-decode-6794f4-2 1/1 Running 0 8s 10.6.3.57 cluster1-worker3 <none> <none>
pd-inference-roleset-8qw9t-decode-6794f4-3 1/1 Running 0 8s 10.6.3.58 cluster1-worker3 <none> <none>
pd-inference-roleset-8qw9t-prefill-648798-0-0 1/1 Running 0 8s 10.6.3.63 cluster1-worker3 <none> <none>
pd-inference-roleset-8qw9t-prefill-648798-0-1 1/1 Running 0 8s 10.6.3.60 cluster1-worker3 <none> <none>
pd-inference-roleset-8qw9t-prefill-648798-1-0 1/1 Running 0 8s 10.6.3.61 cluster1-worker3 <none> <none>
pd-inference-roleset-8qw9t-prefill-648798-1-1 1/1 Running 0 8s 10.6.3.62 cluster1-worker3 <none> <none>
pd-inference-roleset-bgqhg-decode-6794f4-0 1/1 Running 0 9s 10.6.2.54 cluster1-worker <none> <none>
pd-inference-roleset-bgqhg-decode-6794f4-1 1/1 Running 0 9s 10.6.2.51 cluster1-worker <none> <none>
pd-inference-roleset-bgqhg-decode-6794f4-2 1/1 Running 0 9s 10.6.2.53 cluster1-worker <none> <none>
pd-inference-roleset-bgqhg-decode-6794f4-3 1/1 Running 0 9s 10.6.2.52 cluster1-worker <none> <none>
pd-inference-roleset-bgqhg-prefill-648798-0-0 1/1 Running 0 9s 10.6.2.55 cluster1-worker <none> <none>
pd-inference-roleset-bgqhg-prefill-648798-0-1 1/1 Running 0 9s 10.6.2.56 cluster1-worker <none> <none>
pd-inference-roleset-bgqhg-prefill-648798-1-0 1/1 Running 0 8s 10.6.2.57 cluster1-worker <none> <none>
pd-inference-roleset-bgqhg-prefill-648798-1-1 1/1 Running 0 8s 10.6.2.58 cluster1-worker <none> <none>
pd-inference-roleset-dfwfs-decode-6794f4-0 1/1 Running 0 8s 10.6.1.36 cluster1-worker4 <none> <none>
pd-inference-roleset-dfwfs-decode-6794f4-1 1/1 Running 0 8s 10.6.1.34 cluster1-worker4 <none> <none>
pd-inference-roleset-dfwfs-decode-6794f4-2 1/1 Running 0 8s 10.6.1.33 cluster1-worker4 <none> <none>
pd-inference-roleset-dfwfs-decode-6794f4-3 1/1 Running 0 8s 10.6.1.35 cluster1-worker4 <none> <none>
pd-inference-roleset-dfwfs-prefill-648798-0-0 1/1 Running 0 8s 10.6.1.37 cluster1-worker4 <none> <none>
pd-inference-roleset-dfwfs-prefill-648798-0-1 1/1 Running 0 8s 10.6.1.39 cluster1-worker4 <none> <none>
pd-inference-roleset-dfwfs-prefill-648798-1-0 1/1 Running 0 8s 10.6.1.38 cluster1-worker4 <none> <none>
pd-inference-roleset-dfwfs-prefill-648798-1-1 1/1 Running 0 8s 10.6.1.40 cluster1-worker4 <none> <none>
|
Signed-off-by: CYJiang <[email protected]>
0681d7b to
fdd1d4e
Compare
Pull Request Description
[Please provide a clear and concise description of your changes here]
Related Issues
Resolves: #1842
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
[Bug]: Corrections to existing functionality[CI]: Changes to build process or CI pipeline[Docs]: Updates or additions to documentation[API]: Modifications to aibrix's API or interface[CLI]: Changes or additions to the Command Line Interface[Misc]: For changes not covered above (use sparingly)Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.