[enhancement] GPU optimizer accumulated fix #598
Add a test case verifying that the URL built from metricSource is as expected: the endpoint should include the port; if it does not and a port is specified, the port is appended to the endpoint.
Fix a benchmark error that could occur when token_latencies is missing some data.
Optimize update logic.
Reset benchmark as in main.
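The endpoint and port rule in the first item above can be sketched as follows. This is a minimal illustration only, not the project's implementation: the function name `build_metric_url` and the parameter names (`protocol_type`, `endpoint`, `port`, `path`) are assumptions chosen for readability.

```python
from urllib.parse import urlsplit


def build_metric_url(protocol_type: str, endpoint: str, port: str, path: str) -> str:
    """Hypothetical helper: join metricSource-style fields into a URL.

    If the endpoint already carries a port, it is used as-is. Otherwise,
    when a separate port is given, it is appended to the endpoint.
    """
    # Prefix "//" so urlsplit parses "host[:port]" as a network location.
    parsed = urlsplit(f"//{endpoint}")
    if parsed.port is None and port:
        endpoint = f"{endpoint}:{port}"
    # Normalize the path so exactly one "/" separates it from the host part.
    return f"{protocol_type}://{endpoint}/{path.lstrip('/')}"
```

Under these assumptions, a port embedded in the endpoint takes precedence over the separate port field, which matches the reading that the port is appended only when the endpoint lacks one.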
@nwangfw can you help review this PR?
Signed-off-by: Jingyuan <[email protected]>
Jeffwan
left a comment
I didn't look into the details of the gpu-optimizer logic. Since you have already run this version successfully, I think it should be OK. I'll leave further review of that part to @nwangfw.
images:
- name: aibrix/controller-manager
  newName: aibrix-container-registry-cn-beijing.cr.volces.com/aibrix/controller-manager
  newTag: v0.2.0-rc.1
Minor: we can use v0.2.0-rc.2 now; a few issues have been fixed.
resources:
- ../../../default
- ../../../gpu-optimizer
So now it installs a single component instead of the default installation?
Yes. Previously, it installed everything plus a specific debug version. I found that inconvenient when debugging more than one component. Now we just deploy the default and apply debug components as needed.
* Bug fix
* Fix configuration for domain podautoscaler
  Add a test case verifying that the URL built from metricSource is as expected: the endpoint should include the port; if it does not and a port is specified, the port is appended to the endpoint.
* Lint fix
* Add license for new files.
* Lint fix on added unit test.
* Add authorization support
* Support parameterized benchmark
* Remove next_in parameter
* Bug fix
* Fix typo
* Bug fix
* Apply stream parameter
* Cleaning up responses.
* Bug fix
* If an error is not reported as a temporary error, we will not retry.
* GPU profile now supports TPAT (time per all token)
  Fix a benchmark error that could occur when token_latencies is missing some data.
* Debug optimizer
* bird prompt dataset generation
* update benchmark to support prompt dataset loading
* Benchmark now supports workload parameter
* Bug fix
* Log control
* Improve stability and lint fix.
* Bug fix
* switch logs for gpu-optimizer to json format
* added BIRD dataset with Azure timestamp script
* add BIRD burst pattern workload generation
* Visualizer now supports workload file
* Print out workload input
* Bug fix
* lint fix
* remove timestamp offset
* Bug fix: calling _parse_profiles without the parameter out_records will not add up returns.
* Using the current ts to load the profile may be too early; revert to using an interval ago.
* Use the larger of average request rate in window and current request rate to get sufficient resources.
* Tuning up request rate temporarily.
* Bug fix
  Fix request rate to 8 temporarily
* Remove fixed rate
* changing load profile back
* Provide compatibility with v3 gateway profiles.
* Adjust development config
* Add config for gateway-plugin development
* delayed scale in deployment added
* Add trace to benchmark
* rollback to old version without delayed scale in
* Disregard pending requests for now.
* Bug fix
* Bug fix
* Adapt to latest profile about pending requests and update unittest.
* Output correct timestamp
* Output pending and total requests from load reader
* Ignore pending for now.
* Add throughput filter.
* bug and lint fix
* Fix a bug that occurs when mat_tputs are 0
* Lint fix
* fix benchmark on count num_requests
* Optimizer can now adopt deployment changes using "kubectl apply"
* Add comments
* bug fix
* Make signature prefer higher index when choosing profiles.
* Bug fix, watch ScalingReplicaSet for label changes
* Bug fix
* Change back SLO preference.
  Optimize update logic.
---------
Signed-off-by: Jingyuan <[email protected]>
Co-authored-by: Jingyuan Zhang <[email protected]>
Co-authored-by: Ning Wang <[email protected]>
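One commit note above says the optimizer uses the larger of the average request rate in the window and the current request rate, so that enough resources are provisioned. A minimal sketch of that idea follows. The class name `RequestRateEstimator`, its methods, and the fixed-size window of per-interval rates are all assumptions for illustration, not the optimizer's actual code.

```python
from collections import deque


class RequestRateEstimator:
    """Hypothetical sketch: keep the most recent per-interval request rates."""

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # deque with maxlen drops the oldest rate once the window is full.
        self._rates = deque(maxlen=window_size)

    def observe(self, rate: float) -> None:
        """Record the request rate measured for the latest interval."""
        self._rates.append(rate)

    def provisioning_rate(self) -> float:
        """Return the larger of the window average and the latest rate."""
        if not self._rates:
            return 0.0
        window_avg = sum(self._rates) / len(self._rates)
        current = self._rates[-1]
        return max(window_avg, current)
```

Taking the maximum makes the estimate react immediately to a burst (the current rate dominates) while decaying slowly after one (the window average dominates), which errs on the side of over-provisioning.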
Pull Request Description
2.1 For components created using "kubectl create -k config/default", the older component must be deleted using "kubectl delete -k config/dev/xxxx" and then reapplied.
Related Issues
Resolves: #546