Commit
[Bug] GPU optimizer bug fix and document fix (#656)
* Bug fix
* Fix configuration for domain podautoscaler
  Add test case to check that the URL created from metricSource is as expected: the endpoint should include the port; if it does not and a port is specified, the port will be appended to the endpoint.
* Lint fix
* Add license for new files.
* Lint fix on added unit test.
* Add authorization support
* Support parameterized benchmark
* Remove next_in parameter
* Bug fix
* Fix typo
* Bug fix
* Apply stream parameter
* Cleaning up responses.
* Bug fix
* If an error is not reported as a temporary error, we will not retry.
* GPU profile now supports TPAT (time per all token)
  Fix an error in benchmark that may occur when token_latencies is missing some data.
* Debug optimizer
* bird prompt dataset generation
* update benchmark to support prompt dataset loading
* Benchmark now supports workload parameter
* Bug fix
* Log control
* Improve stability and lint fix.
* Bug fix
* switch logs for gpu-optimizer to json format
* added BIRD dataset with Azure timestamp script
* add BIRD burst pattern workload generation
* Visualizer now supports workload file
* Print out workload input
* Bug fix
* lint fix
* remove timestamp offset
* Bug fix: calling _parse_profiles without the out_records parameter will not add up returns.
* Using the current ts to load the profile may be too early; revert to using an interval ago.
* Use the larger of average request rate in window and current request rate to get sufficient resources.
* Tuning up request rate temporarily.
* Bug fix
  Fix request rate to 8 temporarily
* Remove fixed rate
* changing load profile back
* Provide compatibility with v3 gateway profiles.
* Adjust development config
* Add config for gateway-plugin development
* delayed scale in deployment added
* Add trace to benchmark
* rollback to old version without delayed scale in
* Disregard pending requests for now.
* Bug fix
* Bug fix
* Adapt to latest profile about pending requests and update unittest.
* Output correct timestamp
* Output pending and total requests from load reader
* Ignore pending for now.
* Add throughput filter.
* bug and lint fix
* Fix a bug when mat_tputs are 0
* Lint fix
* fix benchmark on count num_requests
* Optimizer can now adopt deployment changes using "kubectl apply"
* Add comments
* bug fix
* Make signature prefer higher index when choosing profiles.
* Bug fix, watch ScalingReplicaSet for label changes
* Bug fix
* Change back SLO preference. Optimize update logic.
* Refine gpu optimizer document and apply more generic default parameters.
* Update document to use production vllm configuration example
  Fix benchmark and gen_profile to work inside python module.
* Add samples/heterogenous
* Clean up
* Modify load reader to support latest workload
  Fix a potential bug where, in corner cases, out-of-profile patterns are mapped to the closest profiled patterns, possibly causing data loss.
* Fix doc and example
* Use 100 instead of 1 as scale fraction.
* remove unnecessary samples
* Lint fix

---------

Signed-off-by: Jingyuan <[email protected]>
Co-authored-by: Jingyuan Zhang <[email protected]>
Co-authored-by: Ning Wang <[email protected]>
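The endpoint rule described in the commit message (an endpoint should carry a port, and a separately specified port is appended only when the endpoint lacks one) can be sketched as follows. This is a minimal illustration in Python, not the code changed by this commit: the function name `endpoint_with_port` and the host names in the usage lines are hypothetical, and IPv6 literals are not handled.

```python
from typing import Optional


def endpoint_with_port(endpoint: str, port: Optional[int] = None) -> str:
    """Hypothetical helper: append `port` only if `endpoint` has no port yet."""
    # Split off an optional scheme such as "http://".
    scheme, sep, rest = endpoint.rpartition("://")
    # Separate the host part from any trailing path.
    host, slash, path = rest.partition("/")
    # Treat "host:1234" as already carrying a port (IPv6 is out of scope here).
    has_port = ":" in host and host.rsplit(":", 1)[1].isdigit()
    if not has_port and port is not None:
        host = f"{host}:{port}"
    return f"{scheme}{sep}{host}{slash}{path}"


# Illustrative host names, not taken from the repository.
print(endpoint_with_port("gpu-optimizer.example", 8080))
print(endpoint_with_port("http://host:9090/metrics", 8080))
```

An endpoint that already names a port is returned unchanged, so an explicit port in the endpoint takes precedence over the separate port setting in this sketch.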
1 parent 766d8a8, commit bbb148c
Showing 10 changed files with 69 additions and 20 deletions.
@@ -0,0 +1,32 @@
kind: Kustomization

resources:
- deepseek-coder-7b-service.yaml
- deepseek-coder-7b-l20-deployment.yaml
- deepseek-coder-7b-l20-podautoscaler.yaml
- deepseek-coder-7b-v100-deployment.yaml
- deepseek-coder-7b-v100-podautoscaler.yaml

patches:
- patch: |- # Use the '|' and '-' for inline patching, warm up 10 hosts and start with 7
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deepseek-coder-7b-v100
      labels:
        model.aibrix.ai/min_replicas: "1"
  target:
    kind: Deployment
    name: deepseek-coder-7b-v100
- patch: |- # Use the '|' and '-' for inline patching, warm up 10 hosts and start with 7
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deepseek-coder-7b-l20
      labels:
        model.aibrix.ai/min_replicas: "0"
  target:
    kind: Deployment
    name: deepseek-coder-7b-l20

apiVersion: kustomize.config.k8s.io/v1beta1
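The net effect of the two inline patches above is to merge a `model.aibrix.ai/min_replicas` label into each targeted Deployment. The sketch below mimics that effect on plain Python dictionaries. It is an illustration only and not how kustomize is implemented: the helper `apply_label_patch` and the pre-existing `app: demo` label are assumptions introduced for the example, and a real strategic merge patch handles far more than labels.

```python
import copy


def apply_label_patch(objects, target_name, labels):
    """Return a copy of `objects` with `labels` merged into the named Deployment."""
    patched = copy.deepcopy(objects)
    for obj in patched:
        # Mirrors the `target:` selector in the patch: kind plus name.
        if obj.get("kind") == "Deployment" and obj["metadata"]["name"] == target_name:
            obj["metadata"].setdefault("labels", {}).update(labels)
    return patched


# Simplified stand-ins for the two Deployments listed under `resources`.
deployments = [
    {"kind": "Deployment",
     "metadata": {"name": "deepseek-coder-7b-v100", "labels": {"app": "demo"}}},
    {"kind": "Deployment",
     "metadata": {"name": "deepseek-coder-7b-l20"}},
]

patched = apply_label_patch(
    deployments, "deepseek-coder-7b-v100", {"model.aibrix.ai/min_replicas": "1"})
patched = apply_label_patch(
    patched, "deepseek-coder-7b-l20", {"model.aibrix.ai/min_replicas": "0"})

for obj in patched:
    print(obj["metadata"]["name"], obj["metadata"]["labels"])
```

Existing labels are kept and only the patched key is added, which matches the merge behaviour one would expect for `metadata.labels`; the input list itself is left unmodified.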