[WIP] Add GPU Optimizer deployment and update configurations #480

Closed
wants to merge 5 commits into from

Conversation

nwangfw
Collaborator

@nwangfw nwangfw commented Dec 4, 2024

Pull Request Description

In https://github.com/aibrix/aibrix/pull/430/files, most of the components were dockerized but had not yet been moved to the Kubernetes environment. This PR moves them under the config/default scope and makes sure they can be installed along with the other aibrix components.

Related Issues

Resolves: #459

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@nwangfw nwangfw requested review from zhangjyr and Jeffwan December 4, 2024 16:35
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-autoscaler
Collaborator

The name should be changed to something related to gpu-optimizer.
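
One way to address this, as a minimal sketch: give the ServiceAccount a name tied to the component. The name `gpu-optimizer-sa` is a hypothetical choice for illustration, not a name taken from this PR, and the `aibrix-system` namespace is assumed from the Role snippet in this review.

```yaml
# Sketch only: the name below is hypothetical, chosen to reflect the gpu-optimizer component.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-optimizer-sa      # hypothetical replacement for "pod-autoscaler"
  namespace: aibrix-system    # assumed; matches the namespace used by the Role in this PR
```

Any Deployment or RoleBinding that refers to the old name would need to be updated to match.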

kind: Role
metadata:
  namespace: aibrix-system
  name: deployment-reader
Collaborator

Similarly, names should be updated
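
A rough sketch of what a renamed Role might look like. The name `gpu-optimizer-deployment-reader` is hypothetical, and the `rules` section is an assumption inferred only from the original name `deployment-reader`; the actual rules in the PR are not shown in this review.

```yaml
# Sketch only: name and rules are assumptions, not copied from the PR.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: aibrix-system
  name: gpu-optimizer-deployment-reader   # hypothetical replacement for "deployment-reader"
rules:
  - apiGroups: ["apps"]                   # assumed: read-only access to Deployments
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
```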

maxReplicas: 10
targetMetric: "avg_prompt_throughput_toks_per_s" # Ignore if metricsSources is configured
metricsSources:
- endpoint: gpu-optimizer.aibrix-system.svc.cluster.local:8080
Collaborator

These files are being refactored in #477. We can probably change this later.

@@ -0,0 +1,75 @@
apiVersion: v1
Collaborator

can we break these files into more granular files?

  • deployment.yaml
  • rbac.yaml
  • service.yaml
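
If the split is done and the component is installed through Kustomize (an assumption; the PR only mentions the config/default scope), the three files could be tied together with a small `kustomization.yaml`. The directory path in the comment is hypothetical.

```yaml
# config/gpu-optimizer/kustomization.yaml  (hypothetical path)
# Sketch only: lists the three granular files suggested in the review comment.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml   # the gpu-optimizer Deployment
  - rbac.yaml         # ServiceAccount, Role, and RoleBinding
  - service.yaml      # the Service exposing port 8080
```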

- protocol: TCP
  port: 8080
  targetPort: 8080
  nodePort: 30008
Collaborator

Does it need a nodePort?
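
For context, a minimal sketch of the Service without a nodePort. It assumes the only consumer is the in-cluster endpoint `gpu-optimizer.aibrix-system.svc.cluster.local:8080` referenced elsewhere in this PR, in which case a plain ClusterIP Service would be enough. The `app: gpu-optimizer` selector label is hypothetical.

```yaml
# Sketch only: ClusterIP variant, reachable inside the cluster but not on node ports.
apiVersion: v1
kind: Service
metadata:
  name: gpu-optimizer          # inferred from the in-cluster DNS name used in this PR
  namespace: aibrix-system
spec:
  type: ClusterIP              # no nodePort is allocated for this type
  selector:
    app: gpu-optimizer         # hypothetical label; must match the Deployment's pod labels
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
```

A NodePort would only be needed if something outside the cluster has to reach the optimizer directly.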

containers:
- name: gpu-optimizer
  image: aibrix/runtime:nightly
  command: ["python", "-m", "aibrix.gpu_optimizer.app"]
Collaborator

if the server is down, what's the autoscaler behavior? Have you tested such behaviors?

@nwangfw nwangfw force-pushed the gpu-optimizer-orchestration branch from c881f59 to 90cd690 Compare December 4, 2024 21:14
@nwangfw nwangfw linked an issue Dec 5, 2024 that may be closed by this pull request
zhangjyr and others added 3 commits December 5, 2024 11:18
* Move huggingface_token to config.json
Add missing zscaler root CA to image for huggingface lib to download tokenizer model successfully.

* Remove huggingface token

---------

Co-authored-by: Jingyuan Zhang <[email protected]>
* adding timestamp and prompt in/output length to traces

* name fix; plotting script fix

* update README

* addressing comments

* addressing comments

* add sample workload

* add sample workload

* update file format

* update jsonl option

---------

Co-authored-by: Le Xu <[email protected]>
@zhangjyr
Collaborator

zhangjyr commented Dec 5, 2024

@nwangfw I fixed the k8s access problem on branch issues/484_Controller_failed_to_fetch_metrics_from_MetricSource. I think we should merge the changes together.

…_failed_to_fetch_metrics_from_MetricSource

# Conflicts:
#	development/simulator/deployment-a100.yaml
#	development/simulator/deployment-a40.yaml
@zhangjyr
Collaborator

zhangjyr commented Dec 6, 2024

Sorry, it looks like this PR includes merged changes from the main. Maybe I should start another PR for a clear view.

@Jeffwan
Collaborator

Jeffwan commented Dec 6, 2024

As #494 has been merged, let's close this PR.

4 participants