Add model API #299
I will finish the code review by tomorrow. I'm a little busy with some miscellaneous tasks.
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.

// ModelSpec defines the desired state of Model
type ModelSpec struct {
I feel this is a very thin layer. If we use PodTemplate, do we still have control over the sidecar and engine?
- Should we inject an initContainer or sidecar container inside the user-defined template?
- What if the user uses SGLang in the pod template but specifies vLLM as the engine type?
- If we have requirements, then we should definitely inject our own init or sidecar containers.
- Good point, we need to add checks to prevent such a discrepancy.
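To make the two points above concrete, here is a rough sketch of what such a check and injection step could look like. It is not the code in this PR: `Container`, `PodTemplate`, the `Engine` field, `knownEngines`, `validateEngine`, and `injectSidecar` are all hypothetical names, the types are simplified stand-ins for the Kubernetes core/v1 types, and matching on the image string is only a heuristic chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Container and PodTemplate are simplified, hypothetical stand-ins for the
// Kubernetes core/v1 types so that this sketch runs without dependencies.
type Container struct {
	Name  string
	Image string
}

type PodTemplate struct {
	InitContainers []Container
	Containers     []Container
}

// ModelSpec here is an assumption for illustration; the Engine field and its
// values are not taken from the PR.
type ModelSpec struct {
	Engine   string // e.g. "vllm" or "sglang"
	Template PodTemplate
}

// knownEngines lists the engine names this heuristic recognizes (hypothetical).
var knownEngines = []string{"vllm", "sglang"}

// validateEngine returns an error if any container image mentions a known
// engine other than the one declared in spec.Engine.
func validateEngine(spec ModelSpec) error {
	declared := strings.ToLower(spec.Engine)
	for _, c := range spec.Template.Containers {
		image := strings.ToLower(c.Image)
		for _, e := range knownEngines {
			if e != declared && strings.Contains(image, e) {
				return fmt.Errorf("container %q image %q looks like %s, but engine is %q",
					c.Name, c.Image, e, spec.Engine)
			}
		}
	}
	return nil
}

// injectSidecar returns a copy of tpl with the sidecar appended, unless a
// container with the same name is already present. The input is not modified.
func injectSidecar(tpl PodTemplate, sidecar Container) PodTemplate {
	for _, c := range tpl.Containers {
		if c.Name == sidecar.Name {
			return tpl
		}
	}
	out := tpl
	out.Containers = append(append([]Container{}, tpl.Containers...), sidecar)
	return out
}

func main() {
	spec := ModelSpec{
		Engine: "vllm",
		Template: PodTemplate{
			Containers: []Container{{Name: "engine", Image: "example.com/sglang:latest"}},
		},
	}
	if err := validateEngine(spec); err != nil {
		fmt.Println("rejected:", err)
	}
	tpl := injectSidecar(spec.Template, Container{Name: "runtime", Image: "example.com/runtime:latest"})
	fmt.Println("containers after injection:", len(tpl.Containers))
}
```

In a real controller this kind of check would more likely live in an admission webhook or the reconcile loop, and the injection would operate on the actual pod template type rather than these stand-ins.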
Overall, the code change looks good to me. However, we need some discussion on the model API abstraction. We should come up with enough future features and take those insights into consideration when we build this API.
Sure. I think pretty much all of these features are associated with the base model, such as reroute (to another deployment if the current base model is unavailable), retry, or default behavior for rpm/tpm. The routing algorithm can be defined for the base model, and the user can override it as required. Right now, with the pod template, it is a pretty thin layer with basic checks, and it can be extended as requirements evolve.
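As a sketch of the "defined for the base model, overridden by the user" idea, the snippet below merges per-model defaults with optional user overrides. Everything in it is hypothetical: `Policy`, its fields, `mergePolicy`, and the sample values are illustrative names only and do not come from this PR.

```go
package main

import "fmt"

// Policy is a hypothetical set of per-model settings. Pointer fields make it
// possible to tell "not set" apart from a zero value.
type Policy struct {
	RPM              *int    // requests per minute
	TPM              *int    // tokens per minute
	RoutingAlgorithm *string // name of the routing algorithm
}

func intPtr(v int) *int       { return &v }
func strPtr(v string) *string { return &v }

// mergePolicy starts from the base model defaults and applies every field
// that the user override sets explicitly.
func mergePolicy(base, override Policy) Policy {
	out := base
	if override.RPM != nil {
		out.RPM = override.RPM
	}
	if override.TPM != nil {
		out.TPM = override.TPM
	}
	if override.RoutingAlgorithm != nil {
		out.RoutingAlgorithm = override.RoutingAlgorithm
	}
	return out
}

func main() {
	// Defaults attached to the base model (sample values).
	base := Policy{RPM: intPtr(100), TPM: intPtr(1000), RoutingAlgorithm: strPtr("random")}
	// The user overrides only the request rate.
	user := Policy{RPM: intPtr(10)}

	got := mergePolicy(base, user)
	fmt.Println(*got.RPM, *got.TPM, *got.RoutingAlgorithm)
}
```

Using pointer fields is one way to keep the override optional per field; a CRD could equally express this with `omitempty` optional fields and defaulting in a webhook.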
Address #302