Release v0.1.0 · vllm-project/aibrix

Feature Highlights

1. Dynamic LoRA Adapter

The Dynamic LoRA Adapter allows LoRA models to be managed dynamically within Kubernetes. It handles model registration, unloading, and routing, giving operators finer control and better scalability in production environments.

2. Gateway Extension Server with Multi-Algorithm Routing Support

We extend Envoy Gateway with an extension server, and an external processing service inspects and mutates requests and responses. This lets us add features that a Kubernetes Service does not support directly, such as rate limiting and multiple routing algorithms (least request, least throughput, and random). Users can choose the routing strategy that fits their application, which improves traffic distribution and system performance.
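As a rough illustration of how the three routing strategies named above differ, here is a minimal Python sketch. It is not the gateway plugin's code: the `PodStats` type, the `pick_pod` function, the metric fields, and the strategy strings are all hypothetical names chosen for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class PodStats:
    """Hypothetical per-pod metrics snapshot (field names are assumptions)."""
    name: str
    running_requests: int
    tokens_per_second: float


def pick_pod(pods, strategy, rng=random):
    """Select a target pod according to an illustrative routing strategy."""
    if not pods:
        raise ValueError("no pods available")
    if strategy == "random":
        # Uniform choice, ignoring load entirely.
        return rng.choice(pods)
    if strategy == "least-request":
        # Prefer the pod with the fewest in-flight requests.
        return min(pods, key=lambda p: p.running_requests)
    if strategy == "least-throughput":
        # Prefer the pod currently processing the fewest tokens per second.
        return min(pods, key=lambda p: p.tokens_per_second)
    raise ValueError(f"unknown routing strategy: {strategy}")
```

With pods reporting 3, 1, and 2 in-flight requests, the least-request strategy picks the second pod, while the random strategy ignores load entirely.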

3. LLM-specific Autoscaler

This release integrates multiple autoscaling algorithms, including HPA (Horizontal Pod Autoscaler), KPA (Knative Pod Autoscaler), and APA (AIBrix Pod Autoscaler). The autoscaling framework now features a direct connection to fetch metrics from pods, enabling real-time adjustments based on load and optimized resource utilization.
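To make the idea of metric-driven scaling concrete, the sketch below applies the general HPA-style rule documented by Kubernetes: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance. It does not reproduce the KPA or APA algorithms in this release; the function name, parameters, and default bounds are assumptions made for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1, min_replicas=1, max_replicas=10):
    """HPA-style target tracking (illustrative; defaults are assumptions)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band, keep the current replica count to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas observing a metric of 90 against a target of 60 scale out to 6, while a reading of 62 stays at 4 because it falls inside the tolerance band.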

4. Unified AI Runtime

The AI runtime supports faster model downloading via GPU streaming, streamlined metrics aggregation, and LoRA request delegation that abstracts away the complexities of the underlying engine. It provides an optimized environment for deploying and managing machine learning models, making it easier to handle high-volume requests.

Additional Enhancements:

Doc website: Updated documentation, including quick-start guides, installation instructions, and autoscaling tutorials, makes setup and onboarding smoother.
Benchmarking and Performance Analysis Tools: Integrated tools for benchmarking autoscalers, gateways, and LoRA help monitor and improve system efficiency and performance.
CI/CD Workflow: The new CI/CD pipeline includes automated image builds, GitHub Actions for testing and linting, and release pipelines for simplified deployment.

What's Changed

Add common project documents and skeleton folders by @Jeffwan in #4
Scaffolding aibrix project using kubebuilder by @Jeffwan in #17
Optimize project layouts by moving controllers to pkg folder by @Jeffwan in #21
Create Lora api and controller by @Jeffwan in #23
Rename LoraAdapter to ModelAdapter by @Jeffwan in #25
Add ModelAdapter API by @Jeffwan in #26
Use better way to set up controller with Manager by @Jeffwan in #27
Initial model adapter controller implementation by @Jeffwan in #32
Add mocked model container for lora adapter fast prototyping by @Jeffwan in #33
[Misc] Add the PR and issues template by @jsw-zorro in #38
[Docs] Add example to run vLLM distributed inference using Ray by @Jeffwan in #39
[Doc] Improve the model adapter mock service by @Jeffwan in #45
[Misc] Simplify the feature/bug/enhancement template. by @jsw-zorro in #48
[Misc] Make model adapter controller e2e work by @Jeffwan in #50
[Docs] A draft version of the contributing guideline document by @kr11 in #47
[Core] Improve model adapter controller by handling existing resources by @Jeffwan in #54
[Feat] Initial Implementation of PodAutoscaler Reconciler by @kr11 in #55
[Docs] Move the sample mocked application to common folder by @Jeffwan in #64
[Misc] Minor refactor the PodAutoscaler codes by @Jeffwan in #68
[Core] Add model router controller by @varungup90 in #57
Add rbac rules in model router by @varungup90 in #71
[bugs] Add autoscaler RBAC to successfully list horizontalpodautoscalers by @kr11 in #72
[Misc] Update license info; Add license check by @happyandslow in #73
add github workflow to lint & test code by @M00nF1sh in #74
[CI] Fix the golang lint issues by @Jeffwan in #77
[CI] fix the failures from make test by @Jeffwan in #80
[Misc] Add code-generator and openapi-gen as dependencies by @Jeffwan in #59
[Misc] Reconcile hpa, kpa and apa separately by @Jeffwan in #83
[feat] Add rpm/tpm extension proc plugin by @varungup90 in #79
Add kpa scale algorithm implementation by @kr11 in #87
Add host override to query specific pod by @varungup90 in #86
[Core] init aibrix runtime framework by @brosoul in #88
Support kpa/apa autoscaling workflow part I by @Jeffwan in #85
Fix Dockerfile Packaging Issues Related to Go Version and Missing Utils by @kr11 in #92
Autoscaling Workflow Enhancement - Part 2 by @kr11 in #94
Add custom CRD clientset by @varungup90 in #97
Autoscaling Workflow Enhancement - Part 3 by @kr11 in #101
[Core] Add Downloader implementation for runtime by @brosoul in #96
Add RayClusterReplicaSet and RayClusterFleet apis by @Jeffwan in #103
Apply crd:maxDescLen=0 in manifest generation by @Jeffwan in #108
Apply filter to objects owned by model adapters by @varungup90 in #111
Add custom cache and interface for model adapter scheduling by @varungup90 in #100
Refactor gateway package by @varungup90 in #112
BatchAPI storage component together with test by @xinchen384 in #104
Update the installation guidance and README.md by @Jeffwan in #115
[CI] Package AI Runtime by @brosoul in #118
Add gateway installation by @varungup90 in #122
[CI] Support container image build and push in CI by @Jeffwan in #120
[CI] Fix nightly image push error by @Jeffwan in #127
[Bug] Fix download bugs during download benchmark by @brosoul in #134
Autoscaling Workflow Enhancement - Part 4: Integrating MetricClient into Autoscaling Workflow by @kr11 in #116
Update make generate by @varungup90 in #132
Model adapter controller improvement and refactor by @Jeffwan in #135
Improve the aibrix installation scripts by @Jeffwan in #141
[CI] Support python package publish by @brosoul in #138
Fix some typo and naming issues by @Jeffwan in #150
Fix gateway bootstrap issues by @varungup90 in #154
Add kubeconfig flag for cache initialization by @varungup90 in #155
Using sphinx to generate html pages for our project static site by @xinchen384 in #153
Add finalizer and handle the model unload requests by @Jeffwan in #152
Fix kubeConfig redefined issue and update imagePullPolicy by @Jeffwan in #158
Add expectation lib to allow us to set and wait on expectations by @Jeffwan in #164
Add routing algorithms by @varungup90 in #143
Add readthedocs configuration for CI builds and update theme by @Jeffwan in #169
Add RayClusterReplicaSet initial implementation by @Jeffwan in #165
Add template page for the docs by @Jeffwan in #170
Remove myst_parser from sphinx extensions by @Jeffwan in #172
Update quickstart in the doc by @Jeffwan in #174
Metric standardizing in ai runtime by @brosoul in #163
[Misc] Rename env in runtime by @brosoul in #176
Add readiness check for redis in gateway plugin by @varungup90 in #173
[batch] job manager handles job state transition by @xinchen384 in #180
Add users CRUD API by @varungup90 in #181
Add routing for model adapter by @varungup90 in #183
Add installation tests and refactor some CI jobs by @Jeffwan in #188
Add release pipeline for images and manifests by @Jeffwan in #189
[Docs] Update Readme on project intro by @xieus in #191
[CI] Add AI Runtime test case by @brosoul in #197
Add AI Runtime exist model check by @brosoul in #198
Implement rayclusterfleet controller by @Jeffwan in #194
klog Level Standardization by @kr11 in #202
Fix RayClusterReplicaSet e2e running issues by @Jeffwan in #200
Add lora adapter management API by @brosoul in #201
Add kuberay manifest as installation dependencies by @Jeffwan in #203
[doc] fix autoscaling readme by @kr11 in #215
[doc] update runtime feature doc by @brosoul in #216
Fix the annotation missing issue for ray workload by @Jeffwan in #218
[CI]: Add python test on different python version by @brosoul in #219
Add Autoscaling Tutorials in format of rst by @kr11 in #225
[Misc] Check AI Runtime download env settings by @brosoul in #221
Cut v0.1.0-rc.2 release by @Jeffwan in #226
Add model adapter and multi-node inference docs by @Jeffwan in #222
add gateway docs by @varungup90 in #232
[Misc] add Runtime dependency for hf_transfer by @brosoul in #240
Add validation for username and rpm/tpm negative value by @varungup90 in #241
[CI] Merge python wheel publish process to release build pipeline by @brosoul in #247
[CI] Push images to Github container registry by @Jeffwan in #246
[CI] Fix post-submit container push failure by @Jeffwan in #249
[Misc] Infer model name from model_uri and check AWS credential by @brosoul in #250
[Misc] Add runtime api metrics by @brosoul in #251
[doc] Update release/contribution/quickstart docs by @Jeffwan in #242
[batch] job FIFO scheduler as baseline by @xinchen384 in #231
[Misc] Improve the installation component sequence by @Jeffwan in #252
Fix concurrency issue with gateway RPM plugin by @varungup90 in #244
Improve model adapter reliability and stability by @Jeffwan in #257
Remove underscore from dir names and remove account word in rate limiter by @varungup90 in #271
[Misc] Use klog as the logr implementation by @Jeffwan in #264
[CI] Unify Dockerfile names and simplify the build scripts by @Jeffwan in #263
Improve model adapter reconcile workflow stability by @Jeffwan in #260
Add container override for images by @varungup90 in #273
Add AIBrix Custom Autoscaling Algorithm APA by @kr11 in #223
Use vllm metrics for routing by @varungup90 in #274
Update random routing section and add support for anonymous user by @varungup90 in #276
Add image build details and examples for multi-host inference by @Jeffwan in #278
Cut v0.1.0-rc.3 release by @Jeffwan in #280
Update manifests version to v0.1.0-rc.3 by @Jeffwan in #287
[Misc] Add sync images step and scripts in release process by @Jeffwan in #283
[batch] E2E works with driver and request proxy by @xinchen384 in #272
Fix address already in use when AIRuntime start in pod by @brosoul in #289
Read model name from request body by @varungup90 in #290
Fix redis bootstrap flaky connection issue by @varungup90 in #293
skip docs CI if no changes in /docs dir by @varungup90 in #294
Improve Rayclusterreplicaset Status by @Yicheng-Lu-llll in #295
Add request trace for profiling by @varungup90 in #291
Update the crd definition due to runtime upgrade by @Jeffwan in #298
Push images to Github registry in release pipeline by @Jeffwan in #301
Build autoscaler abstractions like fetcher, client and scaler by @Jeffwan in #300
Support pod autoscaler periodically check by @Jeffwan in #306
Add timeout in nc check for redis bootstrap by @varungup90 in #309
Refactor AutoScaler: metricClient, context, reconcile by @kr11 in #308
Cut v0.1.0-rc.4 release by @Jeffwan in #314
[doc] update runtime readme by @brosoul in #318
Add env for routing strategy override by @varungup90 in #323
Fix pod autoscaler enqueue issues by @Jeffwan in #329
Autoscaling benchmark by @kr11 in #337
Initial lora benchmark result by @Jeffwan in #321
Adding plotting script by @happyandslow in #338
Update the downloader performance plot by @Jeffwan in #341
Reduce pod metrics refresh interval by @varungup90 in #343
Enable ipv6 for envoy proxy by @varungup90 in #342
Add benchmark scripts for gateway client side changes by @Jeffwan in #340
Update the plots based on feedback by @Jeffwan in #346
[batch] use volcano TOS as batch storage by @xinchen384 in #344
Add check if no pods are present by @varungup90 in #345
Add model exists check by @varungup90 in #353
[Misc] Disable fastapi docs in runtime default action by @brosoul in #350
Add check for acceptable routing strategies by @varungup90 in #352
optimize PA messages: const 'HPA' -> actual pa type by @kr11 in #354
[Misc] Runtime server startup with args by @brosoul in #355
[Misc] Add python format script by @brosoul in #357
optimize benchmark scripts for autoscaler, add more logs by @kr11 in #356
Update the mocked app to cleaner state by @Jeffwan in #361
Update manifests & docs about service httproute naming trick by @Jeffwan in #362
Add reference grant to support httprouting for different namespace by @varungup90 in #347
Validate routing strategy bug fix by @varungup90 in #364
Bug fix for setting routing strategy via env var by @varungup90 in #369
Improve the routing env value & flag retrieval by @Jeffwan in #373
Sync main branch changes to release-0.1 branch by @Jeffwan in #375
Cut v0.1.0-rc.5 release by @Jeffwan in #376
Cut v0.1.0-rc.5 release by @Jeffwan in #378
[runtime] Add download args for control download progress bar by @brosoul in #382
[runtime] Update tos sdk version to 2.8.0 by @brosoul in #381
replaced old names AIBricks with AIBrix by @nwangfw in #372
[Misc] Update logos, docs and some configuration for v0.1.0 by @Jeffwan in #383
Sync changes from main to release-0.1 by @Jeffwan in #384
Cut v0.1.0 release by @Jeffwan in #385

New Contributors

@Jeffwan made their first contribution in #4
@jsw-zorro made their first contribution in #38
@kr11 made their first contribution in #47
@varungup90 made their first contribution in #57
@happyandslow made their first contribution in #73
@M00nF1sh made their first contribution in #74
@brosoul made their first contribution in #88
@xinchen384 made their first contribution in #104
@xieus made their first contribution in #191
@Yicheng-Lu-llll made their first contribution in #295
@nwangfw made their first contribution in #372

Full Changelog: https://github.com/aibrix/aibrix/commits/v0.1.0
