v0.1.0

Released by @github-actions on 12 Nov 22:33 · commit d885131 · 163 commits to main since this release

Feature Highlights

1. Dynamic LoRA Adapter

The Dynamic LoRA Adapter makes model adaptation more flexible by managing LoRA models dynamically within Kubernetes. It handles model registration, unloading, and routing, which improves operational control and scalability in production environments.

2. Gateway Extension Server with Multi-Algorithm Routing Support

We extend Envoy Gateway through an extension server, and an external processing service inspects and mutates requests and responses. This lets us add features that a Kubernetes Service does not support directly, such as rate limiting and multiple routing algorithms (least request, least throughput, and random). Users can tune the routing strategy to their application's needs, improving traffic distribution and system performance.
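To make the three routing algorithms concrete, here is a minimal Python sketch of how a router might pick a pod under each one. It is an illustration only, not the gateway plugin's actual code: the `PodStats` fields, the `pick_pod` function, and the strategy names are assumptions chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class PodStats:
    """Hypothetical per-pod metrics snapshot; field names are illustrative."""
    name: str
    running_requests: int      # requests currently in flight on the pod
    tokens_per_second: float   # recent throughput reported by the pod


def pick_pod(pods, strategy, rng=random):
    """Return the name of the pod selected by the given routing strategy."""
    if not pods:
        raise ValueError("no pods available for routing")
    if strategy == "random":
        return rng.choice(pods).name
    if strategy == "least-request":
        # Prefer the pod with the fewest in-flight requests.
        return min(pods, key=lambda p: p.running_requests).name
    if strategy == "least-throughput":
        # Prefer the pod currently producing the fewest tokens per second.
        return min(pods, key=lambda p: p.tokens_per_second).name
    raise ValueError(f"unsupported routing strategy: {strategy}")


pods = [
    PodStats("pod-a", running_requests=4, tokens_per_second=120.0),
    PodStats("pod-b", running_requests=1, tokens_per_second=310.0),
    PodStats("pod-c", running_requests=7, tokens_per_second=95.0),
]
print(pick_pod(pods, "least-request"))     # pod-b
print(pick_pod(pods, "least-throughput"))  # pod-c
```

The sketch rejects an empty pod list and unknown strategy names up front, so a misconfigured strategy fails loudly instead of silently falling back to a default.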

3. LLM-specific Autoscaler

This release integrates multiple autoscaling algorithms, including HPA (Horizontal Pod Autoscaler), KPA (Knative Pod Autoscaler), and APA (AIBrix Pod Autoscaler). The autoscaling framework now fetches metrics directly from pods, enabling real-time scaling adjustments based on load and better resource utilization.
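As a rough illustration of metric-driven scaling, the sketch below applies the ratio rule documented for the Kubernetes HPA: desired replicas = ceil(current replicas × current metric / target metric). It is not the AIBrix implementation, and KPA and APA use their own rules; the function name, parameters, and replica bounds are assumptions for the example, and the HPA's tolerance band is omitted.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """HPA-style ratio rule, clamped to hypothetical replica bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp so the result never leaves the configured range.
    return max(min_replicas, min(max_replicas, raw))


# Four pods at 90 units each against a target of 60 scale out to six.
print(desired_replicas(4, 90.0, 60.0))   # 6
# The same pods at 30 units each scale in to two.
print(desired_replicas(4, 30.0, 60.0))   # 2
```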

4. Unified AI Runtime

The AI runtime supports faster model downloading through GPU streaming, streamlined metrics aggregation, and LoRA request delegation that abstracts away the complexity of the underlying engine. It provides an optimized environment for deploying and managing machine learning models, making high-volume request handling easier.

Additional Enhancements:

  • Documentation website: Updated documentation, including quick-start guides, installation instructions, and autoscaling tutorials, makes setup and onboarding smoother.
  • Benchmarking and Performance Analysis Tools: Integrated tools for benchmarking autoscalers, gateways, and LoRA help monitor and improve system efficiency and performance.
  • CI/CD Workflow: The new CI/CD pipeline includes automated image builds, GitHub Actions for testing and linting, and release pipelines for simplified deployment.

What's Changed

  • Add common project documents and skeleton folders by @Jeffwan in #4
  • Scaffolding aibrix project using kubebuilder by @Jeffwan in #17
  • Optimize project layouts by moving controllers to pkg folder by @Jeffwan in #21
  • Create Lora api and controller by @Jeffwan in #23
  • Rename LoraAdapter to ModelAdapter by @Jeffwan in #25
  • Add ModelAdapter API by @Jeffwan in #26
  • Use better way to set up controller with Manager by @Jeffwan in #27
  • Initial model adapter controller implementation by @Jeffwan in #32
  • Add mocked model container for lora adapter fast prototyping by @Jeffwan in #33
  • [Misc] Add the PR and issues template by @jsw-zorro in #38
  • [Docs] Add example to run vLLM distributed inference using Ray by @Jeffwan in #39
  • [Doc] Improve the model adapter mock service by @Jeffwan in #45
  • [Misc] Simplify the feature/bug/enhancement template. by @jsw-zorro in #48
  • [Misc] Make model adapter controller e2e work by @Jeffwan in #50
  • [Docs] A draft version of the contributing guideline document by @kr11 in #47
  • [Core] Improve model adapter controller by handling existing resources by @Jeffwan in #54
  • [Feat] Initial Implementation of PodAutoscaler Reconciler by @kr11 in #55
  • [Docs] Move the sample mocked application to common folder by @Jeffwan in #64
  • [Misc] Minor refactor the PodAutoscaler codes by @Jeffwan in #68
  • [Core] Add model router controller by @varungup90 in #57
  • Add rbac rules in model router by @varungup90 in #71
  • [bugs] Add autoscaler RBAC to successfully list horizontalpodautoscalers by @kr11 in #72
  • [Misc] Update license info; Add license check by @happyandslow in #73
  • add github workflow to lint & test code by @M00nF1sh in #74
  • [CI] Fix the golang lint issues by @Jeffwan in #77
  • [CI] fix the failures from make test by @Jeffwan in #80
  • [Misc] Add code-generator and openapi-gen as dependencies by @Jeffwan in #59
  • [Misc] Reconcile hpa, kpa and apa separately by @Jeffwan in #83
  • [feat] Add rpm/tpm extension proc plugin by @varungup90 in #79
  • Add kpa scale algorithm implementation by @kr11 in #87
  • Add host override to query specific pod by @varungup90 in #86
  • [Core] init aibrix runtime framework by @brosoul in #88
  • Support kpa/apa autoscaling workflow part I by @Jeffwan in #85
  • Fix Dockerfile Packaging Issues Related to Go Version and Missing Utils by @kr11 in #92
  • Autoscaling Workflow Enhancement - Part 2 by @kr11 in #94
  • Add custom CRD clientset by @varungup90 in #97
  • Autoscaling Workflow Enhancement - Part 3 by @kr11 in #101
  • [Core] Add Downloader implementation for runtime by @brosoul in #96
  • Add RayClusterReplicaSet and RayClusterFleet apis by @Jeffwan in #103
  • Apply crd:maxDescLen=0 in manifest generation by @Jeffwan in #108
  • Apply filter to objects owned by model adapters by @varungup90 in #111
  • Add custom cache and interface for model adapter scheduling by @varungup90 in #100
  • Refactor gateway package by @varungup90 in #112
  • BatchAPI storage component together with test by @xinchen384 in #104
  • Update the installation guidance and README.md by @Jeffwan in #115
  • [CI] Package AI Runtime by @brosoul in #118
  • Add gateway installation by @varungup90 in #122
  • [CI] Support container image build and push in CI by @Jeffwan in #120
  • [CI] Fix nightly image push error by @Jeffwan in #127
  • [Bug] Fix download bugs during download benchmark by @brosoul in #134
  • Autoscaling Workflow Enhancement - Part 4: Integrating MetricClient into Autoscaling Workflow by @kr11 in #116
  • Update make generate by @varungup90 in #132
  • Model adapter controller improvement and refactor by @Jeffwan in #135
  • Improve the aibrix installation scripts by @Jeffwan in #141
  • [CI] Support python package publish by @brosoul in #138
  • Fix some typo and naming issues by @Jeffwan in #150
  • Fix gateway bootstrap issues by @varungup90 in #154
  • Add kubeconfig flag for cache initialization by @varungup90 in #155
  • Using sphinx to generate html pages for our project static site by @xinchen384 in #153
  • Add finalizer and handle the model unload requests by @Jeffwan in #152
  • Fix kubeConfig redefined issue and update imagePullPolicy by @Jeffwan in #158
  • Add expectation lib to allows us to set and wait on expectations by @Jeffwan in #164
  • Add routing algorithms by @varungup90 in #143
  • Add readthedocs configuration for CI builds and update theme by @Jeffwan in #169
  • Add RayClusterReplicaSet initial implementation by @Jeffwan in #165
  • Add template page for the docs by @Jeffwan in #170
  • Remove myst_parser from sphinx extensions by @Jeffwan in #172
  • Update quickstart in the doc by @Jeffwan in #174
  • Metric standardizing in ai runtime by @brosoul in #163
  • [Misc] Rename env in runtime by @brosoul in #176
  • Add readiness check for redis in gateway plugin by @varungup90 in #173
  • [batch] job manager handles job state transition by @xinchen384 in #180
  • Add users CRUD API by @varungup90 in #181
  • Add routing for model adapter by @varungup90 in #183
  • Add installation tests and refactor some CI jobs by @Jeffwan in #188
  • Add release pipeline for images and manifests by @Jeffwan in #189
  • [Docs] Update Readme on project intro by @xieus in #191
  • [CI] Add AI Runtime test case by @brosoul in #197
  • Add AI Runtime exist model check by @brosoul in #198
  • Implement rayclusterfleet controller by @Jeffwan in #194
  • klog Level Standardization by @kr11 in #202
  • Fix RayClusterReplicaSet e2e running issues by @Jeffwan in #200
  • Add lora adapter management API by @brosoul in #201
  • Add kuberay manifest as installation dependencies by @Jeffwan in #203
  • [doc] fix autoscaling readme by @kr11 in #215
  • [doc] update runtime feature doc by @brosoul in #216
  • Fix the annotation missing issue for ray workload by @Jeffwan in #218
  • [CI]: Add python test on different python version by @brosoul in #219
  • Add Autoscaling Tutorials in format of rst by @kr11 in #225
  • [Misc] Check AI Runtime download env settings by @brosoul in #221
  • Cut v0.1.0-rc.2 release by @Jeffwan in #226
  • Add model adapter and multi-node inference docs by @Jeffwan in #222
  • add gateway docs by @varungup90 in #232
  • [Misc] add Runtime dependency for hf_transfer by @brosoul in #240
  • Add validation for username and rpm/tpm negative value by @varungup90 in #241
  • [CI] Merge python wheel publish process to release build pipeline by @brosoul in #247
  • [CI] Push images to Github container registry by @Jeffwan in #246
  • [CI] Fix post-submit container push failure by @Jeffwan in #249
  • [Misc] Infer model name from model_uri and check AWS credential by @brosoul in #250
  • [Misc] Add runtime api metrics by @brosoul in #251
  • [doc] Update release/contribution/quickstart docs by @Jeffwan in #242
  • [batch] job FIFO scheduler as baseline by @xinchen384 in #231
  • [Misc] Improve the installation component sequence by @Jeffwan in #252
  • Fix concurrency issue with gateway RPM plugin by @varungup90 in #244
  • Improve model adapter reliability and stability by @Jeffwan in #257
  • Remove underscore from dir names and remove account word in rate limiter by @varungup90 in #271
  • [Misc] Use klog as the logr implementation by @Jeffwan in #264
  • [CI] Unify Dockerfile names and simplify the build scripts by @Jeffwan in #263
  • Improve model adapter reconcile workflow stability by @Jeffwan in #260
  • Add container override for images by @varungup90 in #273
  • Add AIBrix Custom Autoscaling Algorithm APA by @kr11 in #223
  • Use vllm metrics for routing by @varungup90 in #274
  • Update random routing section and add support for anonymous user by @varungup90 in #276
  • Add image build details and examples for multi-host inference by @Jeffwan in #278
  • Cut v0.1.0-rc.3 release by @Jeffwan in #280
  • Update manifests version to v0.1.0-rc.3 by @Jeffwan in #287
  • [Misc] Add sync images step and scripts in release process by @Jeffwan in #283
  • [batch] E2E works with driver and request proxy by @xinchen384 in #272
  • Fix address already in use when AIRuntime start in pod by @brosoul in #289
  • Read model name from request body by @varungup90 in #290
  • Fix redis bootstrap flaky connection issue by @varungup90 in #293
  • skip docs CI if no changes in /docs dir by @varungup90 in #294
  • Improve Rayclusterreplicaset Status by @Yicheng-Lu-llll in #295
  • Add request trace for profiling by @varungup90 in #291
  • Update the crd definition due to runtime upgrade by @Jeffwan in #298
  • Push images to Github registry in release pipeline by @Jeffwan in #301
  • Build autoscaler abstractions like fetcher, client and scaler by @Jeffwan in #300
  • Support pod autoscaler periodically check by @Jeffwan in #306
  • Add timeout in nc check for redis bootstrap by @varungup90 in #309
  • Refactor AutoScaler: metricClient, context, reconcile by @kr11 in #308
  • Cut v0.1.0-rc.4 release by @Jeffwan in #314
  • [doc] update runtime readme by @brosoul in #318
  • Add env for routing strategy override by @varungup90 in #323
  • Fix pod autoscaler enqueue issues by @Jeffwan in #329
  • Autoscaling benchmark by @kr11 in #337
  • Initial lora benchmark result by @Jeffwan in #321
  • Adding plotting script by @happyandslow in #338
  • Update the downloader performance plot by @Jeffwan in #341
  • Reduce pod metrics refresh interval by @varungup90 in #343
  • Enable ipv6 for envoy proxy by @varungup90 in #342
  • Add benchmark scripts for gateway client side changes by @Jeffwan in #340
  • Update the plots based on feedback by @Jeffwan in #346
  • [batch] use volcano TOS as batch storage by @xinchen384 in #344
  • Add check if no pods are present by @varungup90 in #345
  • Add model exists check by @varungup90 in #353
  • [Misc] Disable fastapi docs in runtime default action by @brosoul in #350
  • Add check for acceptable routing strategies by @varungup90 in #352
  • optimize PA messages: const 'HPA' -> actual pa type by @kr11 in #354
  • [Misc] Runtime server startup with args by @brosoul in #355
  • [Misc] Add python format script by @brosoul in #357
  • optimize benchmark scripts for autoscaler, add more logs by @kr11 in #356
  • Update the mocked app to cleaner state by @Jeffwan in #361
  • Update manifests & docs about service httproute naming trick by @Jeffwan in #362
  • Add reference grant to support httprouting for different namespace by @varungup90 in #347
  • Validate routing strategy bug fix by @varungup90 in #364
  • Bug fix for setting routing strategy via env var by @varungup90 in #369
  • Improve the routing env value & flag retrieval by @Jeffwan in #373
  • Sync main branch changes to release-0.1 branch by @Jeffwan in #375
  • Cut v0.1.0-rc.5 release by @Jeffwan in #376
  • Cut v0.1.0-rc.5 release by @Jeffwan in #378
  • [runtime] Add download args for control download progress bar by @brosoul in #382
  • [runtime] Update tos sdk version to 2.8.0 by @brosoul in #381
  • replaced old names AIBricks with AIBrix by @nwangfw in #372
  • [Misc] Update logos, docs and some configuration for v0.1.0 by @Jeffwan in #383
  • Sync changes from main to release-0.1 by @Jeffwan in #384
  • Cut v0.1.0 release by @Jeffwan in #385

Full Changelog: https://github.com/aibrix/aibrix/commits/v0.1.0