kubernetes-sigs · robscott · Dec 21, 2024 · Dec 30, 2024 · Dec 31, 2024 · danehans
diff --git a/Makefile b/Makefile
@@ -162,6 +162,14 @@ live-docs:
 	docker build -t gaie/mkdocs hack/mkdocs/image
 	docker run --rm -it -p 3000:3000 -v ${PWD}:/docs gaie/mkdocs
 
+.PHONY: api-ref-docs
+api-ref-docs:
+	crd-ref-docs \
+		--source-path=${PWD}/api \
+		--config=crd-ref-docs.yaml \
+		--renderer=markdown \
+		--output-path=${PWD}/site-src/reference/spec.md
+
 ##@ Deployment
 
 ifndef ignore-not-found

diff --git a/crd-ref-docs.yaml b/crd-ref-docs.yaml
@@ -0,0 +1,10 @@
+processor:
+  ignoreTypes:
+    - "(InferencePool|InferenceModel)List$"
+  # RE2 regular expressions describing type fields that should be excluded from the generated documentation.
+  ignoreFields:
+    - "TypeMeta$"
+
+render:
+  # Version of Kubernetes to use when generating links to Kubernetes API documentation.
+  kubernetesVersion: 1.31
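
Review note on `crd-ref-docs.yaml`: the `ignoreTypes` pattern is an RE2 expression anchored only at the end, so it is worth checking which type names it catches. Go's `regexp` package uses RE2 syntax, so the minimal sketch below exercises the same pattern. The helper `isIgnored` and the sample type names are hypothetical, and the sketch assumes the tool matches against a string that ends in the bare type name.

```go
package main

import (
	"fmt"
	"regexp"
)

// ignoreTypes mirrors the pattern from crd-ref-docs.yaml in this PR.
var ignoreTypes = regexp.MustCompile(`(InferencePool|InferenceModel)List$`)

// isIgnored is a hypothetical helper: it reports whether a type name matches
// the ignore pattern. It does not call crd-ref-docs itself.
func isIgnored(typeName string) bool {
	return ignoreTypes.MatchString(typeName)
}

func main() {
	names := []string{
		"InferencePoolList",
		"InferenceModelList",
		"InferencePool",
		"InferenceModelSpec",
	}
	for _, name := range names {
		fmt.Printf("%-20s ignored=%v\n", name, isIgnored(name))
	}
}
```

Because the pattern has no leading anchor, any name that merely ends in `InferencePoolList` or `InferenceModelList` would also match.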
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -29,6 +29,7 @@ plugins:
   - mermaid2
 markdown_extensions:
   - admonition
+  - markdown.extensions.nl2br
   - meta
   - pymdownx.emoji:
       emoji_index: !!python/name:material.extensions.emoji.twemoji
@@ -52,19 +53,17 @@ nav:
         API Overview: concepts/api-overview.md
         Conformance: concepts/conformance.md
         Roles and Personas: concepts/roles-and-personas.md
-        Use Cases: concepts/use-cases.md
     - Implementations: implementations.md
     - FAQ: faq.md
-    - Glossary: concepts/glossary.md
   - Guides:
     - User Guides:
       - Getting started: guides/index.md
     - Implementer's Guide: guides/implementers.md
   - Reference:
+    - API Reference: reference/spec.md
     - API Types:
       - InferencePool: api-types/inferencepool.md
       - InferenceModel: api-types/inferencemodel.md
-    - API specification: reference/spec.md
   - Enhancements:
     - Overview: gieps/overview.md
   - Contributing:

diff --git a/site-src/api-types/inferencepool.md b/site-src/api-types/inferencepool.md
@@ -7,7 +7,11 @@
 
 ## Background
 
-TODO
+InferencePool is a backend resource with some similarities to Service:
+
+<!-- Source: https://docs.google.com/presentation/d/11HEYCgFi-aya7FS91JvAfllHiIlvfgcp7qpi_Azjk4E/edit#slide=id.g292839eca6d_1_0 -->
+<img src="/images/inferencepool-vs-service.png" alt="Comparing InferencePool with Service" class="center" width="550" />
+
 
 ## Spec
 

diff --git a/site-src/concepts/conformance.md b/site-src/concepts/conformance.md
@@ -0,0 +1,31 @@
+# Conformance
+
+Similar to Gateway API, this project will rely on conformance tests to ensure
+compatibility across implementations. This will be focused on three different
+layers:
+
+## 1. Gateway API Implementations
+
+Conformance tests will verify that:
+
+* InferencePool is supported as a backend type
+* Implementations forward requests to the configured extension for an
+  InferencePool using the protocol specified by this project
+* Implementations honor the routing guidance provided by the extension
+* Implementations behave appropriately when an extension is either not present
+  or fails to respond
+
+## 2. Inference Routing Extensions
+
+Conformance tests will verify that:
+
+* Extensions accept requests that match the protocol specified by this project
+* Extensions respond with routing guidance that matches the protocol specified
+  by this project
+
+## 3. Model Server Frameworks
+
+Conformance tests will verify that:
+
+* Frameworks serve the expected set of metrics using a format and path specified
+  by this project
diff --git a/site-src/concepts/glossary.md b/site-src/concepts/glossary.md
diff --git a/site-src/concepts/use-cases.md b/site-src/concepts/use-cases.md
diff --git a/site-src/contributing/index.md b/site-src/contributing/index.md
@@ -1,3 +1,49 @@
 # How to Get Involved
 
-TODO
+This page contains links to all of the meeting notes, design docs and related
+discussions around the APIs.
+
+## Bug Reports
+
+Bug reports should be filed as [GitHub Issues](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/new) on this repo.
+
+**NOTE**: If you're reporting a bug that applies to a specific implementation of
+this project, please check our
+[implementations page](/implementations) to find links to the repositories where
+you can get help with your specific implementation.
+
+## Communications
+
+Major discussions and notifications will be sent on both the
+[WG-Serving](https://groups.google.com/a/kubernetes.io/g/wg-serving) and
+[SIG-Network](https://groups.google.com/forum/#!forum/kubernetes-sig-network)
+mailing lists.
+
+Although we may end up creating a new Slack channel in the future, our
+conversations are currently split between the following Kubernetes Slack
+channels:
+
+* [#sig-network-gateway-api](https://kubernetes.slack.com/archives/CR0H13KGA)
+* [#wg-serving](https://kubernetes.slack.com/archives/C071WA7R9LY)
+
+## Meetings
+
+Gateway API community meetings happen every Thursday at 10am Pacific Time
+([convert to your
+timezone](https://dateful.com/time-zone-converter?t=08:00&tz=PT%20%28Pacific%20Time%29)).
+To receive an invite to this and other WG-Serving community meetings, join the
+[WG-Serving mailing
+list](https://groups.google.com/a/kubernetes.io/g/wg-serving).
+
+* [Zoom link](https://zoom.us/j/9955436256?pwd=Z2FQWU1jeDZkVC9RRTN4TlZyZTBHZz09) (passcode in [meeting notes](https://docs.google.com/document/d/1frfPE5L1sI3737rdQV04IcDGeOcGJj2ItjMg6z2SRH0/edit?tab=t.0#heading=h.jvz2pwvdpit0) doc)
+
+### Meeting Notes and Recordings
+
+Meeting agendas and notes are maintained in the [meeting
+notes](https://docs.google.com/document/d/1frfPE5L1sI3737rdQV04IcDGeOcGJj2ItjMg6z2SRH0/edit?tab=t.0#heading=h.jvz2pwvdpit0)
+doc. Feel free to add topics for discussion at an upcoming meeting.
+
+All meetings are recorded and automatically uploaded to the [WG-Serving meetings
+YouTube
+playlist](https://www.youtube.com/playlist?list=PL69nYSiGNLP30qNanabU75ayPK7OPNAAS).
+
diff --git a/site-src/faq.md b/site-src/faq.md
@@ -1,3 +1,30 @@
 # Frequently Asked Questions (FAQ)
 
-TODO
+## How can I get involved with this project?
+The [contributing](/contributing) page keeps track of how to get involved with
+the project.
+
+## Why isn't this project in the main Gateway API repo?
+This project is an extension of Gateway API, and may eventually be merged into
+the main Gateway API repo. As we're starting, this project represents a close
+collaboration between
+[WG-Serving](https://github.com/kubernetes/community/tree/master/wg-serving),
+[SIG-Network](https://github.com/kubernetes/community/tree/master/sig-network),
+and the [Gateway API](https://gateway-api.sigs.k8s.io/) subproject. These groups
+are all well represented within the ownership of this project, and the separate
+repo enables this group to iterate more quickly as this project is getting
+started. As the project stabilizes, we'll revisit whether it should become part of
+the main Gateway API project.
+
+## Will there be a default controller implementation?
+No. Although this project will provide a default/reference implementation of an
+extension, each individual Gateway controller can support this pattern. The
+scope of this project is to define the API extension model, a reference
+extension, conformance tests, and overall documentation.
+
+## Can you add support for my use case to the reference extension?
+Maybe. We're trying to keep the scope of the reference extension fairly narrow
+and instead hoping to see an ecosystem of compatible extensions developed in
+this space. Unless a use case fits neatly into the existing scope of our
+reference extension, it would likely be better to develop a separate extension
+focused on your use case.
diff --git a/site-src/images/inferencepool-vs-service.png b/site-src/images/inferencepool-vs-service.png
diff --git a/site-src/images/request-flow.png b/site-src/images/request-flow.png
diff --git a/site-src/images/resource-model.png b/site-src/images/resource-model.png
diff --git a/site-src/implementations.md b/site-src/implementations.md
@@ -1,3 +1,56 @@
 # Implementations
 
-TODO
+This project has several implementations that are planned or in progress:
+
+* [Envoy Gateway][1]
+* [Gloo k8sgateway][2]
+* [Google Kubernetes Engine][3]
+
+[1]:#envoy-gateway
+[2]:#gloo-k8sgateway
+[3]:#google-kubernetes-engine
+
+## Envoy Gateway
+[Envoy Gateway][eg-home] is an [Envoy][envoy-org] subproject for managing
+Envoy-based application gateways. The supported APIs and fields of the Gateway
+API are outlined [here][eg-supported]. Use the [quickstart][eg-quickstart] to
+get Envoy Gateway running with Gateway API in a few simple steps.
+
+Progress towards supporting this project is tracked with a [GitHub
+Issue](https://github.com/envoyproxy/gateway/issues/4423).
+
+[eg-home]:https://gateway.envoyproxy.io/
+[envoy-org]:https://github.com/envoyproxy
+[eg-supported]:https://gateway.envoyproxy.io/docs/tasks/quickstart/
+[eg-quickstart]:https://gateway.envoyproxy.io/docs/tasks/quickstart
+
+## Gloo k8sgateway
+
+[Gloo k8sgateway](https://k8sgateway.io/) is a feature-rich, Kubernetes-native
+ingress controller and next-generation API gateway. Gloo k8sgateway brings the
+full power and community support of Gateway API to its existing control-plane
+implementation.
+
+Progress towards supporting this project is tracked with a [GitHub
+Issue](https://github.com/k8sgateway/k8sgateway/issues/10411).
+
+## Google Kubernetes Engine
+
+[Google Kubernetes Engine (GKE)][gke] is a managed Kubernetes platform offered
+by Google Cloud. GKE's implementation of the Gateway API is through the [GKE
+Gateway controller][gke-gateway] which provisions Google Cloud Load Balancers
+for Pods in GKE clusters.
+
+The GKE Gateway controller supports weighted traffic splitting, mirroring,
+advanced routing, multi-cluster load balancing and more. See the docs to deploy
+[private or public Gateways][gke-gateway-deploy] and also [multi-cluster
+Gateways][gke-multi-cluster-gateway].
+
+Progress towards supporting this project is tracked with a [GitHub
+Issue](https://github.com/GoogleCloudPlatform/gke-gateway-api/issues/20).
+
+[gke]:https://cloud.google.com/kubernetes-engine
+[gke-gateway]:https://cloud.google.com/kubernetes-engine/docs/concepts/gateway-api
+[gke-gateway-deploy]:https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways
+[gke-multi-cluster-gateway]:https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-multi-cluster-gateways
+
diff --git a/site-src/index.md b/site-src/index.md
@@ -1,3 +1,99 @@
 # Introduction
 
-TODO
+Gateway API Inference Extension is an official Kubernetes project focused on
+extending [Gateway API](https://gateway-api.sigs.k8s.io/) with
+inference-specific routing extensions.
+
+The overall resource model focuses on 2 new inference-focused
+[personas](/concepts/roles-and-personas) and corresponding resources that
+they are expected to manage:
+
+<!-- Source: https://docs.google.com/presentation/d/11HEYCgFi-aya7FS91JvAfllHiIlvfgcp7qpi_Azjk4E/edit#slide=id.g292839eca6d_1_0 -->
+<img src="/images/resource-model.png" alt="Gateway API Inference Extension Resource Model" class="center" width="550" />
+
+## API Resources
+
+### InferencePool
+
+InferencePool represents a set of Inference-focused Pods and an extension that
+will be used to route to them. Within the broader Gateway API resource model,
+this resource is considered a "backend". In practice, that means that you'd
+replace a Kubernetes Service with an InferencePool. This resource has some
+similarities to Service (a way to select Pods and specify a port), but has some
+unique capabilities. With InferencePool, you can configure a routing extension
+as well as inference-specific routing optimizations. For more information on
+this resource, refer to our [InferencePool documentation](/api-types/inferencepool).
+
+### InferenceModel
+
+An InferenceModel represents a model or adapter, and configuration associated
+with that model. This resource enables you to configure the relative criticality
+of a model, and allows you to seamlessly translate the requested model name to
+one or more backend model names. Multiple InferenceModels can be attached to an
+InferencePool. For more information on this resource, refer to our
+[InferenceModel documentation](/api-types/inferencemodel).
+
+## Composable Layers
+
+This project aims to develop an ecosystem of implementations that are fully
+compatible with each other. There are three distinct layers of components that
+are relevant to this project:
+
+### Gateway API Implementations
+
+Gateway API has [more than 25
+implementations](https://gateway-api.sigs.k8s.io/implementations/). As this
+pattern stabilizes, we expect a wide set of these implementations to support
+this project.
+
+### Endpoint Selection Extension
+
+As part of this project, we're building an initial reference extension that is
+focused on routing to LoRA workloads. Over time, we hope to see a wide variety
+of extensions emerge that follow this pattern and provide a wide range of
+choices.
+
+### Model Server Frameworks
+
+This project will work closely with model server frameworks to establish a
+shared standard for interacting with these extensions, particularly focused on
+metrics and observability so extensions will be able to make informed routing
+decisions. The project is currently focused on integrations with
+[vLLM](https://github.com/vllm-project/vllm) and
+[Triton](https://github.com/triton-inference-server/server), and will be open to
+other integrations as they are requested.
+
+## Request Flow
+
+To illustrate how this all comes together, it may be helpful to walk through a
+sample request.
+
+1. The first step involves the Gateway selecting the correct InferencePool
+(set of endpoints running a model server framework) or Service to route to. This
+logic is based on the existing Gateway and HTTPRoute APIs, and will be familiar
+to any Gateway API users or implementers.
+
+2. If the request should be routed to an InferencePool, the Gateway will forward
+the request information to the endpoint selection extension for that pool.
+
+3. The extension will fetch metrics from whichever portion of the InferencePool
+endpoints can best achieve the configured objectives. Note that this kind of
+metrics probing may happen asynchronously, depending on the extension.
+
+4. The extension will instruct the Gateway which endpoint should be routed to.
+
+5. The Gateway will route the request to the desired endpoint.
+
+<img src="/images/request-flow.png" alt="Gateway API Inference Extension Request Flow" class="center" />
+
+
+## Who is working on Gateway API Inference Extension?
+
+This project is being driven by
+[WG-Serving](https://github.com/kubernetes/community/tree/master/wg-serving) and
+[SIG-Network](https://github.com/kubernetes/community/tree/master/sig-network)
+to improve and standardize routing to inference workloads in Kubernetes. Check
+out the [implementations reference](implementations.md) to see the latest
+projects & products that support this project. If you are interested in
+contributing to or building an implementation using Gateway API then don’t
+hesitate to [get involved!](/contributing)
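
Review note on the Request Flow section of `site-src/index.md`: steps 2 through 5 could be illustrated with a small sketch like the one below. Every name in it (`Endpoint`, `QueueDepth`, `Selector`, `leastQueue`, `route`) is hypothetical and not part of this project's API or extension protocol. The "pick the shortest queue" policy is only an assumed stand-in for "best achieve the configured objectives".

```go
package main

import (
	"errors"
	"fmt"
)

// Endpoint is a hypothetical view of one model server Pod in an InferencePool.
type Endpoint struct {
	Address    string
	QueueDepth int // assumed metric reported by the model server
}

// Selector stands in for the endpoint selection extension (steps 2 to 4).
type Selector interface {
	Pick(endpoints []Endpoint) (Endpoint, error)
}

// leastQueue is an assumed policy: choose the endpoint with the fewest
// queued requests. Ties go to the earliest endpoint in the slice.
type leastQueue struct{}

func (leastQueue) Pick(endpoints []Endpoint) (Endpoint, error) {
	if len(endpoints) == 0 {
		return Endpoint{}, errors.New("no endpoints in pool")
	}
	best := endpoints[0]
	for _, e := range endpoints[1:] {
		if e.QueueDepth < best.QueueDepth {
			best = e
		}
	}
	return best, nil
}

// route models the Gateway side: ask the extension for an endpoint, then
// return the address the request should be sent to (step 5).
func route(s Selector, pool []Endpoint) (string, error) {
	ep, err := s.Pick(pool)
	if err != nil {
		return "", err
	}
	return ep.Address, nil
}

func main() {
	pool := []Endpoint{
		{Address: "10.0.0.1:8000", QueueDepth: 7},
		{Address: "10.0.0.2:8000", QueueDepth: 2},
		{Address: "10.0.0.3:8000", QueueDepth: 5},
	}
	addr, err := route(leastQueue{}, pool)
	if err != nil {
		fmt.Println("routing failed:", err)
		return
	}
	fmt.Println("route to", addr)
}
```

Keeping the selection policy behind an interface reflects the page's point that different extensions can plug into the same flow. The empty-pool error path corresponds to the conformance case where an extension cannot provide routing guidance.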