Add support for Knative #1682

tombanksme · 2024-07-15T12:40:06Z

This is a proof-of-concept pull request to add the canary release deployment strategy for Knative (#903). It's far from production ready but I wanted to see if there's appetite for this feature before I invest more time into completing it.

Currently Flagger can target a Knative service & complete the canary release process. I've not tested rollbacks yet, although I think they should just work. The pull request needs some extra work to add test coverage, throw errors when using unsupported release processes & add Knative-specific Kubernetes events.

Let me know what you think!

Example

The following canary will target a Knative Service. Once the canary has been initialised you can start the canary release process by creating a new revision of the service.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: example
spec:
  targetRef:
    apiVersion: serving.knative.dev/v1
    kind: Service
    name: example
  service:
    port: 3000
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5

Signed-off-by: Thomas Banks <[email protected]>

edude03 · 2024-09-03T14:54:42Z

(I haven't used Flagger / Knative in a while but) I'm very excited to see this taking shape!

arubino · 2024-10-04T15:49:43Z

Hi @tombanksme! This looks great. Is there any news on this PR?

tombanksme · 2024-10-04T15:53:06Z

I haven't heard anything back from the Flagger maintainers. I would be happy to finish up the PR if it's something that fluxcd would consider merging.

aryan9600 · 2024-10-06T09:13:30Z

while i see the value added by this PR, is it not possible to use Gateway API as a bridge to get Flagger and Knative to work together? https://github.com/knative-extensions/net-gateway-api

tombanksme · 2024-11-27T14:12:05Z

Sorry for the delay. According to its README, net-gateway-api isn't ready for production yet.

kahirokunn · 2024-12-12T01:01:19Z

@aryan9600

Thank you for bringing this up. As someone familiar with Knative, I'd like to clarify why using Gateway API as a bridge between Flagger and Knative wouldn't be possible, and why the current PR approach is actually the correct solution.

The net-gateway-api project has a different purpose - it's not meant to be a control interface for Knative, but rather it allows Knative to use Gateway API as its final networking output. Let me explain how Knative's networking architecture works:

Knative uses a strictly defined traffic flow architecture where:

- Traffic control is managed through KService resources
- The networking layer uses KIngress resources to configure the Ingress Gateway
- The Ingress Gateway then routes requests either to the activator or directly to Knative Service Pods

To integrate with Knative properly, external tools must interact through the KService API - this is the only supported entry point for managing Knative services and their traffic patterns.
The net-gateway-api project's scope is specifically about allowing Knative to output Gateway API resources instead of other ingress types - it's about the implementation layer, not the control layer.

Therefore, while the suggestion to use Gateway API as a bridge is interesting, it wouldn't provide the deep integration needed for proper traffic management. The current PR's approach of integrating directly with KService is actually the only correct way to achieve this integration - there are no alternative approaches in the current Knative roadmap that would provide the same level of proper integration.

I've reviewed the implementation in the PR, and it aligns perfectly with Knative's architectural principles and control patterns.

Reference: https://knative.dev/docs/serving/architecture/#traffic-flow-and-dns

aryan9600 · 2025-01-13T10:19:35Z

i did a quick first pass and the approach looks good. just to confirm, both the user and flagger own the Knative Service object, with the former owning the workload configuration and the latter owning the traffic configuration?

furthermore, can you add some preliminary docs? or share the steps you took to test this change?

kahirokunn · 2025-01-13T12:20:52Z

Yes, we share one Knative Service object.

dprotaso

Hey Knative Serving Lead here 👋. Awesome to see a change like this 🎉 .

Just some comments but I'm not a flagger expert so I'll defer to others.

dprotaso · 2025-01-14T02:55:17Z

go.mod

@@ -27,6 +27,7 @@ require (
 	k8s.io/client-go v0.30.1
 	k8s.io/code-generator v0.30.1
 	k8s.io/klog/v2 v2.120.1
+	knative.dev/serving v0.41.1


note:
v0.43.0 tag is out now (v1.16)
v0.44.0 tag will be out next week (v1.17)

Though the API definitions are stable so you should be fine not using the latest

tombanksme · 2025-01-14

Fantastic! 🥳

I'll get these updated

dprotaso · 2025-01-14T02:57:06Z

pkg/canary/factory.go


-	switch kind {
-	case "DaemonSet":
+	switch {


nit: dunno what coding style this project has but i'd just switch on kind and maybe have a nested if for Service

tombanksme · 2025-01-14

Yup that's my bad. I was very new to Go when I wrote this. I'll get it cleaned up
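A minimal sketch of the shape suggested in the nit above: switch on the kind, with a nested check for the ambiguous `Service` case. The `targetRef` type, the `controllerFor` function, and the returned labels are hypothetical stand-ins for illustration and are not taken from the PR or from Flagger's code.

```go
package main

import "fmt"

// targetRef is a hypothetical, trimmed-down stand-in for a Canary's spec.targetRef.
type targetRef struct {
	APIVersion string
	Kind       string
}

// controllerFor returns an illustrative label for the controller that would
// handle the target. The labels are assumptions, not Flagger identifiers.
func controllerFor(ref targetRef) (string, error) {
	switch ref.Kind {
	case "DaemonSet":
		return "daemonset", nil
	case "Deployment":
		return "deployment", nil
	case "Service":
		// "Service" exists in more than one API group, so disambiguate
		// on the apiVersion inside this case.
		if ref.APIVersion == "serving.knative.dev/v1" {
			return "knative", nil
		}
		return "service", nil
	default:
		return "", fmt.Errorf("unsupported target kind %q", ref.Kind)
	}
}

func main() {
	name, err := controllerFor(targetRef{APIVersion: "serving.knative.dev/v1", Kind: "Service"})
	fmt.Println(name, err)
}
```

Keeping the switch on `kind` leaves the existing cases untouched, and the Knative-specific branch stays local to the one kind that needs it.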

dprotaso · 2025-01-14T03:16:57Z

pkg/canary/knative_controller.go

+		return true, fmt.Errorf("service %s.%s get query error: %w", cd.Spec.TargetRef.Name, cd.Namespace, err)
+	}
+
+	revision, err := kc.knativeClient.ServingV1().Revisions(cd.Namespace).Get(context.TODO(), service.Status.LatestCreatedRevisionName, metav1.GetOptions{})


(not a flagger expert)

I think you'd want to guard against LatestCreatedRevisionName being equal to the value of annotations[flagger.app/primary-revision] - this would ensure you wait for a new revision to be created which becomes your canary.

Otherwise I'm thinking if there was ever a delay in the knative reconcilers it'll look at the wrong revision here

Note: revisions have a configurationGeneration label
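The guard described above could look roughly like the following sketch. It only compares the latest created revision name against the `flagger.app/primary-revision` annotation mentioned in the comment. The function name `hasNewRevision` and the plain-map representation of annotations are assumptions for illustration, and the sketch does not use the Knative client.

```go
package main

import "fmt"

// primaryRevisionAnnotation is the annotation key mentioned in the review comment.
const primaryRevisionAnnotation = "flagger.app/primary-revision"

// hasNewRevision reports whether latestCreated names a revision other than
// the recorded primary. It is a hypothetical helper, not code from the PR.
func hasNewRevision(annotations map[string]string, latestCreated string) bool {
	primary, ok := annotations[primaryRevisionAnnotation]
	if !ok || latestCreated == "" {
		// Either the canary is not initialised yet or Knative has not
		// reported a revision, so there is nothing to roll out.
		return false
	}
	return latestCreated != primary
}

func main() {
	annotations := map[string]string{primaryRevisionAnnotation: "example-00001"}
	fmt.Println(hasNewRevision(annotations, "example-00001")) // same revision, keep waiting
	fmt.Println(hasNewRevision(annotations, "example-00002")) // new revision, candidate canary
}
```

With a check like this, a slow Knative reconciler would simply cause Flagger to wait, rather than to analyse the revision that is already the primary.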

dprotaso · 2025-01-14T03:18:07Z

pkg/canary/knative_controller.go

+}
+
+func (kc *KnativeController) GetMetadata(canary *flaggerv1.Canary) (string, string, map[string]int32, error) {
+	// TODO: Do we need this for Knative?


Curious what does flagger use this for?

dprotaso · 2025-01-14T03:19:42Z

pkg/canary/knative_controller.go

+}
+
+func (kc *KnativeController) ScaleToZero(canary *flaggerv1.Canary) error {
+	// Not Implemented: Not needed for Knative deployments


Just curious what this does for other providers?

tombanksme · 2025-01-14

In other providers Flagger maintains two instances of the application. From memory the process looks like this:

1. When a new version is created, update the canary & scale it up
2. Slowly send traffic from the primary to the canary & monitor
3. Once the canary reaches 100% & is performing as expected, update the primary
4. Switch 100% traffic over to the primary
5. Scale the canary back to zero
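The gradual traffic shift in the steps above can be sketched as a simple weight schedule. This is only an illustration, assuming linear steps driven by the `stepWeight` and `maxWeight` values from the example Canary at the top of the PR. The `canaryWeights` function is hypothetical and ignores metric checks, thresholds, and promotion.

```go
package main

import "fmt"

// canaryWeights lists the canary traffic percentages visited when stepping
// linearly by stepWeight up to maxWeight. Hypothetical helper for illustration.
func canaryWeights(stepWeight, maxWeight int) []int {
	if stepWeight <= 0 {
		return nil
	}
	var weights []int
	for w := stepWeight; w <= maxWeight; w += stepWeight {
		weights = append(weights, w)
	}
	return weights
}

func main() {
	// Values taken from the example Canary: stepWeight: 5, maxWeight: 50.
	fmt.Println(canaryWeights(5, 50)) // prints [5 10 15 20 25 30 35 40 45 50]
}
```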

dprotaso · 2025-01-14T03:26:31Z

pkg/canary/knative_controller.go

+		return true, fmt.Errorf("service %s.%s get query error: %w", cd.Spec.TargetRef.Name, cd.Namespace, err)
+	}
+
+	return hasSpecChanged(cd, service.Status.LatestCreatedRevisionName)


(not a flagger dev)

What 'applies' the new Service spec.template that would trigger the rollout?

tombanksme · 2025-01-14

This was the process that I'd envisioned:

1. The user creates a Knative service for their application
2. The user creates a Flagger configuration pointing at the Knative service
3. Flagger takes over the traffic management of the Knative service
   a. It adds the metadata it needs to operate
   b. This wasn't done in the original pull request, but it can probably also set the default traffic routing to zero for all revisions & manually set the latest to 100%
4. When the user wants to update an application they update the Knative service like they usually would
5. Flagger watches for the new revision & then does the traffic management changeover

So the user is still responsible for triggering the rollout, like they would with any other Knative service.
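To make the traffic takeover concrete, here is a rough sketch of building a two-way split pinned to revision names. The `trafficTarget` struct is a simplified local stand-in that loosely mirrors the revision name and pointer-valued percent of a Knative traffic entry. It is not the real Knative type, and `splitTraffic` is a hypothetical helper rather than code from the PR.

```go
package main

import "fmt"

// trafficTarget is a simplified stand-in for one entry of a Knative
// Service's traffic block. It is not the real Knative type.
type trafficTarget struct {
	RevisionName string
	Percent      *int64
}

// splitTraffic pins traffic to two named revisions so that the percentages
// always add up to 100. Hypothetical helper for illustration.
func splitTraffic(primary, canary string, canaryPercent int64) ([]trafficTarget, error) {
	if canaryPercent < 0 || canaryPercent > 100 {
		return nil, fmt.Errorf("canary percent %d out of range 0-100", canaryPercent)
	}
	primaryPercent := 100 - canaryPercent
	return []trafficTarget{
		{RevisionName: primary, Percent: &primaryPercent},
		{RevisionName: canary, Percent: &canaryPercent},
	}, nil
}

func main() {
	targets, err := splitTraffic("example-00001", "example-00002", 5)
	if err != nil {
		panic(err)
	}
	for _, t := range targets {
		fmt.Printf("%s: %d%%\n", t.RevisionName, *t.Percent)
	}
}
```

Pinning by revision name, rather than following the latest revision, is what would let the user create new revisions freely while the traffic split stays under Flagger's control.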

dprotaso · 2025-01-14T03:36:15Z

charts/flagger/templates/rbac.yaml

+  - apiGroups:
+      - serving.knative.dev
+    resources:
+      - services


(just an FYI and this is less common)

You can have a Knative Route and point it to revisions from different knative Configurations. Knative Services are a convenience over it.

I doubt anyone needs that kind of flexibility. Tackling Services rollout will probably cover 99% of use cases.

tombanksme · 2025-01-14

I'll look at this. If it's easy to add I'll do it in this pull request; if not I'll spin it out into another PR

dprotaso · 2025-01-14T03:46:09Z

pkg/router/knative.go

+		return
+	}
+
+	return int(*service.Status.Traffic[primaryRevisionIdx].Percent), int(*service.Status.Traffic[canaryRevisionIdx].Percent), false, nil


unsure when this method is called, but in general a Knative Service's traffic Percent can be nil, so these dereferences can panic
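One way to avoid the nil dereference is a small helper that reads the pointer safely. The sketch below uses a simplified local `trafficTarget` stand-in instead of the real Knative type. Treating a nil percent as 0 is an assumption made for illustration, and `percentOrZero` is a hypothetical name.

```go
package main

import "fmt"

// trafficTarget is a simplified stand-in for a Knative traffic status entry.
type trafficTarget struct {
	RevisionName string
	Percent      *int64
}

// percentOrZero dereferences p, falling back to 0 when it is nil.
// Mapping nil to 0 is an assumption made for this sketch.
func percentOrZero(p *int64) int {
	if p == nil {
		return 0
	}
	return int(*p)
}

func main() {
	fifty := int64(50)
	traffic := []trafficTarget{
		{RevisionName: "example-00001", Percent: &fifty},
		{RevisionName: "example-00002"}, // Percent left nil
	}
	for _, t := range traffic {
		fmt.Println(t.RevisionName, percentOrZero(t.Percent))
	}
}
```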

tombanksme · 2025-01-14T07:08:06Z

Morning folks. Thank you all for taking a look & investing your time into this. I'll allocate some time over the next couple of weeks to get this cleaned up as promised.

feat: add knative support (first pass)

4c7eccf

Signed-off-by: Thomas Banks <[email protected]>

tombanksme marked this pull request as ready for review July 15, 2024 12:40

tombanksme requested a review from stefanprodan as a code owner July 15, 2024 12:40

kahirokunn mentioned this pull request Jan 3, 2025

REQUEST: New membership for kahirokunn kubernetes/org#5325

Closed

11 tasks

kahirokunn mentioned this pull request Jan 13, 2025

chore: Improve KubernetesRouter selection based on apiGroup #1750

Open

dprotaso reviewed Jan 14, 2025
