Reconcile ClusterCIDRs with Finalizers #37

Merged
merged 3 commits into from
Jan 5, 2025

Conversation

Member

@mneverov mneverov commented Sep 26, 2024

Currently, ClusterCIDR reconciliation (create/update) is skipped when a finalizer is present. Because users can create ClusterCIDRs with the finalizer already set in the ClusterCIDR definition, this leads to the problem described in #17.
This patch makes the following changes:

  • reconcile ClusterCIDRs even if finalizers are present
  • make createClusterCIDR idempotent by checking whether a ClusterCIDR is already present in the cidrMap
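
The idempotency check in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the allocator's actual code: the `allocator` and `clusterCIDR` types, the string map key, and the `addClusterCIDR` helper are all hypothetical simplifications of the real cidrMap handling.

```go
package main

import "fmt"

// clusterCIDR is a simplified, hypothetical stand-in for the allocator's
// internal per-ClusterCIDR record; the real type carries more fields.
type clusterCIDR struct {
	name     string
	ipv4CIDR string
}

// allocator is a hypothetical, reduced allocator that holds only a cidrMap,
// keyed here by a plain string that stands in for a node selector.
type allocator struct {
	cidrMap map[string][]*clusterCIDR
}

// addClusterCIDR appends cc under key unless an entry with the same name is
// already present. It reports whether the entry was added, so calling it
// repeatedly with the same input leaves the map unchanged.
func (a *allocator) addClusterCIDR(key string, cc *clusterCIDR) bool {
	for _, existing := range a.cidrMap[key] {
		if existing.name == cc.name {
			return false
		}
	}
	a.cidrMap[key] = append(a.cidrMap[key], cc)
	return true
}

func main() {
	a := &allocator{cidrMap: map[string][]*clusterCIDR{}}
	cc := &clusterCIDR{name: "cidr-a", ipv4CIDR: "10.0.0.0/16"}

	fmt.Println(a.addClusterCIDR("selector-1", cc)) // first call adds the entry
	fmt.Println(a.addClusterCIDR("selector-1", cc)) // repeated call is a no-op
	fmt.Println(len(a.cidrMap["selector-1"]))
}
```

The linear scan is the traversal the performance note refers to: each call walks the slice stored under one key.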

Performance concerns:

  • For each ClusterCIDR modification, the clusterCIDRList in the cidrMap is traversed. Iterating through a slice, even one with a thousand elements (nodes), should not be a problem.
  • For each modification, the allocator issues a ClusterCIDR update. Again, this should not be a problem, since ClusterCIDRs are immutable and such updates should happen rarely, if at all. See also "Replace Update with Patch" (#38).

Moving the finalizer logic out of createClusterCIDR requires changing the syncClusterCIDR signature, because cidrQueue forgets the event if syncClusterCIDR does not return an error. I'll do that in a separate PR to keep this one small.

Fixes #17

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 26, 2024
@mneverov mneverov changed the title [WIP] Reconcile ClusterCIDRs with Finalizers Sep 26, 2024
@mneverov mneverov marked this pull request as ready for review September 26, 2024 18:29
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2024
Member Author

mneverov commented Oct 7, 2024

@sarveshr7 @ameukam could you ptal?

@@ -1090,12 +1091,10 @@ func (r *multiCIDRRangeAllocator) reconcileCreate(ctx context.Context, clusterCI
defer r.lock.Unlock()

logger := klog.FromContext(ctx)
if needToAddFinalizer(clusterCIDR, clusterCIDRFinalizer) {
Contributor

@aojea aojea Oct 7, 2024

The problem here is that we were assuming the finalizer was the indicator for whether to create or update.

Independently of that, before Create or Update we need to compare the new object we are building against the old object (this is missing), and then decide whether to create or update. Regarding PATCH vs PUT: since these objects are managed entirely by this controller, we can use PUT to move the object to the state we just built.

Member Author

My understanding is that, since all ClusterCIDR fields are marked immutable, the only update we make is setting the finalizer.
We already check whether the finalizer exists. Previously, in that case we skipped reconciliation and did not add the ClusterCIDR to the cidrMap, which caused the bug.
With the current fix, the finalizer is always added (if needed). We create a ClusterCIDR only when it has no ResourceVersion, which seems to me a better indicator of whether the resource has already been processed and stored in etcd.
I understand that this is not the best approach. An alternative would be to add a tuple to the queue (namespaced name and operation), so that we know the exact operation and can process the CIDR accordingly.
I can provide a PoC of the latter approach.
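
The ResourceVersion-based decision described in this comment could look roughly like the sketch below. Everything here is an assumption made for illustration: `objectMeta` is a trimmed stand-in for Kubernetes object metadata, `exampleFinalizer` is a made-up name, and `planWrite`, including its "none" outcome, is a hypothetical helper rather than code from this PR.

```go
package main

import "fmt"

// objectMeta is a hypothetical, trimmed-down stand-in for Kubernetes object
// metadata; only the fields used by this sketch are included.
type objectMeta struct {
	Name            string
	ResourceVersion string
	Finalizers      []string
}

// exampleFinalizer is a made-up finalizer name used only for illustration.
const exampleFinalizer = "example.com/cluster-cidr-finalizer"

// hasFinalizer reports whether meta already carries the given finalizer.
func hasFinalizer(meta objectMeta, finalizer string) bool {
	for _, f := range meta.Finalizers {
		if f == finalizer {
			return true
		}
	}
	return false
}

// planWrite returns the metadata to persist and the operation to use.
// An empty ResourceVersion is treated as "never stored", so the object is
// created; otherwise it is updated only if the finalizer had to be added.
func planWrite(meta objectMeta) (objectMeta, string) {
	changed := false
	if !hasFinalizer(meta, exampleFinalizer) {
		// Copy before appending so the caller's slice is not modified.
		meta.Finalizers = append(append([]string{}, meta.Finalizers...), exampleFinalizer)
		changed = true
	}
	switch {
	case meta.ResourceVersion == "":
		return meta, "create"
	case changed:
		return meta, "update"
	default:
		return meta, "none"
	}
}

func main() {
	for _, m := range []objectMeta{
		{Name: "new-without-finalizer"},
		{Name: "new-with-finalizer", Finalizers: []string{exampleFinalizer}},
		{Name: "stored-without-finalizer", ResourceVersion: "42"},
		{Name: "stored-with-finalizer", ResourceVersion: "42", Finalizers: []string{exampleFinalizer}},
	} {
		_, op := planWrite(m)
		fmt.Printf("%s -> %s\n", m.Name, op)
	}
}
```

The second case is the one behind the bug: an object that arrives with the finalizer already set is still treated as new, because the decision no longer depends on the finalizer.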

Contributor

Apologies, I only skimmed the code, so my comment may be inaccurate.

I think this is OK. I just wanted to say that reconciliation seems to depend on more than the finalizer, so we need to process the entire object.

> I understand that it is not the best approach and would maybe add a tuple to the queue: namespaced name and operation, such that we would know exact operation and would process the CIDR accordingly.
> Can provide a PoC with the latter approach.

That would make this an event-based controller instead of a level-based one, and we don't want that.

What I mean is that when you need to Update, you have both the old and the new object, so you can compare them and decide whether to update.

Member Author

@aojea ptal.
I added a reflect.DeepEqual(old, new) check to the update func. With this change, reconciliation happens only when the ClusterCIDR has actually changed. In principle that should never happen, because all ClusterCIDR fields are immutable; the only valid case I see is a metadata change (for example, adding finalizers), which should not affect networking.
Other misc changes: log improvements and ignoring duplicates.
I also changed the test to print test cases separately:

--- PASS: TestSyncClusterCIDRCreate (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_IPv4_ClusterCIDR_with_no_NodeSelector (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_IPv4_ClusterCIDR_with_NodeSelector (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_IPv4_ClusterCIDR_with_overlapping_CIDRs (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_IPv6_ClusterCIDR_with_no_NodeSelector (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_IPv6_ClusterCIDR_with_NodeSelector (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_IPv6_ClusterCIDR_with_overlapping_CIDRs (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_Dualstack_ClusterCIDR_with_no_NodeSelector (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_DualStack_ClusterCIDR_with_NodeSelector (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/valid_Dualstack_ClusterCIDR_with_overlapping_CIDRs (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/invalid_ClusterCIDR_with_both_IPv4_and_IPv6_CIDRs_nil (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/invalid_IPv4_ClusterCIDR (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/invalid_IPv6_ClusterCIDR (0.00s)
    --- PASS: TestSyncClusterCIDRCreate/invalid_dualstack_ClusterCIDR (0.00s)
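
The reflect.DeepEqual(old, new) check mentioned in this comment can be sketched as an update handler that skips unchanged objects. This is only an illustration under simplifying assumptions: the `clusterCIDR` struct, the `onUpdate` function, and the `enqueue` callback are hypothetical and do not mirror the controller's actual types or handler signature.

```go
package main

import (
	"fmt"
	"reflect"
)

// clusterCIDR is a hypothetical, flattened stand-in for the API object.
type clusterCIDR struct {
	Name       string
	Finalizers []string
	IPv4CIDR   string
}

// onUpdate mimics an informer-style update handler: it enqueues the object
// for reconciliation only when the old and new versions actually differ.
func onUpdate(oldObj, newObj *clusterCIDR, enqueue func(name string)) {
	if reflect.DeepEqual(oldObj, newObj) {
		return // no-op update (for example a resync): nothing to reconcile
	}
	enqueue(newObj.Name)
}

func main() {
	var queued []string
	enqueue := func(name string) { queued = append(queued, name) }

	old := &clusterCIDR{Name: "cidr-a", IPv4CIDR: "10.0.0.0/16"}
	same := &clusterCIDR{Name: "cidr-a", IPv4CIDR: "10.0.0.0/16"}
	withFinalizer := &clusterCIDR{
		Name:       "cidr-a",
		IPv4CIDR:   "10.0.0.0/16",
		Finalizers: []string{"example.com/finalizer"}, // made-up finalizer name
	}

	onUpdate(old, same, enqueue)          // identical content: skipped
	onUpdate(old, withFinalizer, enqueue) // metadata changed: enqueued
	fmt.Println(queued)
}
```

Note that reflect.DeepEqual treats a nil slice and an empty slice as different, so a handler like this may still enqueue objects whose only difference is that representation.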

Contributor

Member Author

nice, didn't know about it. Thanks!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2024
Contributor

aojea commented Dec 10, 2024

@mneverov please rebase, and sorry for the late response; this totally slipped. Feel free to ping me on Slack if you need reviews.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 10, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 1, 2025
@mneverov mneverov marked this pull request as draft January 1, 2025 15:53
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 1, 2025
@mneverov mneverov changed the title Reconcile ClusterCIDRs with Finalizers [WIP] Reconcile ClusterCIDRs with Finalizers Jan 1, 2025
@mneverov mneverov force-pushed the fix-allocator branch 2 times, most recently from 50e54e9 to d6564d5 Compare January 3, 2025 17:46
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 3, 2025
…append ClusterCIDRs if they are already present in the cidrMap which makes createClusterCIDR idempotent.
@mneverov mneverov changed the title [WIP] Reconcile ClusterCIDRs with Finalizers Reconcile ClusterCIDRs with Finalizers Jan 4, 2025
@mneverov mneverov marked this pull request as ready for review January 4, 2025 13:27
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 4, 2025
@k8s-ci-robot k8s-ci-robot requested a review from aojea January 4, 2025 13:27
Contributor

aojea commented Jan 5, 2025

/lgtm
/approve

Thanks

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 5, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, mneverov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit adc6dde into kubernetes-sigs:main Jan 5, 2025
9 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with Dynamic Allocation of ClusterCIDR for New Nodes Without Component Restart
4 participants