
Add support for user defined tolerances to modules #940

Merged: 1 commit merged into kubernetes-sigs:main on Dec 2, 2024

Conversation

tsprasannaa (Contributor)

There is a requirement to run selected pods on a cordoned node. One such example is a device driver upgrade: the procedure requires the node to be cordoned, yet there must still be a way to run the housekeeping pods that carry out the driver upgrade, so those pods have to remain schedulable.
This PR adds support for carrying tolerations on the pods that perform the housekeeping operations. These user-defined tolerations will match the taint that is added to the nodes.

ModuleSpec carries the tolerations, and they are applied to the pods at pod-creation time.
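
For illustration only, here is a minimal sketch of the idea, assuming a Tolerations field on the Module spec and a hypothetical helper that copies it into the pod spec at creation time (the names and placement are assumptions, not the exact KMM API):

// Illustrative sketch only; field names and the helper are assumptions.
package example

import v1 "k8s.io/api/core/v1"

// ModuleSpec carries user-defined tolerations alongside its existing fields.
type ModuleSpec struct {
	// Tolerations are propagated to every pod created for this Module, so
	// those pods remain schedulable on tainted (e.g. cordoned) nodes.
	Tolerations []v1.Toleration `json:"tolerations,omitempty"`
}

// applyTolerations copies the Module's tolerations into a pod spec at creation time.
func applyTolerations(podSpec *v1.PodSpec, spec ModuleSpec) {
	podSpec.Tolerations = spec.Tolerations
}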

linux-foundation-easycla bot commented Nov 23, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: tsprasannaa / name: Prasannaa TS (a3966c9)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Nov 23, 2024
@k8s-ci-robot (Contributor)

Welcome @tsprasannaa!

It looks like this is your first PR to kubernetes-sigs/kernel-module-management 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kernel-module-management has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 23, 2024
@k8s-ci-robot (Contributor)

Hi @tsprasannaa. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 23, 2024

netlify bot commented Nov 23, 2024

Deploy Preview for kubernetes-sigs-kmm ready!

🔨 Latest commit: a3966c9
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kmm/deploys/67470894d288480008c1d65d
😎 Deploy Preview: https://deploy-preview-940--kubernetes-sigs-kmm.netlify.app

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 25, 2024
@ybettan (Contributor)

ybettan commented Nov 25, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 25, 2024
@ybettan (Contributor)

ybettan commented Nov 25, 2024

/cc @yevgeny-shnaidman

@codecov-commenter

codecov-commenter commented Nov 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.72%. Comparing base (fa23a9b) to head (a3966c9).
Report is 151 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #940      +/-   ##
==========================================
- Coverage   79.09%   72.72%   -6.38%     
==========================================
  Files          51       65      +14     
  Lines        5109     5770     +661     
==========================================
+ Hits         4041     4196     +155     
- Misses        882     1395     +513     
+ Partials      186      179       -7     

☔ View full report in Codecov by Sentry.

@ybettan (Contributor)

ybettan commented Nov 25, 2024

@tsprasannaa

Thank you for this great PR. Looks like you have thought this one through. I don't have any special comments about it.
(You do have some CI errors though)
/approve

@yevgeny-shnaidman can LGTM when he is happy with it as well.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tsprasannaa, ybettan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 25, 2024
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 25, 2024
@@ -134,6 +134,7 @@ func (m *maker) podSpec(mld *api.ModuleLoaderData, containerImage string, pushIm
RestartPolicy: v1.RestartPolicyNever,
Volumes: volumes(mld.ImageRepoSecret, buildConfig),
NodeSelector: selector,
Tolerations: mld.Tolerations,
Contributor

Do we need a toleration for the builder pod? In principle we should not care which node the builder pod runs on. @ybettan wdyt?

Contributor

I was thinking about it, but I think we do need it.

For example:

  • If it needs to be built on a specific arch and we set the selector of the build pod to that node (which is currently tainted)
  • In the extreme case in which all nodes are tainted simultaneously (there is no reason for us to block this option).

Contributor

Tainting all nodes simultaneously does not make sense; it would cause a lot of other issues within the cluster. I agree regarding the different-arch scenario.

Contributor Author

Thanks @yevgeny-shnaidman @ybettan. As discussed in this thread, since adding the tolerations is controlled by other controllers/users, the ideal scenario is to add taints to the nodes in an orderly fashion to get the upgrades done.

@@ -30,7 +30,14 @@ func NewNode(client client.Client) Node {
}
}

func (n *node) IsNodeSchedulable(node *v1.Node) bool {
func (n *node) IsNodeSchedulable(node *v1.Node, tolerations []v1.Toleration) bool {
for _, toleration := range tolerations {
Contributor

If the node has 2 taints but we only added one toleration to the KMM Module, then the node will be deemed schedulable, although it should not be, no?

Contributor

I agree; in the current PR we are satisfied with a single matching taint<-->toleration pair, while what we need is to make sure that all the taints are tolerated.

Contributor Author

@yevgeny-shnaidman @ybettan Thanks for this comment. Agreed. I will refactor the code to check the pod's tolerations against all the taints on that node.

Contributor Author

@yevgeny-shnaidman I have updated the matching criteria. Please review. Thanks

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 26, 2024
@@ -155,6 +151,14 @@ func (r *NMCReconciler) Reconcile(ctx context.Context, req reconcile.Request) (r
delete(statusMap, moduleNameKey)
}

// removing label of loaded kmods
if !r.nodeAPI.IsNodeSchedulable(&node, nil) {
Contributor

This is a problematic condition: it means we ignore the tolerations and the cordoned node will always be unschedulable, so the code never reaches line 165, which handles unloading of the kernel module as needed during an upgrade.

Contributor Author

During a module upgrade, the first Reconcile has ProcessModuleSpec unload the module (the kernel version being the same), and a subsequent Reconcile loads the new module, so the driver upgrade goes through.

Would it help if I moved the RemoveNodeReadyLabels call to execute just before GarbageCollectInUseLabels?

Contributor

That depends on how the upgrade is managed. We support 2 flows:

  1. The label on the node is changed to a new version; in that case the unloading is done in ProcessModuleSpec.
  2. The label on the node is removed and only later changed (this allows the user to execute whatever maintenance the node needs). In that case the spec is removed from the NMC, and this scenario is handled in ProcessUnconfiguredModuleStatus.

Contributor Author (@tsprasannaa, Nov 26, 2024)

Thanks @yevgeny-shnaidman. In that case, let me move the call to just before GarbageCollectInUseLabels(). That should handle both scenarios. Let me know if this works.

Contributor Author

@yevgeny-shnaidman Please review this change as well.
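
For readers following the thread, here is a rough schematic of the proposed ordering. The interface and signatures below are assumptions made for illustration from the names used in the discussion; they are not the actual KMM reconciler API.

// Schematic sketch only; the interface and signatures are assumptions.
package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

type nodeAPI interface {
	IsNodeSchedulable(node *v1.Node, tolerations []v1.Toleration) bool
	RemoveNodeReadyLabels(ctx context.Context, node *v1.Node) error
	GarbageCollectInUseLabels(ctx context.Context, node *v1.Node) error
}

// reconcileSketch shows the proposed ordering: after per-module processing
// (ProcessModuleSpec / ProcessUnconfiguredModuleStatus) has already run, drop
// the "ready" labels of loaded kmods only when the node is not schedulable,
// immediately before garbage collecting the in-use labels.
func reconcileSketch(ctx context.Context, api nodeAPI, node *v1.Node) error {
	if !api.IsNodeSchedulable(node, nil) {
		if err := api.RemoveNodeReadyLabels(ctx, node); err != nil {
			return err
		}
	}
	return api.GarbageCollectInUseLabels(ctx, node)
}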

@tsprasannaa tsprasannaa force-pushed the main branch 2 times, most recently from 56f8cf5 to 172e600 Compare November 26, 2024 15:24
Comment on lines 36 to 48
TAINTLOOP:
	for _, taint := range node.Spec.Taints {
		for _, toleration := range tolerations {
			if taint.Key == toleration.Key && taint.Value == toleration.Value && taint.Effect == toleration.Effect {
				continue TAINTLOOP
			}
		}
		if taint.Effect == v1.TaintEffectNoSchedule {
			return false
		}
	}
	return true
Contributor (@ybettan, Nov 27, 2024)

Suggested change (replacing the TAINTLOOP block quoted above):

	var toleratedTaints int
	for _, taint := range node.Spec.Taints {
		for _, toleration := range tolerations {
			if toleration.ToleratesTaint(&taint) {
				// taint tolerated, move to the next taint
				toleratedTaints++
				break
			}
		}
	}
	return toleratedTaints == len(node.Spec.Taints)

WDYT?

@yevgeny-shnaidman this is changing the original behavior but I believe this is a more "complete" solution.

Contributor

@ybettan your code looks good, but you are missing some corner cases:
if no toleration exists for a taint but the taint's effect is missing or is PreferNoSchedule, then the function should still return true.
@tsprasannaa a couple of minor comments on your code: I would prefer not to use labels (TAINTLOOP) if not really needed; in this case it can be solved with a variable in the loop. Also, you should check for the taint effect NoExecute alongside NoSchedule.

Contributor Author

@ybettan @yevgeny-shnaidman Comments taken care of. Please take a look. Thanks.
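
To summarize the agreed matching criteria, here is a hedged sketch (an approximation of the merged logic, not a verbatim copy): a node counts as schedulable for a module when every NoSchedule and NoExecute taint is covered by one of the module's tolerations, while taints with other effects (e.g. PreferNoSchedule) never block scheduling.

// Approximation of the matching criteria discussed above; not the merged code itself.
package example

import v1 "k8s.io/api/core/v1"

func isNodeSchedulable(node *v1.Node, tolerations []v1.Toleration) bool {
	for _, taint := range node.Spec.Taints {
		// Only NoSchedule and NoExecute taints can block the pod; other (or
		// missing) effects are ignored, per the review comments above.
		if taint.Effect != v1.TaintEffectNoSchedule && taint.Effect != v1.TaintEffectNoExecute {
			continue
		}
		tolerated := false
		for _, toleration := range tolerations {
			if toleration.ToleratesTaint(&taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}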

@@ -169,6 +165,14 @@ func (r *NMCReconciler) Reconcile(ctx context.Context, req reconcile.Request) (r
}
}

// removing label of loaded kmods
if !r.nodeAPI.IsNodeSchedulable(&node, nil) {
Contributor

We actually need to rework the removal of the ready labels, since we cannot remove them for all the Modules, only for the ones without tolerations. But it is better if we do that in a different PR. I suggest we push this one, and then we will update the flow of removing the ready labels. @TomerNewman will you have time to take care of this?

Contributor

Absolutely, after merging it I will open a new PR

@yevgeny-shnaidman (Contributor)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2024
@k8s-ci-robot k8s-ci-robot merged commit 29b6433 into kubernetes-sigs:main Dec 2, 2024
22 checks passed
TomerNewman added a commit to TomerNewman/kernel-module-management-uppstream that referenced this pull request Dec 3, 2024
This change addresses kubernetes-sigs#940.
Kmod labels are now deleted only if the kernel
module does not have the appropriate tolerations for the taints on the node.
TomerNewman added further commits with the same change description to TomerNewman/kernel-module-management-uppstream that referenced this pull request on Dec 4, Dec 8, and Dec 9, 2024.
k8s-ci-robot pushed a commit that referenced this pull request Dec 9, 2024
This change addresses #940.
Kmod labels are now deleted only if the kernel
module does not have the appropriate tolerations for the taints on the node.