Add support for user defined tolerances to modules #940
Conversation
Welcome @tsprasannaa!
Hi @tsprasannaa. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for kubernetes-sigs-kmm ready!
/ok-to-test
Codecov Report: all modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #940      +/-   ##
==========================================
- Coverage   79.09%   72.72%   -6.38%
==========================================
  Files          51       65      +14
  Lines        5109     5770     +661
==========================================
+ Hits         4041     4196     +155
- Misses        882     1395     +513
+ Partials      186      179       -7
Thank you for this great PR. Looks like you have thought this one through. I don't have any special comments about it. @yevgeny-shnaidman can LGTM when he is happy with it as well.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: tsprasannaa, ybettan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@@ -134,6 +134,7 @@ func (m *maker) podSpec(mld *api.ModuleLoaderData, containerImage string, pushIm
 		RestartPolicy: v1.RestartPolicyNever,
 		Volumes:       volumes(mld.ImageRepoSecret, buildConfig),
 		NodeSelector:  selector,
+		Tolerations:   mld.Tolerations,
Do we need a toleration for the builder pod? In principle, we should not care which node the builder pod runs on. @ybettan wdyt?
I was thinking about it, but I think we do need it.
For example:
- If the module needs to be built on a specific arch and we set the build pod's selector to that node (which is currently tainted).
- In the extreme case in which all nodes are tainted simultaneously (there is no reason for us to block this option).
Tainting all nodes simultaneously does not make sense; it would cause a lot of other issues within the cluster. I agree regarding the different-arch scenario.
Thanks @yevgeny-shnaidman @ybettan. As discussed in this thread, since adding the tolerations is controlled by other controllers/users, the ideal scenario would be adding taints to nodes in an orderly fashion to get the upgrades done.
internal/node/node.go
Outdated
@@ -30,7 +30,14 @@ func NewNode(client client.Client) Node {
 	}
 }

-func (n *node) IsNodeSchedulable(node *v1.Node) bool {
+func (n *node) IsNodeSchedulable(node *v1.Node, tolerations []v1.Toleration) bool {
+	for _, toleration := range tolerations {
If the node has 2 taints, but we only added one toleration to the KMM Module, then the node will be deemed schedulable, although it should not be, no?
I agree; in the current PR we are satisfied with a single fitting taint<-->toleration match, while what we need is to make sure that all the taints are tolerated.
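For illustration, here is a minimal, self-contained sketch of the scenario described above (the taint keys and values are hypothetical, and this is not code from the PR): the node carries two NoSchedule taints while the Module tolerates only one of them, so the node must not be treated as schedulable.

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// Two NoSchedule taints, but only the first one is tolerated: matching a
// single taint<->toleration pair is not enough, every taint must be tolerated.
func main() {
	taints := []v1.Taint{
		{Key: "example.com/a", Value: "1", Effect: v1.TaintEffectNoSchedule},
		{Key: "example.com/b", Value: "2", Effect: v1.TaintEffectNoSchedule},
	}
	tolerations := []v1.Toleration{
		{Key: "example.com/a", Operator: v1.TolerationOpEqual, Value: "1", Effect: v1.TaintEffectNoSchedule},
	}

	for _, taint := range taints {
		tolerated := false
		for _, toleration := range tolerations {
			if toleration.ToleratesTaint(&taint) {
				tolerated = true
				break
			}
		}
		// Prints: example.com/a tolerated, example.com/b NOT tolerated,
		// so the node has to be reported as unschedulable.
		fmt.Printf("taint %s tolerated: %v\n", taint.Key, tolerated)
	}
}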
@yevgeny-shnaidman @ybettan Thanks for this comment. Agreed. I will refactor the code to check the pod tolerations against all the taints on that node.
@yevgeny-shnaidman I have updated the matching criteria. Please review. Thanks
@@ -155,6 +151,14 @@ func (r *NMCReconciler) Reconcile(ctx context.Context, req reconcile.Request) (r
 		delete(statusMap, moduleNameKey)
 	}

+	// removing label of loaded kmods
+	if !r.nodeAPI.IsNodeSchedulable(&node, nil) {
This is a problematic condition: it means that we ignore the tolerations and a cordoned node will always be unschedulable, which in turn means the code never reaches line 165, which handles unloading of the kernel module as needed during an upgrade.
During a module upgrade, in the first Reconcile, ProcessModuleSpec unloads the module (the kernel version being the same). The subsequent Reconcile loads the new module, so the driver upgrade goes through.
Would it help if I move the RemoveNodeReadyLabels call to execute just before GarbageCollectInUseLabels?
That depends on how the upgrade is managed. We support 2 flows:
- The label on the node is changed to a new version; in that case the unloading will be done in ProcessModuleSpec.
- The label on the node is removed and will later be changed (this allows the user to execute whatever maintenance is needed on the node). In that case the spec will be removed from the NMC, and this scenario is handled in ProcessUnconfiguredModuleStatus.
Thanks @yevgeny-shnaidman. In that case, let me move the API call to just before GarbageCollectInUseLabels(). That should help in both scenarios. Let me know if this works.
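A very rough sketch of the ordering being proposed, using the names mentioned in this thread (IsNodeSchedulable, RemoveNodeReadyLabels, GarbageCollectInUseLabels); the types, receivers, and signatures below are hypothetical placeholders, not the actual KMM reconciler code.

package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// Hypothetical stand-ins for the collaborators named in this thread; the real
// KMM types and signatures may differ.
type nodeAPI interface {
	IsNodeSchedulable(node *v1.Node, tolerations []v1.Toleration) bool
	RemoveNodeReadyLabels(ctx context.Context, node *v1.Node) error
}

type reconcilerSketch struct {
	nodeAPI nodeAPI
}

// garbageCollectInUseLabels is a placeholder for the real GarbageCollectInUseLabels step.
func (r *reconcilerSketch) garbageCollectInUseLabels(ctx context.Context, node *v1.Node) error {
	return nil
}

// reconcileTail sketches the proposed ordering: the per-module spec/status
// processing (which performs the unloading during upgrades) is assumed to have
// already run; the ready labels are removed for unschedulable nodes just
// before the in-use label garbage collection.
func (r *reconcilerSketch) reconcileTail(ctx context.Context, node *v1.Node) error {
	if !r.nodeAPI.IsNodeSchedulable(node, nil) {
		if err := r.nodeAPI.RemoveNodeReadyLabels(ctx, node); err != nil {
			return err
		}
	}
	return r.garbageCollectInUseLabels(ctx, node)
}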
@yevgeny-shnaidman Please review this change as well.
56f8cf5 to 172e600 (compare)
internal/node/node.go
Outdated
TAINTLOOP:
	for _, taint := range node.Spec.Taints {
		for _, toleration := range tolerations {
			if taint.Key == toleration.Key && taint.Value == toleration.Value && taint.Effect == toleration.Effect {
				continue TAINTLOOP
			}
		}
		if taint.Effect == v1.TaintEffectNoSchedule {
			return false
		}
	}
	return true
Suggested change:

	var toleratedTaints int
	for _, taint := range node.Spec.Taints {
		for _, toleration := range tolerations {
			if toleration.ToleratesTaint(&taint) {
				// taint tolerated, move to the next taint
				toleratedTaints++
				break
			}
		}
	}
	return toleratedTaints == len(node.Spec.Taints)
WDYT?
@yevgeny-shnaidman this is changing the original behavior but I believe this is a more "complete" solution.
@ybettan your code looks good, but you are missing some corner cases:
if no toleration exists for the taint, but the taint's effect is missing or its effect is PreferNoSchedule, then the function should return true.
@tsprasannaa a couple of minor comments on your code: I would prefer not to use labels (TAINTLOOP) if not really needed; in this case it can be solved with a variable in the loop. Also, you should check for the NoExecute taint effect alongside NoSchedule.
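Putting the suggestion above together with these corner cases, one possible shape of the check is sketched below. This is only an illustration under those assumptions, not necessarily the code that ended up in the PR.

package node

import (
	v1 "k8s.io/api/core/v1"
)

// isNodeSchedulable reports whether a pod carrying the given tolerations could
// be scheduled on the node. Taints with an empty or PreferNoSchedule effect
// never make the node unschedulable; every NoSchedule or NoExecute taint must
// be tolerated.
func isNodeSchedulable(node *v1.Node, tolerations []v1.Toleration) bool {
	for _, taint := range node.Spec.Taints {
		// Soft or missing effects are ignored, as discussed above.
		if taint.Effect != v1.TaintEffectNoSchedule && taint.Effect != v1.TaintEffectNoExecute {
			continue
		}
		tolerated := false
		for _, toleration := range tolerations {
			if toleration.ToleratesTaint(&taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

Using ToleratesTaint also covers the Exists operator and empty keys, which a plain key/value/effect comparison would miss.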
@ybettan @yevgeny-shnaidman Comments taken care of. Please take a look. Thanks.
There is a requirement to run selected pods on a cordoned node. One such example is a device driver upgrade: the procedure involves cordoning the node, yet there still needs to be a way to run the housekeeping pods that carry out the driver upgrade, so those pods must remain schedulable. This PR adds support for carrying tolerations on the pods that perform the housekeeping operations. These user-defined tolerations will match the taint that gets added to the nodes. The ModuleSpec carries the tolerations to the pods, and they are applied during pod creation.
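To illustrate the mechanism described above, here is a rough sketch; the moduleSpecLike type and workerPodSpec function are placeholders invented for the example, not the actual KMM API. The point is only that the tolerations defined on the Module are copied onto the pods KMM creates, so the scheduler lets them land on the tainted node.

package sketch

import (
	v1 "k8s.io/api/core/v1"
)

// moduleSpecLike stands in for the part of the Module spec that carries the
// user-defined tolerations (the real field layout in KMM may differ).
type moduleSpecLike struct {
	Tolerations []v1.Toleration
}

// workerPodSpec shows the propagation: the tolerations from the Module spec
// are copied verbatim into the spec of the pod that performs the housekeeping
// work, e.g. tolerating the taint applied during a driver upgrade.
func workerPodSpec(spec moduleSpecLike, image string) v1.PodSpec {
	return v1.PodSpec{
		Containers: []v1.Container{
			{Name: "worker", Image: image},
		},
		RestartPolicy: v1.RestartPolicyNever,
		Tolerations:   spec.Tolerations,
	}
}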
@@ -169,6 +165,14 @@ func (r *NMCReconciler) Reconcile(ctx context.Context, req reconcile.Request) (r
 		}
 	}

+	// removing label of loaded kmods
+	if !r.nodeAPI.IsNodeSchedulable(&node, nil) {
We actually need to re-work the removal of the ready labels, since we cannot remove them for all the Modules, only for the ones without tolerations. But it is better if we do that in a different PR. I suggest we push this one, and then we will update the flow of removing the ready labels. @TomerNewman will you have time to take care of this?
Absolutely, after merging it I will open a new PR
/lgtm
This change addresses #940. Kmod labels are now deleted only if the kernel module does not have the appropriate tolerations for the taints on the node.