Enhancement for redefining communication between Module-NMC and NMC controllers
1 parent 6d66a6c, commit f78bec9
Showing 1 changed file with 77 additions and 0 deletions.
docs/enhancements/0003-module-nmc-responsibilty-refactoring.md
@@ -0,0 +1,77 @@
# Worker pods for KMM

Authors: @yevgeny-shnaidman, @ybettan

## Introduction

This enhancement aims to redefine the areas of responsibility between the Module-NMC and NMC controllers.
This will allow for more clear-cut code and eliminate the various race conditions that we are seeing (or will see) in the current situation.

### Current situation

Currently, both the Module-NMC and NMC controllers make decisions about kernel module deployment based on node status:
- The Module-NMC controller checks the schedulability of the node in order to decide whether the kernel module should be deployed to or removed
  from the node (adding/updating the spec of the NMC or removing the spec of the NMC).
- The NMC controller checks the node's schedulability to decide whether to start creating a loading/unloading pod on the node. In addition, it also
  checks whether the node has been recently rebooted, in order to create a loading pod even if the status and spec of the NMC are equal.

This creates a situation where two entities decide whether kernel modules should be loaded or not based on a node's status.

## Goals

1. Create a clear-cut distinction between responsibilities of the two controllers
2. Eliminate race conditions which are the result of the current situation

## Non-Goals

Changing any other functionality of the two controllers, besides the decision-making described above.

## Design

### Module-NMC controller decision-making flow

The flow takes into account Modules both with the Version field defined (ordered upgrade) and without the Version field defined (unordered upgrade).
Module-NMC does not take into account the current state of the Node (Ready/NotReady/Schedulable/etc.). It only determines whether the kernel module should
be loaded on the node, based on whether there is a KernelMapping for the node's current kernel and on the labels of the node. All remaining decisions
are taken by the NMC reconciler, which has a much better view of the Node's current state and the kernel module's current state.

1. Find all the nodes targeted by the Module, regardless of node status, based on the node selector field of the Module
2. If there is no suitable KernelMapping for the Node's kernel - do nothing
3. If there is a suitable KernelMapping and the Version field is missing in the Module (not an ordered upgrade) - update the spec
4. If there is a suitable KernelMapping, the Version field is present in the Module, the module loader version label is on the node and
   its value is equal to the Version - update the spec
5. If there is a suitable KernelMapping, the Version field is present in the Module, the module loader version label is on the node and
   its value is not equal to the Module's Version (meaning an old version) - do nothing
6. If there is a suitable KernelMapping, the Version field is present in the Module, and the module loader version label is missing on the node
   (meaning the kernel module should not be running on the node) - delete the spec

In this implementation, Module-NMC does not need to delete the spec, except in the following two cases:
1. During an ordered upgrade (see point 6 above)
2. The Module is deleted, and so the kernel module should be unloaded

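The Module-NMC rules above can be read as a single pure decision function. The following is a minimal, language-neutral sketch in Python and not KMM code: `SpecAction`, `targets_node`, `decide_spec_action`, and the label key `VERSION_LABEL` are hypothetical names introduced for illustration, and how a KernelMapping is matched against a kernel is reduced to a boolean input.

```python
from enum import Enum
from typing import Dict, Optional

# Hypothetical label key, used only for this illustration.
VERSION_LABEL = "example.com/module-loader-version"


class SpecAction(Enum):
    NONE = "do nothing"
    UPDATE = "update the spec"
    DELETE = "delete the spec"


def targets_node(node_selector: Dict[str, str], node_labels: Dict[str, str]) -> bool:
    """Step 1: a node is targeted when it carries every selector label.

    Node status (Ready, Schedulable, ...) is deliberately not consulted.
    """
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def decide_spec_action(
    has_mapping: bool,
    module_version: Optional[str],
    node_labels: Dict[str, str],
    module_deleted: bool = False,
) -> SpecAction:
    """Steps 2-6, plus the 'Module is deleted' case."""
    if module_deleted:
        return SpecAction.DELETE      # kernel module should be unloaded
    if not has_mapping:
        return SpecAction.NONE        # step 2
    if module_version is None:
        return SpecAction.UPDATE      # step 3: unordered upgrade
    node_version = node_labels.get(VERSION_LABEL)
    if node_version is None:
        return SpecAction.DELETE      # step 6: label missing on the node
    if node_version == module_version:
        return SpecAction.UPDATE      # step 4: label matches the Version
    return SpecAction.NONE            # step 5: node still on an old version


if __name__ == "__main__":
    labels = {"role": "worker", VERSION_LABEL: "v1"}
    if targets_node({"role": "worker"}, labels):
        print(decide_spec_action(True, "v2", labels).value)
```

Keeping the function free of any node readiness input mirrors the intent of this section: everything that depends on the node's live state is left to the NMC controller.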
### NMC controller decision-making flow

The NMC controller takes into account the NMC spec, the NMC status, the node's status and the node's Ready timestamp to decide whether to run a worker pod, and whether
that worker pod should load or unload the kernel module.

1. If the Node is not Ready/Schedulable - do nothing
2. If the NMC's status is missing and the Node's kernel version is equal to the NMC spec's kernel version - run worker load pod
3. If the NMC's spec is missing, the NMC's status is present and the NMC status's kernel version is equal to the Node's kernel version - run worker unload pod
4. If the NMC's spec is present and the NMC's status is present, and the NMC spec differs from the NMC status:
   - if the status's kernel version is equal to the node's kernel version - run worker unload pod
   - if the spec's kernel version is equal to the node's kernel version - run worker load pod
5. If the NMC's spec is present and the NMC's status is present, the NMC spec is equal to the NMC status, and the status timestamp is older than the node's Ready timestamp - run worker load pod

```mermaid
flowchart TD
NMC[NodeModuleConfig]-->|Reconcile| NMCC[NMC controller]
NMCC-->|get NMC's node| J1((.))
J1-->|node is not Ready/Schedulable| Done[Done]
J1-->|status missing| J2((.))
J2-->|node's kernel equals spec's kernel| WLP[Create Worker Load Pod]
J2-->|node's kernel differs from spec's kernel| Done
J1-->|spec missing| J3((.))
J3-->|node's kernel equals status's kernel| WUP[Create Worker Unload Pod]
J3-->|node's kernel differs from status's kernel| Done
```
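
The five NMC controller rules listed in this section can likewise be sketched as one function. This is a hedged Python illustration, not the controller's implementation: `PodAction`, `ModuleConfig`, `ModuleStatus`, and `decide_pod_action` are hypothetical names, timestamps are plain numbers, and two points are assumptions not stated in the text: in rule 4 the unload check is evaluated before the load check, and nothing is done when both spec and status are missing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PodAction(Enum):
    NONE = "do nothing"
    LOAD = "run worker load pod"
    UNLOAD = "run worker unload pod"


@dataclass(frozen=True)
class ModuleConfig:
    kernel: str   # kernel version the config was built for
    image: str    # hypothetical extra field, so two configs can differ


@dataclass(frozen=True)
class ModuleStatus:
    config: ModuleConfig
    timestamp: float  # when the status was last written


def decide_pod_action(
    node_ready: bool,
    node_kernel: str,
    node_ready_since: float,
    spec: Optional[ModuleConfig],
    status: Optional[ModuleStatus],
) -> PodAction:
    if not node_ready:                      # rule 1
        return PodAction.NONE
    if spec is None and status is None:     # assumption: nothing to do
        return PodAction.NONE
    if status is None:                      # rule 2
        return PodAction.LOAD if spec.kernel == node_kernel else PodAction.NONE
    if spec is None:                        # rule 3
        if status.config.kernel == node_kernel:
            return PodAction.UNLOAD
        return PodAction.NONE
    if spec != status.config:               # rule 4
        if status.config.kernel == node_kernel:
            return PodAction.UNLOAD         # assumption: unload is checked first
        if spec.kernel == node_kernel:
            return PodAction.LOAD
        return PodAction.NONE
    if status.timestamp < node_ready_since:  # rule 5: node rebooted since load
        return PodAction.LOAD
    return PodAction.NONE


if __name__ == "__main__":
    cfg = ModuleConfig(kernel="6.1", image="img:a")
    print(decide_pod_action(True, "6.1", 100.0, cfg, None).value)
```

Because every input is passed in explicitly, each rule can be exercised in isolation, which is one way to check that the two controllers no longer overlap in what they decide.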