Commit dafdc93: Enhancement for redefining communication between Module-NMC and NMC controllers

yevgeny-shnaidman committed Oct 9, 2024 (parent 6d66a6c)

docs/enhancements/0003-module-nmc-responsibilty-refactoring.md: 1 file changed, 110 additions, 0 deletions
# Module-NMC and NMC controllers responsibility refactoring

Authors: @yevgeny-shnaidman, @ybettan

## Introduction

This enhancement aims at redefining the areas of responsibility of the Module-NMC and NMC controllers.
This will allow for more clear-cut code and eliminate the various race conditions that we are seeing (or will see) in the current situation.

### Current situation

Currently, both the Module-NMC and NMC controllers make decisions regarding kernel module deployment based on the node's status:
- The Module-NMC controller checks the schedulability of the node in order to decide whether the kernel module should be deployed on or removed
from the node (adding/updating the NMC spec or removing it from the NMC).
- The NMC controller checks the node's schedulability to decide whether to start creating loading/unloading pods on the node. In addition, it
checks whether the node has recently been rebooted, in order to create a loading pod even if the status and spec of the NMC are equal.

This creates a situation where two entities independently decide, based on the node's status, whether kernel modules should be loaded or not.

## Goals

1. Create a clear-cut distinction between responsibilities of the two controllers
2. Eliminate race conditions which are the result of the current situation

## Non-Goals

Do not change any other functionality of the two controllers besides the decision-making described above.

## Design

### Module-NMC controller decision-making flow

The flow covers both Modules with the Version field defined (ordered upgrade) and without it (unordered upgrade).
Module-NMC does not take into account the current state of the node (Ready/NotReady/Schedulable/etc.). It only defines whether the kernel module should
be loaded on the node, based on whether there is a KernelMapping for the node's current kernel and on the node's labels. All other decisions
are taken by the NMC reconciler, which has a much better view of the node's current state and of the kernel module's current state.

1. Find all the nodes targeted by the Module, regardless of their status, based on the Module's node selector field
2. If there is no suitable KernelMapping for the node's kernel - do nothing
3. If there is a suitable KernelMapping and the Version field is missing from the Module (not an ordered upgrade) - update the spec
4. If there is a suitable KernelMapping, the Version field is present in the Module, the module loader version label is on the node and
   its value is equal to the Version - update the spec
5. If there is a suitable KernelMapping, the Version field is present in the Module, the module loader version label is on the node and
   its value is not equal to the Module's Version (meaning an old version) - do nothing
6. If there is a suitable KernelMapping, the Version field is present in the Module and the module loader version label is missing from the node
   (meaning the kernel module should not be running on the node) - delete the spec
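
As an illustration, steps 1-6 can be sketched in Go, the language KMM itself is written in. This is a minimal sketch under simplifying assumptions, not the controller's actual code: the types and functions (`Node`, `Module`, `SpecAction`, `targets`, `decide`, `reconcile`) are hypothetical, KernelMappings are reduced to an exact-match map from kernel version to image, and the key of the module loader version label is passed in as a parameter instead of being hard-coded. What it shows is that node readiness is not among the inputs.

```go
package main

// SpecAction is what the Module-NMC controller does to a node's NMC spec.
// Hypothetical names that only mirror the steps of this design.
type SpecAction int

const (
	DoNothing SpecAction = iota
	UpdateSpec
	DeleteSpec
)

// Node holds only what Module-NMC is allowed to look at: labels and kernel.
// Readiness/schedulability is deliberately absent.
type Node struct {
	Name          string
	Labels        map[string]string
	KernelVersion string
}

// Module is a simplified stand-in for the KMM Module.
type Module struct {
	NodeSelector map[string]string
	Version      string // empty means no ordered upgrade
	// Simplification: exact kernel version -> image (no regexp matching).
	KernelMappings map[string]string
}

// targets implements step 1: the node must match every selector key/value.
func targets(m Module, n Node) bool {
	for k, v := range m.NodeSelector {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

// decide implements steps 2-6 for a single targeted node.
func decide(m Module, n Node, versionLabel string) SpecAction {
	if _, ok := m.KernelMappings[n.KernelVersion]; !ok {
		return DoNothing // step 2: no suitable KernelMapping
	}
	if m.Version == "" {
		return UpdateSpec // step 3: not an ordered upgrade
	}
	nodeVersion, ok := n.Labels[versionLabel]
	switch {
	case !ok:
		return DeleteSpec // step 6: version label missing from the node
	case nodeVersion == m.Version:
		return UpdateSpec // step 4: label matches the Module's Version
	default:
		return DoNothing // step 5: node still labelled with an old version
	}
}

// reconcile runs steps 1-6 over all nodes without ever checking node status.
func reconcile(m Module, nodes []Node, versionLabel string) map[string]SpecAction {
	actions := make(map[string]SpecAction)
	for _, n := range nodes {
		if targets(m, n) {
			actions[n.Name] = decide(m, n, versionLabel)
		}
	}
	return actions
}
```

For example, with a Module whose Version is `v2`, a targeted node labelled `v2` gets `UpdateSpec`, a node still labelled `v1` gets `DoNothing`, and a targeted node without the label gets `DeleteSpec`.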

In this design, Module-NMC needs to delete the spec in only the two following cases:
1. During an ordered upgrade, when the module loader version label is missing from the node (see step 6 above)
2. When the Module is deleted, and so the kernel module should be unloaded
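
The two spec-deletion cases can be expressed as a single predicate. This is a hypothetical sketch (`deleteInputs` and `shouldDeleteSpec` are illustrative names, not KMM code); like the decision flow, it ignores the node's readiness entirely.

```go
package main

// deleteInputs holds the hypothetical inputs for deciding whether Module-NMC
// removes a module's entry from a node's NMC spec. No node status here.
type deleteInputs struct {
	ModuleDeleted    bool   // the Module is being deleted
	HasKernelMapping bool   // a KernelMapping matches the node's kernel
	ModuleVersion    string // empty when the Module has no Version field
	HasVersionLabel  bool   // the module loader version label is on the node
}

// shouldDeleteSpec returns true only in the two cases listed in the design.
func shouldDeleteSpec(in deleteInputs) bool {
	if in.ModuleDeleted {
		return true // case 2: the kernel module must be unloaded
	}
	// case 1: ordered upgrade (step 6) and the version label is missing
	return in.HasKernelMapping && in.ModuleVersion != "" && !in.HasVersionLabel
}
```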

```mermaid
flowchart TD
Module[KMM Module]-->|Reconcile| MNC[Module-NMC controller]
MNC-->|get nodes based on node selector| J1((.))
J1-->|no KernelMapping for node's kernel| Done[Done]
J1-->|found KernelMapping for node's kernel| J2((.))
J2-->|Version missing in Module| US[Update NMC Spec]
J2-->|Version present in Module| J3((.))
J3-->|module loader version label equals Version| US
J3-->|module loader version label not equal Version| Done
J3-->|module loader version label missing| DS[Delete NMC Spec]
```


### NMC controller decision-making flow

The NMC controller takes into account the NMC spec and status, the node's status, and the node's Ready timestamp to decide whether to run a worker pod,
and whether it should be a load or an unload worker pod.

1. If the node is not Ready/Schedulable - do nothing
2. If the NMC's status is missing and the node's kernel version equals the NMC spec's kernel version - run a worker load pod
3. If the NMC's spec is missing, the NMC's status is present and the NMC status's kernel version equals the node's kernel version - run a worker unload pod
4. If both the NMC's spec and status are present and the spec differs from the status:
   - if the status's kernel version equals the node's kernel version - run a worker unload pod
   - otherwise, if the spec's kernel version equals the node's kernel version - run a worker load pod
5. If both the NMC's spec and status are present, the spec equals the status, the status timestamp is older than the node's Ready timestamp,
   and the spec's kernel version equals the node's kernel version - run a worker load pod

```mermaid
flowchart TD
NMC[NodeModuleConfig]-->|Reconcile| NMCC[NMC controller]
NMCC-->|get NMC's node| J1((.))
J1-->|node is not Ready/Schedulable| Done[Done]
J1-->|node is Ready/Schedulable| J2((.))
J2-->|status missing| J3((.))
J3-->|node's kernel equals spec's kernel| WLP[Create Worker Load Pod]
J3-->|node's kernel differs from spec's kernel| Done
J2-->|spec missing| J4((.))
J4-->|node's kernel equals status's kernel| WUP[Create Worker Unload Pod]
J4-->|node's kernel differs from status's kernel| Done
J2-->|spec and status differ| J5((.))
J5-->|status kernel equals node's kernel| WUP
J5-->|status kernel differs from node's kernel| J6((.))
J6-->|spec kernel equals node's kernel| WLP
J6-->|spec kernel differs from node's kernel| Done
J2-->|spec and status equal| J7((.))
J7-->|node's Ready timestamp older than status's timestamp| Done
J7-->|node's Ready timestamp newer than status's timestamp| J8((.))
J8-->|spec's kernel equals node's kernel| WLP
J8-->|spec's kernel differs from node's kernel| Done
```

## Addressing the goals

* **Clear-cut distinction between the responsibilities of the two controllers**
  Module-NMC now specifies what it wants to run on the node, and NMC takes care of when and how to run it.

* **Eliminating race conditions**
  The race conditions were caused by both controllers looking at the same data (the node's status and kernel) and making decisions based on it.
  Now Module-NMC does not look at the node's status at all; only the NMC controller does, combining it with the node's current kernel to decide which worker pod, if any, to run.
