nfd-topology-updater missed some cpu for Guaranteed pods #1978
Comments
@AllenXu93 it is a deliberate decision/feature to only count exclusively allocated CPUs (of Guaranteed pods). Could you describe your usage scenario(s) for also counting CPUs in the shared pool? Maybe we could have a config option/cmdline flag to enable this. @PiotrProkop what are your thoughts on this? EDIT: @AllenXu93 see https://kubernetes-sigs.github.io/node-feature-discovery/stable/usage/nfd-topology-updater.html
This PR still uses only exclusively allocated CPUs; that doesn't change. But a pod having exclusively allocated CPUs does not mean that all of the pod's containers have exclusive CPUs.
Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail. I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus). WDYT @PiotrProkop @ffromani, something we're missing here?
I think there's a good point here. Need to review the logic and count all the exclusively allocated CPUs.
In fact, I use KubeVirt to manage VMs with k8s, and all the KubeVirt pods are created this way. The VM container can be allocated an integral number of CPUs, while the many other containers and init containers only request 200m CPU.
I'm a bit unsure about the QoS here and I need to review (again) the rules, but I totally agree that all the containers which have exclusive CPUs allocated in guaranteed QoS pods should be reported.
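A minimal sketch of the per-container reporting discussed in this thread. It is not the nfd-topology-updater code: the `container` type and the `exclusiveContainers` helper are hypothetical, CPU requests are modelled as plain millicores, and it assumes the kubelet static CPU manager policy, under which only containers of Guaranteed pods with whole-CPU requests get exclusive CPUs.

```go
package main

import "fmt"

// container is a hypothetical, simplified stand-in for a container spec.
// CPU requests are in millicores (1000 = one full CPU).
type container struct {
	name     string
	milliCPU int64
}

// exclusiveContainers returns the names of the containers that would be
// reported. Assumption: static CPU manager policy, where only containers
// of Guaranteed pods with whole-CPU requests get exclusive CPUs.
func exclusiveContainers(guaranteedPod bool, containers []container) []string {
	names := []string{}
	if !guaranteedPod {
		return names
	}
	for _, c := range containers {
		// Decide per container, so a 200m sidecar does not hide
		// a sibling container that holds exclusive CPUs.
		if c.milliCPU > 0 && c.milliCPU%1000 == 0 {
			names = append(names, c.name)
		}
	}
	return names
}

func main() {
	pod := []container{
		{name: "compute", milliCPU: 4000},
		{name: "sidecar", milliCPU: 200},
	}
	fmt.Println(exclusiveContainers(true, pod)) // [compute]
}
```

The point of the sketch is that the decision is made per container rather than per pod, so the shared-pool sidecar is skipped while the `compute` container is still counted.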
Of course. I learned this from the docs: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed In my opinion:
@AllenXu93 you made good points, there could be an actual bug in the area you identified. We can use this issue to add more unit tests. I'll try to have a look ASAP, but the European holiday season is approaching. Others are welcome to chime in, I'll surely review.
What happened:
`nfd-topology-updater` is used to report a node's NUMA topology and the CPUs used by `Guaranteed` pods. But it misses the CPUs of a `Guaranteed` pod if any of the pod's containers has no exclusive CPUs.
On `yj-kubevirtwork-001`, the kubelet is configured with single-numa-node and cpu-manager.
I have two pods on the node:
CPUs allocated by cpu-manager on the node:
But the noderesourcetopologies CR only reports nginx's CPU set, not kubevirt-launcher's CPUs.
In the `nfd-topology-updater` logs:
`nfd-topology-updater` did not get the CPUs of the kubevirt-launcher pod's compute container.
What you expected to happen:
`nfd-topology-updater` reports the exclusive CPUs of all of a pod's containers.
How to reproduce it (as minimally and precisely as possible):
For example, create a pod `Pod-a` with 2 containers:
The pod is still a Guaranteed pod, and kubelet's cpu-manager allocates exclusive CPUs to it, but `nfd-topology-updater` will not report its CPUs in the topology CR.
Anything else we need to know?:
I think this bug is caused by the `hasExclusiveCPUs` func:
node-feature-discovery/pkg/resourcemonitor/podresourcesscanner.go
Lines 78 to 107 in 1416072
If a pod has any container that does not request an integer number of CPUs, it returns false.
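To illustrate, here is a minimal sketch of a pod-level check that behaves as described. It is not a copy of `hasExclusiveCPUs`: the `container` type, `isIntegral` and `podLevelCheck` are hypothetical names, and CPU requests are modelled as plain millicores instead of Kubernetes resource quantities.

```go
package main

import "fmt"

// container is a hypothetical, simplified stand-in for a container spec.
// CPU requests are in millicores (1000 = one full CPU).
type container struct {
	name     string
	milliCPU int64
}

// isIntegral reports whether a millicore request is a whole number of CPUs.
func isIntegral(milliCPU int64) bool {
	return milliCPU > 0 && milliCPU%1000 == 0
}

// podLevelCheck mirrors the behaviour described in this issue: it returns
// false as soon as any container requests a non-integral CPU amount.
func podLevelCheck(containers []container) bool {
	for _, c := range containers {
		if !isIntegral(c.milliCPU) {
			return false
		}
	}
	return true
}

func main() {
	pod := []container{
		{name: "compute", milliCPU: 2000}, // would get exclusive CPUs
		{name: "sidecar", milliCPU: 200},  // runs in the shared pool
	}
	fmt.Println(podLevelCheck(pod)) // false
}
```

With a check of this shape, a single 200m sidecar is enough to make the whole pod look as if it had no exclusive CPUs, even though the `compute` container requests whole CPUs.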
node-feature-discovery/pkg/resourcemonitor/podresourcesscanner.go
Lines 168 to 183 in 1416072
This causes `nfd-topology-updater` to skip the loop over all of this pod's containers when reporting CPUs.
Environment:
Kubernetes version (use `kubectl version`):
OS (e.g: `cat /etc/os-release`):
Kernel (e.g. `uname -a`):