Can't create containers if there is a v1 cpuset cgroup with exclusive cores #1625

Open
michalsieron opened this issue Dec 13, 2024 · 3 comments

@michalsieron
Contributor

Steps to reproduce:

  1. Boot with systemd.unified_cgroup_hierarchy=0 (in my case it's in a VM with 2 cores)
  2. Create an arbitrary cgroup: # mkdir /sys/fs/cgroup/cpuset/iamspecial
  3. Assign one of the cores to that cgroup: # echo 1 > /sys/fs/cgroup/cpuset/iamspecial/cpuset.cpus
  4. Make that cpuset exclusive: # echo 1 > /sys/fs/cgroup/cpuset/iamspecial/cpuset.cpu_exclusive
  5. Create a bundle with a dummy rootfs and run it: # crun spec; mkdir -p rootfs/usr/bin; touch rootfs/usr/bin/sh; crun run test1234

Expected result: open executable: Permission denied
Actual result: write `cpuset.cpus`: Invalid argument
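Step 1 is what puts the host into cgroup v1 mode. One way to confirm which mode a machine actually booted into is to look at the filesystem type mounted at /sys/fs/cgroup: cgroup2 in unified (v2) mode, tmpfs in legacy or hybrid mode. The sketch below does this with statfs(2) and the magic numbers from <linux/magic.h>. The enum and function names are hypothetical, and the sketch does not tell legacy from hybrid mode apart (that would also require checking /sys/fs/cgroup/unified).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>     /* statfs() */
#include <linux/magic.h> /* CGROUP2_SUPER_MAGIC, TMPFS_MAGIC */

/* Hypothetical classification of the host's cgroup setup. */
enum cgroup_mode {
    CGROUP_MODE_UNKNOWN,
    CGROUP_MODE_LEGACY_OR_HYBRID, /* tmpfs holding per-controller v1 mounts */
    CGROUP_MODE_UNIFIED,          /* cgroup2 mounted directly (v2) */
};

/* Classify by the filesystem type mounted at ROOT,
 * normally "/sys/fs/cgroup". */
enum cgroup_mode detect_cgroup_mode(const char *root)
{
    struct statfs st;

    assert(root != NULL);
    if (statfs(root, &st) != 0)
        return CGROUP_MODE_UNKNOWN;
    if (st.f_type == CGROUP2_SUPER_MAGIC)
        return CGROUP_MODE_UNIFIED;
    if (st.f_type == TMPFS_MAGIC)
        return CGROUP_MODE_LEGACY_OR_HYBRID;
    return CGROUP_MODE_UNKNOWN;
}
```

With systemd.unified_cgroup_hierarchy=0 on the kernel command line, detect_cgroup_mode("/sys/fs/cgroup") would be expected to return CGROUP_MODE_LEGACY_OR_HYBRID.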


Some short notes and observations:

  • Of course, in step 5 you could replace the rootfs with something more proper, like busybox, and expect an actual shell to open.
  • For what it's worth, runc fails the same way, although its error is more explicit: it reports that writing 0-1 to /sys/fs/cgroup/cpuset/test1234/cpuset.cpus failed.
  • Using --systemd-cgroup with crun only changes the error to write `cpuset.cpus`: Permission denied (runc still gets EINVAL).
  • Adding "cpu": { "cpus": "0" } to the linux.resources section fixes the issue in most cases.

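The table below summarizes all of these combinations:

| Configuration          | crun   | crun --systemd-cgroup | runc   | runc --systemd-cgroup |
|------------------------|--------|-----------------------|--------|-----------------------|
| default config         | EINVAL | EACCES                | EINVAL | EINVAL                |
| "cpu": { "cpus": "0" } | EINVAL | OK                    | OK     | OK                    |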

So the main problem is that crun (and runc) initialize the newly created cgroup's cpuset.cpus with the value taken from the parent cgroup. According to the Linux documentation for cgroup v2 (!):

An empty value indicates that the cgroup is using the same setting as the nearest cgroup ancestor with a non-empty “cpuset.cpus” or all the available CPUs if none is found.

I cannot find a similar description for cgroup v1. Is that why the initialization is needed? If so, how should cgroups with exclusive CPUs be handled? Does one have to traverse the entire cpuset tree to find the available CPUs? If so, I suspect this issue won't be fixed, given that cgroup v1 is obsolete anyway.
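On the tree-traversal question: assuming the cgroup v1 cpuset rules as described in the kernel's cpusets documentation (a cpuset's CPUs must be a subset of its parent's, may not overlap the CPUs of an exclusive sibling, and an exclusive cpuset's parent must itself be exclusive), it should be enough to inspect the parent and its direct children rather than the whole tree. The sketch below computes a candidate cpuset.cpus for a new, non-exclusive child as the parent's CPUs minus those held by exclusive siblings. Reading the actual files (the parent's cpuset.cpus, plus each sibling's cpuset.cpus and cpuset.cpu_exclusive) is left out; all helper names are hypothetical and the 64-CPU bitmask is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One bit per CPU; 64 CPUs is plenty for a sketch. */
typedef uint64_t cpu_bits;

/* Parse the list format used by cpuset.cpus ("0-3,5", "1", ""),
 * tolerating a trailing newline. Returns false on malformed input
 * or on CPU numbers above 63. */
bool parse_cpulist(const char *s, cpu_bits *out)
{
    cpu_bits mask = 0;
    const char *p = s;

    while (*p != '\0' && *p != '\n') {
        char *end;
        unsigned long lo = strtoul(p, &end, 10), hi;

        if (end == p)
            return false;
        p = end;
        hi = lo;
        if (*p == '-') {
            hi = strtoul(p + 1, &end, 10);
            if (end == p + 1)
                return false;
            p = end;
        }
        if (lo > hi || hi > 63)
            return false;
        for (unsigned long cpu = lo; cpu <= hi; cpu++)
            mask |= (cpu_bits)1 << cpu;
        if (*p == ',')
            p++;
        else if (*p != '\0' && *p != '\n')
            return false;
    }
    *out = mask;
    return true;
}

/* Format a mask back into list form, e.g. 0x2F -> "0-3,5". */
void format_cpulist(cpu_bits mask, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (int cpu = 0; cpu < 64; cpu++) {
        if (!(mask & ((cpu_bits)1 << cpu)))
            continue;
        int first = cpu;
        while (cpu < 63 && (mask & ((cpu_bits)1 << (cpu + 1))))
            cpu++;
        const char *sep = used > 0 ? "," : "";
        int n = first == cpu
            ? snprintf(buf + used, len - used, "%s%d", sep, first)
            : snprintf(buf + used, len - used, "%s%d-%d", sep, first, cpu);
        if (n < 0 || (size_t)n >= len - used)
            return; /* output truncated */
        used += (size_t)n;
    }
}

/* Parent's CPUs minus every CPU owned by an exclusive sibling.
 * May return 0 if exclusive siblings own all of the parent's CPUs;
 * a caller would have to treat that as an error. */
cpu_bits usable_cpus(cpu_bits parent, const cpu_bits *sibling_cpus,
                     const bool *sibling_exclusive, size_t n_siblings)
{
    cpu_bits reserved = 0;

    for (size_t i = 0; i < n_siblings; i++)
        if (sibling_exclusive[i])
            reserved |= sibling_cpus[i];
    return parent & ~reserved;
}
```

For the reproduction above, a parent holding "0-1" and an exclusive sibling iamspecial holding "1" yield "0", the same value the "cpu": { "cpus": "0" } workaround sets by hand.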

A secondary problem seems to be that crun with --cgroup-manager=cgroupfs ignores linux.resources.cpu.cpus during setup. More precisely, it applies the value only after initialization, because it uses initialize_cpuset_subsystem(). By contrast, --cgroup-manager=systemd uses initialize_cpuset_subsystem_resources() and therefore does not try to put all CPUs into the cpuset.cpus file.
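The ordering difference can be sketched as follows: when the configuration provides linux.resources.cpu.cpus (or mems), write that value into the new v1 cpuset directly, and fall back to the parent's value only when it is unset. This is not crun's code; write_cgroup_file() and init_cpuset_dir() are hypothetical helpers. Writing both files unconditionally rests on an assumption about v1 semantics (a cpuset with an empty cpuset.cpus or cpuset.mems cannot accept tasks), not on anything stated in this issue.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write VALUE to DIR/NAME, the way a cgroupfs
 * control file is written. Returns 0 on success, -1 on error. */
int write_cgroup_file(const char *dir, const char *name, const char *value)
{
    char path[4096];
    FILE *f;
    int rc = 0;

    assert(dir != NULL && name != NULL && value != NULL);
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fputs(value, f) == EOF)
        rc = -1;
    /* stdio buffers the write, so a kernel error such as EINVAL may
     * only surface when fclose() flushes it. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Sketch: initialize a freshly created v1 cpuset directory DIR.
 * Configured values (from linux.resources.cpu) win; the parent's
 * values are only a fallback when nothing is configured. */
int init_cpuset_dir(const char *dir,
                    const char *cfg_cpus, const char *cfg_mems,
                    const char *parent_cpus, const char *parent_mems)
{
    const char *cpus = (cfg_cpus != NULL && cfg_cpus[0] != '\0') ? cfg_cpus : parent_cpus;
    const char *mems = (cfg_mems != NULL && cfg_mems[0] != '\0') ? cfg_mems : parent_mems;

    /* Assumption: a v1 cpuset needs non-empty cpus and mems before
     * tasks can join it, so both files are always written. */
    if (write_cgroup_file(dir, "cpuset.cpus", cpus) < 0)
        return -1;
    if (write_cgroup_file(dir, "cpuset.mems", mems) < 0)
        return -1;
    return 0;
}
```

With cpus set to "0" in the configuration, the parent's "0-1" is never written, so the overlap with an exclusive sibling holding CPU 1 would not arise in the first place.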

@giuseppe
Member

Is there anything holding you from moving to cgroup v2?

My idea is to drop cgroup v1 support next year. All major distros have already moved to cgroup v2, and systemd has already dropped v1 support.

I am OK with adding small fixes as a temporary workaround, but I am not sure how much effort you would like to put into this issue, knowing that v1 support will likely be removed in a few months.

@michalsieron
Contributor Author

First and foremost, I wanted to write this down in case anyone hits a similar issue.
And I do understand very well that cgroup v1 support is going to be removed, so I am kinda OK with closing this as won't fix.
Do you have a more specific date in mind for that removal? As in, which month or at least quarter?

@giuseppe
Member

No, that is not decided yet, but I don't think it will happen too quickly. The second half of the year is more reasonable.
