Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort mounts with stable sort #240

Open
Epsilon314 opened this issue Nov 11, 2024 · 6 comments
Open

Sort mounts with stable sort #240

Epsilon314 opened this issue Nov 11, 2024 · 6 comments

Comments

@Epsilon314
Copy link

Epsilon314 commented Nov 11, 2024

We got an unexpected change of the order of mounts after introducing cdi (indirectly through updating Nvidia container toolkit), which in specific cases lead to failed dind container creation. The change of mounts order is related to the way sortMounts works, as referred below:
https://github.com/cncf-tags/container-device-interface/blob/main/pkg/cdi/container-edits.go#L358

// sortMounts sorts the mounts in the given OCI Spec.
func sortMounts(specgen *ocigen.Generator) {
	mounts := specgen.Mounts()
	specgen.ClearMounts()
	sort.Sort(orderedMounts(mounts))
	specgen.Config.Mounts = mounts
}

// orderedMounts defines how to sort an OCI Spec Mount slice.
// This is the almost the same implementation sa used by CRI-O and Docker,
// with a minor tweak for stable sorting order (easier to test):
//
//	https://github.com/moby/moby/blob/17.05.x/daemon/volumes.go#L26
type orderedMounts []oci.Mount

// Len returns the number of mounts. Used in sorting.
func (m orderedMounts) Len() int {
	return len(m)
}

// Less returns true if the number of parts (a/b/c would be 3 parts) in the
// mount indexed by parameter 1 is less than that of the mount indexed by
// parameter 2. Used in sorting.
func (m orderedMounts) Less(i, j int) bool {
	ip, jp := m.parts(i), m.parts(j)
	if ip < jp {
		return true
	}
	if jp < ip {
		return false
	}
	return m[i].Destination < m[j].Destination
}
  1. Go1.19 changes sort.Sort from stable to unstable, and we can use sort.Stable instead (w.r.t https://github.com/containerd/containerd/blob/main/internal/cri/opts/spec_linux_opts.go#L68);
  2. We can change orderedMounts.Less method to only compare parts, so that the mount order submitted by container spec yaml will be kept, in case it matters.
@elezar
Copy link
Contributor

elezar commented Nov 11, 2024

@Epsilon314 thanks for calling this out. Do you have example mounts so that we can include those cases in our tests while addressing this?

@Epsilon314
Copy link
Author

Epsilon314 commented Nov 12, 2024

@elezar We encounter this issue when creating a docker in docker container:

  1. Mount to container path /run/containerd and /proc/cpuinfo
  2. Create another DIND container in the first container. If /proc/cpuinfo are mounted before /run/containerd, runc can see that /run/containerd has submountpoint /run/containerd/xxx/rootfs/proc/cpuinfo and runc will rbind mount all submountpoints. In this case we failed to create the DIND container, since that extra submount will cause mounting proc for DIND container to fail, restricted by the mnt_already_visible check in kernel.

Add a unit test case PR to explain the desired result.

@klihub
Copy link
Contributor

klihub commented Nov 13, 2024

@elezar We encounter this issue when creating a docker in docker container:

  1. Mount to container path /run/containerd and /proc/cpuinfo
  2. Create another DIND container in the first container. If /proc/cpuinfo are mounted before /run/containerd, runc can see that /run/containerd has submountpoint /run/containerd/xxx/rootfs/proc/cpuinfo and runc will rbind mount all submountpoints. In this case we failed to create the DIND container, since that extra submount will cause mounting proc for DIND container to fail, restricted by the mnt_already_visible check in kernel.

Add a unit test case PR to explain the desired result.

Okay, so was this a problem that was only reproducible in nested containers ? Because otherwise, in a normal non-nested case it should be guaranteed that equally long container paths (in terms of directory components in the path) cannot have mount ordering dependencies on each other, only longer paths can have on shorter ones, right ?

@Epsilon314
Copy link
Author

@klihub i think so

@elezar
Copy link
Contributor

elezar commented Nov 14, 2024

@Epsilon314 the PR looks good to me. Thanks.

One question is how to distribute this change. For clients such as the NVIDIA Container Toolkit that was mentioned, pulling in a new version of the CDI packages should not be a problem. Do we need to do anything specific for other clients where we may be consuming an older CDI package version? @klihub do you have thoughts on that?

@klihub
Copy link
Contributor

klihub commented Nov 14, 2024

@Epsilon314 the PR looks good to me. Thanks.

One question is how to distribute this change. For clients such as the NVIDIA Container Toolkit that was mentioned, pulling in a new version of the CDI packages should not be a problem. Do we need to do anything specific for other clients where we may be consuming an older CDI package version? @klihub do you have thoughts on that?

I think that if this only bites folks who run a runtime-in-a-runtime, usually DIND, then this probably has a direct effect on a marginally small set of people. So I wouldn't worry about it unless more bug reports about this start flowing in. IOW, I'd make this part of our 1.0 release, then update containerd 2.0, 1.7, and cri-o main and the latest two minor release series for a starter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants