-
Notifications
You must be signed in to change notification settings - Fork 853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce privileged-mode #9017
Conversation
8e132bc
to
3ff81c9
Compare
Thanks for the PR! Will review it soon (hopefully) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some confusing operator UX with this PR. You mention this adds a --privileged-mode
flag, but it actually adds two flags: --containerd-privileged-mode
and --baggageclaim-privileged-mode
.
I can see that the value passed into --containerd-privileged-mode
gets passed into baggageclaim. Could we consolidate to only exposing the --containerd-privileged-mode
flag instead? It also means less flags to add to the chart and bosh deployment. This feature only makes sense for the containerd runtime, correct?
It is only usable with containerd at the moment anyway; it would take further investigation (and possibly different options) for different backends, so limiting it to containerd makes sense.
The argument is used as a way to let the baggageclaim runner know the privileged mode. Perhaps the best solution is to pass that as a Go argument to BaggageClaimCommand.Runner instead of making it part of the structure - that would then eliminate the need for making it part of the command structure, and hence a flag. If other backends start supporting a similar mode in the future, potentially the logic could then be extended to make baggageclaim do the right thing for all of them. |
3ff81c9
to
317b539
Compare
I have now updated the PR to not use the extra argument in the baggageclaim command structure, and instead just pass it using Go arguments @taylorsilva. |
c91138c
to
012b4c4
Compare
Sorry for the long delay. I'm reviewing this today as I don't want to let this go super stale. |
Had a heck of a time getting my linux env setup correctly to try this out. I'm on a v6.6 kernel. We should probably specify what "up to date kernel" exactly means. What features does this exactly depend on? Will a v5 kernel allow me to use this feature? or is this a v6+ kernel only feature that we're adding here? I cleaned up the test pipeline a bit:
The
When I changed to
|
Are there any existing implementations of a fuse-only privileged mode by any other container orchestrators? That warning about cgroups-v1 bothered me so I updated my kernel params to disable cgroups v1. I still get the same result with podman failing to clean up the container:
|
The privileged-mode setting lets admins decide what level of privilege tasks running as privileged should have. This gives the ability to lock down privileged access to a level that isn't equivalent to full root on the host. There are three proposed levels: full, the status quo. This has multiple vectors to take over the host, including by loading modules into the kernel. fuse-only, enough to work with containers using tools like buildah and podman if they are configured appropriately. As long as the Concourse worker is run in a user namespace on an up-to-date Linux kernel, this shouldn't be enough access to escape the container. ignore - privileged tasks have the same access as normal tasks. To get podman and buildah working, a few more syscalls need to be allowed through seccomp. A few harmless ones have been added to the general allow list, while others related to mounting and unsharing are only added for fuse-only mode. Signed-off-by: Andrew Miller <[email protected]>
Signed-off-by: Andrew Miller <[email protected]>
argument. Signed-off-by: Andrew Miller <[email protected]>
There shouldn't be a v6+ only feature in use. I'm more more hoping to provide guidance to admins to ensure security with fuse-only mode; if their kernel has a local privilege escalation bug, they shouldn't have a false sense of security that this will protect them. In particular, the security of this relies on:
(alongside setting the environment With Docker, it is a central daemon setting: https://docs.docker.com/engine/security/userns-remap/ If admins don't remap namespaces, they might get a sense of false confidence with fuse-only privileged mode (it will still work, it just won't be secure). I suggest conveying this at least in the release notes (and happy to write up some more docs for this if you feel that is warranted).
I wonder if it is non-deterministic as to whether this shows up in the logs. I remember seeing it once, but it disappeared again with adding another privilege - but it might have just been non-determinism. The error might happen every time, but not make it into the logs always. I'll investigate a bit more to see if I can find a way to reproduce it more consistently, so I can tell if @analytically's proposed new allowed syscalls are sufficient. |
012b4c4
to
3762a9f
Compare
This avoids an error that occurs cleaning up containers. Signed-off-by: Andrew Miller <[email protected]>
3762a9f
to
7a78f90
Compare
I think many other orchestrators let you do this, but they are more tightly coupled to Linux containers, so they do it by providing ways to customise seccomp policies and capabilities. I'm thinking that would be different to Concourse because Concourse has a design principle not to couple tightly to this type of implementation detail. This is one of the things that makes Concourse great to use, but it also means that it's a bit more painful to implement a feature like this compared to other orchestrators we might compare to, because we've got to come up with an upfront opinionated set of capabilities and allowed system calls that is both secure and works for the use case of building and testing containers within Concourse. |
BTW just to update on this, it seems your tidied up script is more likely to reproduce the issue than my original. But with the additional allowed syscalls (in the latest version of this PR), I haven't been able to reproduce it again after multiple runs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR Andrew! Thanks for your patience as well as this took longer than it should have for me to review and merge.
@A1kmm Writing up some docs would be very helpful for operators. I was thinking maybe adding a subsection to the containerd section here: https://concourse-ci.org/concourse-worker.html#containerd-runtime You can find the page backing that section here: https://github.com/concourse/docs/blob/1c16e34c6d3a013289d11cbc15fdd8f60ef3218f/lit/docs/install/worker.lit#L496 |
Thanks for reviewing Taylor - really appreciate that you are maintaining Concourse, and completely understand the time given it is a complex new feature. For cross-reference: I just created a documentation PR corresponding to this PR over at: concourse/docs#549 |
@taylorsilva just curious; did you maybe tried to rebuild the worker's binaries after you merged this PR? I just sync'ed our fork with the upstream master and I get this error when trying to built workers binaries.
Still not sure if this is related to our fork or its introduced by the PR... so sorry for the noise if this is coming from our side... UPDATE: it failed to build darwin (arm64) and windows workers, linux is apparently fine. |
I did not try those! Thanks for noting that, I'll look into it and get a fix in. |
@@ -265,5 +265,5 @@ func (cmd *WorkerCommand) baggageclaimRunner(logger lager.Logger) (ifrit.Runner, | |||
|
|||
cmd.Baggageclaim.OverlaysDir = filepath.Join(cmd.WorkDir.Path(), "overlays") | |||
|
|||
return cmd.Baggageclaim.Runner(nil) | |||
return cmd.Baggageclaim.Runner(nil, cmd.Containerd.PrivilegedMode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worker/workercmd/worker.go:268:53: cmd.Containerd.PrivilegedMode undefined (type ContainerdRuntime has no field or method PrivilegedMode)
cmd.Containerd.PrivilegedMode
is undefined when building on different OS (darwin/windows)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't go through all the PR and I got interested to this mostly due to our darwin and windows workers failed to build, but now I am curious: What happens here on linux but when someone uses GardenBackend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing different happens with the garden backend. This is a containerd specific setting. Everything defaults to the current status quo.
The privileged-mode setting lets admins decide what level of privilege tasks running as privileged should have. This gives the ability to lock down privileged access to a level that isn't equivalent to full root on the host.
There are three proposed levels:
full, the status quo. This has multiple vectors to take over the host, including by loading modules into the kernel.
fuse-only, enough to work with containers using tools like buildah and podman if they are configured appropriately. As long as the Concourse worker is run in a user namespace on an up-to-date Linux kernel, this shouldn't be enough access to escape the container. ignore - privileged tasks have the same access as normal tasks.
To get podman and buildah working, a few more syscalls need to be allowed through seccomp. A few harmless ones have been added to the general allow list, while others related to mounting and unsharing are only added for fuse-only mode.
Changes proposed by this PR
Notes to reviewer
This pipeline is helpful for manual testing:
Release Note
--privileged-mode
option to the worker, which acceptsfull
(default, original behaviour),fuse-only
(privileged: true tasks can use tools like buildah and podman, but can't escape if user namespaces are used to run the worker),ignore
(privileged: true tasks have no extra access compared to privileged: false tasks)