k0s should move /var/lib/k0s/kubelet to /var/lib/kubelet #1842

Closed
4 tasks done
edgan opened this issue Jun 13, 2022 · 26 comments · Fixed by #5186
Labels: bug, config-2.0, enhancement

Comments

@edgan

edgan commented Jun 13, 2022

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Version

1.23.6

Platform

No LSB modules are available.

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04 LTS
Release:	22.04
Codename:	jammy

What happened?

aws-ebs-csi-driver out of the box failed to work with k0s.

Steps to reproduce

  1. Install k0s
  2. Install aws-ebs-csi-driver
  3. Install storageclass for aws-ebs-csi-driver
  4. Create pvc that will use the storageclass

Expected behavior

PVC creation via aws-ebs-csi-driver works

Actual behavior

PVC creation via aws-ebs-csi-driver fails

Screenshots and logs

Jun 13 05:53:05 nexus-nexus-01 k0s[4738]: time="2022-06-13 05:53:05" level=info msg="E0613 05:53:05.391274 4928 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/ebs.csi.aws.com^vol-0e2e37e9eecca4a86 podName: nodeName:}" failed. No retries permitted until 2022-06-13 05:54:09.391247943 +0000 UTC m=+1996.540517455 (durationBeforeRetry 1m4s). Error: MountVolume.SetUp failed for volume "pvc-21851630-5134-4690-9807-91e57ef51902" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-0e2e37e9eecca4a86") pod "nexus-repo-nexus-repository-manager-5b5ff457c5-n5k9l" (UID: "753df29d-d7de-4d68-bd24-1538e1b1eafc") : applyFSGroup failed for vol vol-0e2e37e9eecca4a86: lstat /var/lib/k0s/kubelet/pods/753df29d-d7de-4d68-bd24-1538e1b1eafc/volumes/kubernetes.io~csi/pvc-21851630-5134-4690-9807-91e57ef51902/mount: no such file or directory" component=kubelet

Additional context

The issue is that aws-ebs-csi-driver expects things in /var/lib/kubelet not /var/lib/k0s/kubelet.

Things tried:

  1. Symlinked /var/lib/kubelet to /var/lib/k0s/kubelet
  2. Symlinked /var/lib/k0s/kubelet to /var/lib/kubelet
  3. bind mount of /var/lib/k0s/kubelet to /var/lib/kubelet
  4. Modified the aws-ebs-csi-driver helm chart by hand to use /var/lib/k0s/kubelet

Results:

  1. same error
  2. same error
  3. same error
  4. works

While tracking down this issue, I ran across multiple previous k0s issues and multiple CSI driver issues across GitHub projects, all showing that many Kubernetes projects assume /var/lib/kubelet. Ideally the path would always be configurable, but in reality people do assume it. The ultimate issue is that k0s breaks compatibility with a lot of other projects by changing the directory from /var/lib/kubelet to /var/lib/k0s/kubelet.
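
Since only the variant that pointed the chart at the real kubelet directory worked, it can help to confirm which directory the kubelet actually uses on a node before editing any chart. A minimal sketch, assuming the kubelet root always contains a pods subdirectory (as the log above suggests); the function name find_kubelet_dir is hypothetical:

```shell
#!/bin/sh
# Print the first candidate directory that looks like a kubelet root,
# i.e. one that contains a "pods" subdirectory. Returns 1 if none match.
# find_kubelet_dir is a hypothetical helper, not part of k0s.
find_kubelet_dir() {
    for dir in "$@"; do
        if [ -d "$dir/pods" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

On a k0s worker this would be called as find_kubelet_dir /var/lib/k0s/kubelet /var/lib/kubelet. Note that -d follows symlinks, so the order of the candidates matters when one path is linked to the other.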

@edgan edgan added the bug Something isn't working label Jun 13, 2022
@twz123
Member

twz123 commented Jun 14, 2022

@makhov
Contributor

makhov commented Jun 15, 2022

It's a bug in the aws-ebs-csi-driver Helm chart. I've submitted a fix: kubernetes-sigs/aws-ebs-csi-driver#1276

As a workaround you can try to add the following to the values:

sidecars:
  nodeDriverRegistrar:
    env: 
      - name: DRIVER_REG_SOCK_PATH
        value: /var/lib/k0s/kubelet/plugins/ebs.csi.aws.com/csi.sock

@edgan
Author

edgan commented Jun 15, 2022

As a workaround you can try to add the following to the values:

This isn't working for me. ebs-csi-node goes into a crashloop.

@makhov
Contributor

makhov commented Jun 16, 2022

Sorry, I should have pointed out explicitly that you also need to specify the correct node.kubeletPath. These values work for me:

sidecars:
  nodeDriverRegistrar:
    env:
      - name: DRIVER_REG_SOCK_PATH
        value: /var/lib/k0s/kubelet/plugins/ebs.csi.aws.com/csi.sock

node:
  kubeletPath: /var/lib/k0s/kubelet
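
Because the kubelet path appears twice in these values, a small generator keeps the two in sync. This is only a sketch: the value keys are the ones quoted above, while the function name render_ebs_values and the file name values.yaml are assumptions made for illustration.

```shell
#!/bin/sh
# Print Helm values for aws-ebs-csi-driver that point both the registrar
# socket path and node.kubeletPath at the same kubelet root directory.
# render_ebs_values is a hypothetical helper; the keys come from this thread.
render_ebs_values() {
    kubelet_dir=${1:-/var/lib/k0s/kubelet}
    cat <<EOF
sidecars:
  nodeDriverRegistrar:
    env:
      - name: DRIVER_REG_SOCK_PATH
        value: ${kubelet_dir}/plugins/ebs.csi.aws.com/csi.sock

node:
  kubeletPath: ${kubelet_dir}
EOF
}
```

Assuming the chart repository has been added under the alias aws-ebs-csi-driver (the release name and namespace below are assumptions too), usage would look roughly like render_ebs_values > values.yaml followed by helm upgrade --install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver --namespace kube-system -f values.yaml.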

@jnummelin jnummelin self-assigned this Jun 20, 2022
@jnummelin
Member

jnummelin commented Jun 21, 2022

We've definitely seen some issues with our non-default location for the kubelet data dir.

The main motivation for keeping everything under /var/lib/k0s is driven by a few different things:

  • Easier to clean up/reset an installation
  • When running k0s in containers/pods, we only need to worry about a single volume to be mounted

For changes like these, we also need to think hard about backwards compatibility. And especially in this case it's super difficult to actually change the kubelet directory. There's really no easy way to do an update for k0s, and thus for kubelet, in a way where we would not have to do a full reset first.

We do get that it's a bit inconvenient in some cases, but as kubelet itself has a config option for the data dir it uses, it's really pretty much always an upstream issue if those paths are not configurable. Yes, people make assumptions, but in this case those are kinda false assumptions. :) What we (the k0s team) can do here is to always help out in figuring out why something is not working with k0s and help make it work. Just like @makhov did now for the EBS CSI helm charts.

@github-actions
Contributor

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Jul 23, 2022
@makhov
Contributor

makhov commented Jul 29, 2022

Closing the issue, since the fix for the EBS CSI helm chart was merged and we can't do more here. Feel free to reopen it as needed.

@doctorpangloss
Contributor

this is still a huge pain point

@aronwolf90

I agree. I had to spend an entire day debugging to discover what the problem was. For one of two plugins, I solved it by adding a symbolic link, ln -s /var/lib/k0s/kubelet/ /var/lib/kubelet (in the other case, I had no choice but to edit the k8s files).

@doctorpangloss
Contributor

I think they mean for us to "just" fork k0s and fix the path, which will have nothing but positive impacts on using it

@twz123
Member

twz123 commented Oct 9, 2023

@aronwolf90 @doctorpangloss If the symlink doesn't work, what about a bind mount? Could you maybe try mount --bind /var/lib/k0s/kubelet /var/lib/kubelet and see if that fixes your issues?

Moreover, which plugins are you using that are lacking support for custom kubelet paths?

@twz123 twz123 reopened this Oct 9, 2023
@twz123 twz123 removed the Stale label Oct 9, 2023
@twz123
Member

twz123 commented Oct 9, 2023

/cc #3508 which is similar, but the other way round.

@aronwolf90

aronwolf90 commented Oct 9, 2023

@twz123 thanks for your time. In my case, it is an older version of https://github.com/hetznercloud/csi-driver (1.6), and yes, it can be fixed by downloading the YAML of csi-driver and adjusting it (which is exactly what I did). The problem I see here is that many others would have given up before finding the solution.

For my cluster, it is now fine as it is, but it is definitely a minus point when I have to consider what k8s distro I should recommend to others. This makes me a little sad because I really like the rest of k0s.

NOTE: mount --bind /var/lib/k0s/kubelet /var/lib/kubelet does not work. In the logs I get failed to stage volume: mkdir /var/lib/k0s/kubelet/plugins/kubernetes.io/csi/pv/pvc-54e61ed0-2bce-44cb-a980-1481b49a2b28/globalmount: no such file or director.

Contributor

github-actions bot commented Nov 8, 2023

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Nov 8, 2023
@twz123 twz123 removed the Stale label Nov 9, 2023
Contributor

github-actions bot commented Dec 9, 2023

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Dec 9, 2023
@jnummelin jnummelin added enhancement New feature or request and removed Stale labels Dec 11, 2023
@jhughes2112

I found the "zero friction" moniker of k0s to be a bit misleading. Moving the kubelet folder is quite problematic. Here is how I solved it:

sudo mkdir -p /var/lib/kubelet /var/lib/k0s/kubelet/pods /var/lib/k0s/kubelet/plugins_registry /var/lib/k0s/kubelet/registration-dir /var/lib/k0s/kubelet/pods-mount-dir /var/lib/k0s/kubelet/plugins /var/lib/k0s/kubelet/device-plugins
sudo ln -s /var/lib/k0s/kubelet/pods /var/lib/kubelet/pods
sudo ln -s /var/lib/k0s/kubelet/plugins_registry /var/lib/kubelet/plugins_registry
sudo ln -s /var/lib/k0s/kubelet/registration-dir /var/lib/kubelet/registration-dir
sudo ln -s /var/lib/k0s/kubelet/pods-mount-dir /var/lib/kubelet/pods-mount-dir
sudo ln -s /var/lib/k0s/kubelet/plugins /var/lib/kubelet/plugins
sudo ln -s /var/lib/k0s/kubelet/device-plugins /var/lib/kubelet/device-plugins
curl -sSLf https://get.k0s.sh | sudo sh
...etc...

You have to run essentially the same series of commands on the control plane and all the workers before installing; otherwise some folders cannot be moved or overwritten due to locked .sock files.
Maybe just symlinking /var/lib/kubelet -> /var/lib/k0s/kubelet would also work; I did not try that.
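
The per-directory commands above can also be written as a loop that is safe to re-run on every node. A minimal sketch: the subdirectory names are the ones listed above, while the helper name link_kubelet_subdirs is hypothetical, and the function skips anything that already exists at the target instead of overwriting it.

```shell
#!/bin/sh
# Link each kubelet subdirectory from the k0s location into the
# conventional location, creating the source directories if needed.
# Usage: link_kubelet_subdirs <source-root> <target-root>
# link_kubelet_subdirs is a hypothetical helper, not part of k0s.
link_kubelet_subdirs() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for name in pods plugins_registry registration-dir pods-mount-dir plugins device-plugins; do
        mkdir -p "$src/$name" || return 1
        # Leave anything that already exists at the target untouched.
        if [ -e "$dst/$name" ] || [ -L "$dst/$name" ]; then
            continue
        fi
        ln -s "$src/$name" "$dst/$name" || return 1
    done
}
```

Run as root before installing k0s, the call would be link_kubelet_subdirs /var/lib/k0s/kubelet /var/lib/kubelet.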

@p5ntangle

p5ntangle commented Oct 11, 2024

Running into the same issue installing the AWS EFS CSI.

A simple fix could be to just add a symlink for kubelet at /var/lib/kubelet, which appears to be the commonly accepted behaviour.

@Raboo

Raboo commented Oct 11, 2024

Well, that is a workaround. The proper thing would be to make the kubelet folder a configurable setting.
It's not just AWS EFS CSI that expects /var/lib/kubelet; other tools are also hard-coded to /var/lib/kubelet.

@jnummelin
Member

other tools are also hard coded to /var/lib/kubelet.

But consider this, the path is actually configurable on upstream kubelet itself: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#options (search for --root-dir)
So basically for anyone, not only k0s, who configures that path, those hard-coded things will be broken.

And no, I'm not saying k0s couldn't do something for this but we have to be very careful not to break backwards compatibility promises too for our users. I.e. if you've already configured your CSI provider with the correct k0s path and we'd change it, your CSI would get broken (again). So it's not just as simple as "let's change it to kubelet default from now on".
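
To illustrate the --root-dir point, a tool that needs the kubelet directory could read it from the kubelet's own command line instead of hard-coding it. A minimal sketch, assuming the flag is passed on the command line rather than only through a config file; the helper name kubelet_root_dir_from_args is hypothetical:

```shell
#!/bin/sh
# Extract the value of --root-dir from a kubelet command line passed as
# arguments. Handles both "--root-dir=/path" and "--root-dir /path".
# kubelet_root_dir_from_args is a hypothetical helper.
kubelet_root_dir_from_args() {
    while [ "$#" -gt 0 ]; do
        case $1 in
            --root-dir=*)
                printf '%s\n' "${1#--root-dir=}"
                return 0 ;;
            --root-dir)
                [ "$#" -ge 2 ] || return 1
                printf '%s\n' "$2"
                return 0 ;;
        esac
        shift
    done
    # Flag absent: fall back to the upstream kubelet default.
    printf '%s\n' /var/lib/kubelet
}
```

On a node, the arguments could be taken from /proc/<pid>/cmdline of the running kubelet process (NUL-separated, so they need converting first).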

@Raboo

Raboo commented Oct 11, 2024

I'm not saying change the default. Just that it can be a configurable parameter that the user can control.
Much like you said "those hard-coded things will be broken." - This goes for k0s as well.

@jhughes2112

Yes, absolutely make it an install option. But also realize that defaults should be chosen to require the least effort from the user. I would hope that means using /var/lib/kubelet unless instructed otherwise. The new user experience is difficult because of the chosen path. If it were exposed as an option, I would venture that the majority of existing users would eventually reinstall to get back to the standard path. Unfortunately, lots of downstream packages assume paths and are not configurable, or are a lot of work to configure.

Either way, making it an install time option to generate symlinks would be very helpful.

@jnummelin jnummelin added this to the 1.32 milestone Oct 14, 2024
@Skaronator
Contributor

I'm unable to get kubevirt running with k0s due to this without additional magic on the host system. To be fair, kubevirt has a similar issue open as you can see here: kubevirt/kubevirt#5913


Additional notes if you're curious:

It's possible to reconfigure the kubelet path in kubevirt: https://github.com/kubevirt/kubevirt/blob/c205d1dd1bf69a9ae7bc33b6e76b4896c0c76546/cmd/virt-handler/virt-handler.go#L618-L619
but that alone is still not enough. I suspect there is some hard-coded stuff elsewhere, hence the still-open issue.

People used a symlink from /var/lib/k0s/kubelet to /var/lib/kubelet but that no longer works with newer kubevirt versions due to security checks. The current workaround is to do a bind mount from /var/lib/k0s/kubelet to /var/lib/kubelet as described here: kubevirt/kubevirt#5913 (comment) but I want to avoid that.

@BinaryDevotee

Just to add to this, a similar problem can be observed using the nfs-csi-driver when the kubelet directory is not in the expected path /var/lib/kubelet.

What happens in that case is the problem described in [1]: content written to an NFS volume backed by the NFS CSI driver ends up in the container file system rather than on the NFS share itself. As seen in that discussion, the problem relates to the non-standard kubelet path, and the suggested fix is to install the NFS CSI driver with the actual kubelet directory specified via --set kubeletDir="<kubelet_dir>".

For this particular situation, even though I have a bind mount from /var/lib/k0s/kubelet to /var/lib/kubelet, I still need to install the driver with --set kubeletDir="/var/lib/k0s/kubelet"; otherwise, the NFS CSI driver still won't work as expected and shows the behavior described in [1].

Thankfully this driver can be installed against a different location, but this shows that bind mounting the kubelet directory doesn't actually solve the issue, especially where storage interface drivers are involved.

Having said that, I also think that being able to select an alternative kubelet directory as a configuration parameter prior to provisioning k0s would be ideal.

[1] kubernetes-csi/csi-driver-nfs#762

@jnummelin
Member

@BinaryDevotee Did you try using --rbind? I'm assuming pretty much all CSI providers would need rbind as they need to be able to manage other bind mounts (the volumes) under the "parent" mount.

https://askubuntu.com/questions/1122975/difference-between-rbind-and-bind-in-mounting

@BinaryDevotee

I have redeployed the controller nodes with the kubelet directories /var/lib/k0s/kubelet/ and /var/lib/kubelet/ as --rbind mounts as follows:

mkdir --parents /var/lib/k0s/kubelet/ /var/lib/kubelet/
mount --rbind   /var/lib/k0s/kubelet/ /var/lib/kubelet/
mount --make-shared /var/lib/kubelet/

But the behavior seems to be the same. The share gets created in the NFS server but the contents are empty whereas the data resides within the container file system.
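
To rule out mount propagation as the cause, one can check whether the target really ended up as a shared mount after the commands above. A minimal sketch that reads the kernel's mountinfo table; it only reports the propagation state and says nothing about the NFS driver itself. The helper name is_shared_mount is hypothetical.

```shell
#!/bin/sh
# Succeed if the given mount point is marked "shared" in a file using the
# /proc/self/mountinfo format. If several mounts are stacked on the same
# mount point, the last listed one decides the result.
# Usage: is_shared_mount <mount-point> [mountinfo-file]
# is_shared_mount is a hypothetical helper.
is_shared_mount() {
    awk -v mp="$1" '
        $5 == mp {
            found = 0
            # Optional fields sit between field 6 and the "-" separator.
            for (i = 7; i <= NF && $i != "-"; i++)
                if ($i ~ /^shared:/) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/self/mountinfo}"
}
```

For example: is_shared_mount /var/lib/kubelet && echo shared.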

For now it's not a blocker, as I can override the default kubelet directory as an install parameter of the NFS CSI driver.

Nevertheless, I appreciate the suggestion. Thank you.

ncopa added a commit to ncopa/k0s that referenced this issue Nov 7, 2024
Storage drivers and others may hardcode /var/lib/kubelet which
conflicts with the k0s default /var/lib/k0s/kubelet. Allow users to
override the kubelet root directory with --kubelet-root-dir similar to
the way they can override --data-dir.

ref: https://cep.dev/posts/adventure-trying-change-kubelet-rootdir/

fixes k0sproject#1842

Signed-off-by: Natanael Copa <[email protected]>
ncopa added a commit to ncopa/k0s that referenced this issue Nov 8, 2024
Storage drivers and others may hardcode /var/lib/kubelet which
conflicts with the k0s default /var/lib/k0s/kubelet. Allow users to
override the kubelet root directory with --kubelet-root-dir similar to
the way they can override --data-dir.

ref: https://cep.dev/posts/adventure-trying-change-kubelet-rootdir/

fixes k0sproject#1842

Signed-off-by: Natanael Copa <[email protected]>
@chinglinwen

chinglinwen commented Nov 27, 2024

The following commands may be a workaround.

single try

mount --bind /var/lib/k0s/kubelet /var/lib/kubelet

persistent change

cat >> /etc/fstab <<EOF
/var/lib/k0s/kubelet /var/lib/kubelet none bind 0 0
EOF
mount -a
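
Note that the cat >> /etc/fstab form above appends a new line on every run. A minimal sketch of an idempotent variant, which appends the entry only when an identical line is not already present; the helper name ensure_fstab_line is hypothetical:

```shell
#!/bin/sh
# Append a line to an fstab-style file only if an identical line is not
# already there, so the script can be re-run without creating duplicates.
# Usage: ensure_fstab_line <line> [fstab-file]
# ensure_fstab_line is a hypothetical helper.
ensure_fstab_line() {
    line=$1
    file=${2:-/etc/fstab}
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Run as root: ensure_fstab_line '/var/lib/k0s/kubelet /var/lib/kubelet none bind 0 0' followed by mount -a.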

ncopa added a commit to ncopa/k0s that referenced this issue Dec 18, 2024
Storage drivers and others may hardcode /var/lib/kubelet which
conflicts with the k0s default /var/lib/k0s/kubelet. Allow users to
override the kubelet root directory with --kubelet-root-dir similar to
the way they can override --data-dir.

ref: https://cep.dev/posts/adventure-trying-change-kubelet-rootdir/
fixes: k0sproject#1842

Signed-off-by: Natanael Copa <[email protected]>