Upgrade managed node groups AMI ID without restarting nodes #705

gnuletik opened this issue Dec 12, 2024 · 1 comment

@gnuletik

When using managed node groups with Bottlerocket and Brupop, we are seeing new nodes start with an older version of Bottlerocket.

As stated in the docs (https://bottlerocket.dev/en/brupop/1.3.x/troubleshoot/#bottlerocket-instances-start-with-an-old-version-of-bottlerocket), this is because EKS managed node groups have their AMI ID fixed to an older version.

The fix for this could be to upgrade the EKS managed node group version.
However, this is a slow operation on EKS (30 minutes to 1 hour; see aws/containers-roadmap#1619).

Since Brupop has already updated the nodes, this operation is redundant.

To avoid the slow upgrade operation on EKS, we would like the managed node groups to launch new nodes with the same Bottlerocket version that Brupop has already applied to the existing ones.

However, this doesn't seem possible at the moment: every AMI ID change in a managed node group triggers a rollout across all nodes.

Workaround

In issue #45, we've seen people work around this by using a custom (non-managed) node group with SSM parameters.
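A minimal sketch of the SSM-based lookup, assuming the public Bottlerocket parameter layout `/aws/service/bottlerocket/<variant>/<arch>/<version>/image_id`, where `<version>` is either `latest` or a specific release. Pinning `<version>` to the release Brupop has already applied, rather than `latest`, is what would keep new nodes in line with the updated ones. The function names are hypothetical; the client only needs boto3's `get_parameter(Name=...)` call.

```python
"""Sketch: resolve a Bottlerocket AMI ID from AWS public SSM parameters.

Assumption: parameters follow the layout
/aws/service/bottlerocket/<variant>/<arch>/<version>/image_id
where <version> is "latest" or a specific release such as "1.19.2".
"""


def bottlerocket_ami_parameter(variant: str, arch: str = "x86_64",
                               version: str = "latest") -> str:
    """Build the SSM parameter name for a Bottlerocket AMI (hypothetical helper)."""
    return f"/aws/service/bottlerocket/{variant}/{arch}/{version}/image_id"


def resolve_ami_id(ssm_client, variant: str, arch: str = "x86_64",
                   version: str = "latest") -> str:
    """Return the AMI ID stored in the parameter.

    ssm_client is anything exposing boto3's SSM get_parameter(Name=...),
    e.g. boto3.client("ssm"); the response shape is {"Parameter": {"Value": ...}}.
    """
    name = bottlerocket_ami_parameter(variant, arch, version)
    response = ssm_client.get_parameter(Name=name)
    return response["Parameter"]["Value"]
```

With AWS credentials configured, this could be called as `resolve_ami_id(boto3.client("ssm"), "aws-k8s-1.28", version="1.19.2")`; the variant and version there are example values only, not a recommendation.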

However, the downside is that node replacement is then not handled through the Kubernetes API with the usual cordon and drain flow.
As a result, Pod Disruption Budgets are not respected, which can lead to outages.

So, I'd like to know: what is the recommended way to use Brupop with managed node groups? Is there a way to avoid waiting for the managed node group rollout?

Image I'm using: public.ecr.aws/bottlerocket/bottlerocket-update-operator:v1.4.0

@KCSesh

KCSesh commented Dec 20, 2024

Hey @gnuletik,

This is a known rough edge with managed node groups (MNG), and we don’t actually recommend using the two together because of this concern. At a high level, MNG is designed around node replacement, while Brupop is designed around in-place updates. These two paradigms have some conflicting priorities. From MNG’s perspective, updating the nodes to another OS version yourself removes some of the “managed” part. So for now, you might consider choosing one or the other, with details on both here.

This is a cool feature request that belongs to the MNG team! If MNG could detect the version running on each node and skip rotating out nodes that were already on the target version, that would be awesome.
I took a quick look and couldn't find a feature request open for it, but I would encourage you to look and consider creating one here: https://github.com/aws/containers-roadmap
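To make the requested behavior concrete, here is a rough sketch of the version check it implies: leave nodes that already run the target Bottlerocket version (or newer) alone, and only rotate the rest. It assumes nodes report `status.nodeInfo.osImage` in the form `Bottlerocket OS 1.19.2 (aws-k8s-1.28)`; the helper names are hypothetical and not part of MNG or Brupop.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: Bottlerocket nodes report status.nodeInfo.osImage like
# "Bottlerocket OS 1.19.2 (aws-k8s-1.28)".
_VERSION_RE = re.compile(r"Bottlerocket OS (\d+)\.(\d+)\.(\d+)")


def parse_bottlerocket_version(os_image: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from an osImage string, or None if absent."""
    match = _VERSION_RE.search(os_image)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def nodes_needing_replacement(nodes: Iterable[Tuple[str, str]],
                              target: str) -> List[str]:
    """Return names of nodes older than `target` or with an unparseable version.

    `nodes` is an iterable of (node_name, os_image) pairs (hypothetical input
    shape, e.g. gathered from the Kubernetes Node objects).
    """
    wanted = tuple(int(part) for part in target.split("."))
    stale = []
    for name, os_image in nodes:
        current = parse_bottlerocket_version(os_image)
        # Unknown versions are conservatively treated as needing rotation.
        if current is None or current < wanted:
            stale.append(name)
    return stale
```

Treating unparseable versions as stale errs toward MNG's current behavior of rotating everything, so the check could only ever reduce the number of replaced nodes.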
