Test custom AMI for EFA support #8623

Open: mikkeloscar wants to merge 4 commits into eks from datalab-test-efa-testing

mikkeloscar (Contributor) commented Dec 10, 2024

This PR exists purely to test an in-progress AMI that adds support for EFA on g5 and p5 instances.

The idea is that a cluster switched to this PR branch will be able to run a node-pool like this:

- config_items:
    labels: dedicated=gpu-g5-efa,zalando.org/nvidia-gpu=nvidia-a10g,efa=enabled
    taints: dedicated=gpu-g5-efa:NoSchedule
    kuberuntu_distro_worker: "jammy_with_gpu"
    tag_instance_infrastructure_component: "false"
    internal_node_subnets_enabled: "true"
  discount_strategy: none
  instance_types:
  - g5.8xlarge
  - g5.12xlarge
  - g5.16xlarge
  - g5.24xlarge
  - g5.48xlarge
  max_size: 30
  min_size: 0
  name: gpu-g5-efa
  profile: worker-karpenter

This will run the special AMI defined as: kuberuntu_image_v1_31_jammy_with_gpu_amd64. To test a different AMI, only that config-item needs to be changed in this PR, and the node pool will be updated.
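
To land a test workload on a pool like this, a pod needs a node selector matching the pool's labels and a toleration for its taint. A minimal sketch, derived only from the `labels` and `taints` values above; the pod name, image, and the commented-out resource requests are placeholders and assumptions, not part of this PR:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: efa-test            # hypothetical name
spec:
  nodeSelector:
    dedicated: gpu-g5-efa   # from the node pool's labels
    efa: enabled            # from the node pool's labels
  tolerations:
  - key: dedicated          # matches taint dedicated=gpu-g5-efa:NoSchedule
    operator: Equal
    value: gpu-g5-efa
    effect: NoSchedule
  containers:
  - name: test
    image: example.org/efa-test:latest   # placeholder image
    # Assumption: extended resources are only schedulable if the matching
    # device plugins run on the node; uncomment if they do.
    # resources:
    #   limits:
    #     nvidia.com/gpu: 1
    #     vpc.amazonaws.com/efa: 1
```

Without the toleration the pod stays Pending, since the `NoSchedule` taint keeps untolerated pods off these nodes.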

mikkeloscar added the do-not-merge and major labels on Dec 10, 2024
mikkeloscar force-pushed the datalab-test-efa-testing branch 5 times, most recently from c25d2fa to 01245c9, on December 12, 2024 08:41
desinurch force-pushed the datalab-test-efa-testing branch from f51fc7b to 9d01fb7 on December 12, 2024 11:36
mikkeloscar (Contributor, Author) commented:

@desinurch FYI: I merged the latest beta branch into this one to keep it up to date.

mikkeloscar force-pushed the datalab-test-efa-testing branch from 88bf69f to 01a7d8b on January 7, 2025 10:04
mikkeloscar force-pushed the datalab-test-efa-testing branch 9 times, most recently from 01735b8 to 597046b, on January 10, 2025 13:44
mikkeloscar changed the base branch from beta to eks on January 10, 2025 13:44
mikkeloscar force-pushed the datalab-test-efa-testing branch 9 times, most recently from 0c81ee6 to fdbda79, on January 13, 2025 14:12
mikkeloscar force-pushed the datalab-test-efa-testing branch 7 times, most recently from ccfd76a to 7b29f32, on January 20, 2025 13:55
mikkeloscar force-pushed the datalab-test-efa-testing branch from 7b29f32 to b2ceffc on January 21, 2025 15:37
mikkeloscar (Contributor, Author) commented:

@desinurch This PR is not in a mergeable state; it is only open to enable testing.

We need to update the main AMI, which we can't do without rolling the nodes. So either it must be done together with e.g. Kubernetes v1.32 (or other changes), or we split out the GPU AMI as we discussed and then make a PR just for the GPU AMI update.

mikkeloscar force-pushed the datalab-test-efa-testing branch from b2ceffc to 4528187 on January 22, 2025 13:22
mikkeloscar and others added 4 commits January 22, 2025 15:23
Signed-off-by: Mikkel Oscar Lyderik Larsen <[email protected]>
Signed-off-by: Mikkel Oscar Lyderik Larsen <[email protected]>
Signed-off-by: Mikkel Oscar Lyderik Larsen <[email protected]>
mikkeloscar force-pushed the datalab-test-efa-testing branch from 4528187 to 694c41f on January 22, 2025 14:23
Labels: do-not-merge, major (Major feature changes or updates, e.g. feature rollout to a new country, new API calls.)
2 participants