aws-ecs: appends invalid ecs user data #32469

Closed
1 task
cheruvian opened this issue Dec 11, 2024 · 4 comments
Labels
@aws-cdk/aws-ecs Related to Amazon Elastic Container bug This issue is a bug. effort/medium Medium work item – several days of effort p1

Comments

@cheruvian
Contributor

Describe the bug

aws-ecs appends the following to the ASG user data:

autoScalingGroup.addUserData('sudo service iptables save');

This command is invalid on the most recent versions of the ECS-optimized AMI:

+ sudo iptables --insert FORWARD 1 --in-interface docker+ --destination 169.254.169.254/32 --jump DROP
+ sudo service iptables save
The service command supports only basic LSB actions (start, stop, restart, try-restart, reload, force-reload, status). For other actions, please try to use systemctl.
+ echo ECS_AWSVPC_BLOCK_IMDS=true
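
The error above can be illustrated with a small check. This is a minimal sketch, not part of aws-cdk-lib: `findUnsupportedServiceAction` is a hypothetical helper, and the list of accepted actions is copied from the error message in the log rather than from any AMI documentation.

```typescript
// Actions that the `service` wrapper accepts, as listed in the error message above.
const BASIC_LSB_ACTIONS = new Set<string>([
  'start', 'stop', 'restart', 'try-restart', 'reload', 'force-reload', 'status',
]);

// Hypothetical helper: for a `service <name> <action>` command, return the
// action if it is not a basic LSB action, otherwise undefined.
function findUnsupportedServiceAction(command: string): string | undefined {
  // Drop a leading `sudo` so both forms of the command are handled.
  const tokens = command.trim().split(/\s+/).filter((token) => token !== 'sudo');
  if (tokens[0] !== 'service') {
    return undefined; // Not a `service` invocation, nothing to check.
  }
  const action = tokens[2];
  if (action === undefined) {
    return undefined; // No action given.
  }
  return BASIC_LSB_ACTIONS.has(action) ? undefined : action;
}
```

Applied to the user data lines in this issue, only `sudo service iptables save` is flagged; the `iptables --insert` and `echo` lines are not `service` invocations and pass through.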

Looking at the whole code block, the comments seem to be out of place:

            // Deny containers access to instance metadata service
            // Source: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html
            autoScalingGroup.addUserData('sudo iptables --insert FORWARD 1 --in-interface docker+ --destination 169.254.169.254/32 --jump DROP');
            autoScalingGroup.addUserData('sudo service iptables save');
            // The following is only for AwsVpc networking mode, but doesn't hurt for the other modes.
            autoScalingGroup.addUserData('echo ECS_AWSVPC_BLOCK_IMDS=true >> /etc/ecs/ecs.config');

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

The user data should contain only valid commands. Is a service restart not needed?

Current Behavior

The command fails and suggests using systemctl.

Reproduction Steps

Deploy an ECS cluster backed by an ASG.

Possible Solution

At a minimum, reorder the comments. I'm not sure what command it is intending to run; perhaps sudo netfilter-persistent save, which isn't installed on the latest ECS-optimized GPU AMI.

            // Source: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html
            // ??????

            // The following is only for AwsVpc networking mode, but doesn't hurt for the other modes.
            autoScalingGroup.addUserData('sudo iptables --insert FORWARD 1 --in-interface docker+ --destination 169.254.169.254/32 --jump DROP');
            autoScalingGroup.addUserData('sudo service iptables restart');

            // Deny containers access to instance metadata service
            autoScalingGroup.addUserData('echo ECS_AWSVPC_BLOCK_IMDS=true >> /etc/ecs/ecs.config');

Additional Information/Context

No response

CDK CLI Version

2.134.0 (build 265d769)

Framework Version

No response

Node.js Version

v20.10.0

OS

OSX

Language

TypeScript

Language Version

No response

Other information

No response

@cheruvian cheruvian added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 11, 2024
@github-actions github-actions bot added the @aws-cdk/aws-ecs Related to Amazon Elastic Container label Dec 11, 2024
@ashishdhingra ashishdhingra self-assigned this Dec 11, 2024
@ashishdhingra ashishdhingra added p2 investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Dec 11, 2024
@ashishdhingra
Contributor

ashishdhingra commented Dec 11, 2024

The MachineImageType enum currently supports 2 values: AMAZON_LINUX_2 and BOTTLEROCKET. Maybe we should also update this enum to add AMAZON_LINUX_2023, then move the current default case here under the MachineImageType.AMAZON_LINUX_2 switch case and add a default case for AL 2023.
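
A rough sketch of that suggestion follows. It is not the aws-cdk-lib implementation: the string union stands in for the real MachineImageType enum, `imdsBlockingUserData` is a hypothetical function, and the AL 2023 command (`sudo iptables-save`) is only the replacement tried elsewhere in this thread, not a verified way to persist the rule. Bottlerocket is configured differently and is left out of scope.

```typescript
// Stand-in for the MachineImageType enum, extended with the proposed value.
type MachineImageType = 'AMAZON_LINUX_2' | 'BOTTLEROCKET' | 'AMAZON_LINUX_2023';

// Hypothetical function: user data lines that deny containers access to the
// instance metadata service, chosen per machine image type.
function imdsBlockingUserData(type: MachineImageType): string[] {
  const blockRule =
    'sudo iptables --insert FORWARD 1 --in-interface docker+ --destination 169.254.169.254/32 --jump DROP';
  // Only needed for AwsVpc networking mode, but harmless for the other modes.
  const ecsConfig = 'echo ECS_AWSVPC_BLOCK_IMDS=true >> /etc/ecs/ecs.config';

  switch (type) {
    case 'BOTTLEROCKET':
      return []; // Assumption: handled elsewhere, not covered by this sketch.
    case 'AMAZON_LINUX_2':
      // The former default case, moved under AL2 as suggested above.
      return [blockRule, 'sudo service iptables save', ecsConfig];
    default:
      // New default case for AL 2023; the command here is an assumption.
      return [blockRule, 'sudo iptables-save', ecsConfig];
  }
}
```

With this split, AL2 keeps its current user data unchanged, and only the new default branch needs to be validated on AL 2023 instances.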

Using the AWS CLI command aws ssm get-parameters --names /aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id --region us-east-1 (reference: Retrieving Amazon ECS-optimized Linux AMI metadata), we can get the latest AMI for AL 2023. It returns the following output:

{
    "Parameters": [
        {
            "Name": "/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id",
            "Type": "String",
            "Value": "ami-0a357ea20d7b79c2c",
            "Version": 61,
            "LastModifiedDate": "2024-11-18T08:07:53.585000-08:00",
            "ARN": "arn:aws:ssm:us-east-1::parameter/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id",
            "DataType": "text"
        }
    ],
    "InvalidParameters": []
}
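
If the AMI ID is needed in a script, the output above can be parsed as follows. This is a minimal sketch that assumes the JSON shape shown in the output; `extractRecommendedAmiId` and the two interfaces are hypothetical names, and only the fields used here are typed.

```typescript
// Subset of the fields shown in the CLI output above.
interface SsmParameter {
  Name: string;
  Value: string;
}

interface GetParametersOutput {
  Parameters: SsmParameter[];
  InvalidParameters: string[];
}

// Hypothetical helper: pull the AMI ID for one parameter name out of the raw
// JSON printed by `aws ssm get-parameters`.
function extractRecommendedAmiId(rawJson: string, parameterName: string): string {
  const output = JSON.parse(rawJson) as GetParametersOutput;
  if (output.InvalidParameters.some((name) => name === parameterName)) {
    throw new Error(`SSM reported an invalid parameter: ${parameterName}`);
  }
  const match = output.Parameters.find((parameter) => parameter.Name === parameterName);
  if (match === undefined) {
    throw new Error(`Parameter not found in output: ${parameterName}`);
  }
  return match.Value;
}
```

For the output shown above, calling it with the AL 2023 image_id parameter name yields the Value field of the single entry.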

The CDK code below can be used to reproduce the issue (assuming we can SSH to the EC2 instance launched by the ECS task):

import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';

export class CdktestStackNew extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'EcsVpc');
    /*
    const ecsAmi = ec2.MachineImage.lookup({
      name: '/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended/image_id'
    });
    */

    const asg = new autoscaling.AutoScalingGroup(this, 'EcsAsg', { 
      vpc: vpc,
      instanceType: new ec2.InstanceType('t2.xlarge'),
      machineImage: ecs.EcsOptimizedImage.amazonLinux2023(),
      desiredCapacity: 3
    }); 

    const capacityProvider = new ecs.AsgCapacityProvider(this, 'EcsCapacityProvider', {
      autoScalingGroup: asg
    });

    const cluster = new ecs.Cluster(this, 'MyEcsCluster', {
      vpc: vpc
    });
    cluster.addAsgCapacityProvider(capacityProvider);
  }
}

@cheruvian Please validate the finding above. If you were able to gather logs, perhaps from /var/log/cloud-init.log and /var/log/cloud-init-output.log, please share them in this issue.

@ashishdhingra ashishdhingra added p1 effort/medium Medium work item – several days of effort and removed p2 labels Dec 11, 2024
@ashishdhingra ashishdhingra removed their assignment Dec 11, 2024
@ashishdhingra ashishdhingra removed the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Dec 11, 2024
@phuhung273
Contributor

phuhung273 commented Dec 12, 2024

Hi team, while trying to fix this issue, another one came up:

  1. Changing sudo service iptables save to sudo iptables-save: the instance can join the cluster with the desired iptables rules.
  2. After rerunning the changed snapshots, I found that aws-ecs-patterns/test/ec2/integ.multiple-application-load-balanced-ecs-service doesn't work: 1 out of 2 target groups is always unhealthy.
  3. Running the sample image locally shows that port 90 doesn't work:
    docker run -d -p 80:80 -p 90:90 amazon/amazon-ecs-sample:latest > curl http://localhost:90 FAIL
    docker run -d -p 80:80 -p 90:80 amazon/amazon-ecs-sample:latest > curl http://localhost:90 WORK

Does this mean aws-ecs-patterns/test/ec2/integ.multiple-application-load-balanced-ecs-service doesn't make sense? Please correct me if I'm wrong on this point.
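
One way to read the two docker runs above: a host port is reachable only if it maps to a container port that a process listens on. The sketch below models that under the assumption, inferred from the observed results and not from the image's documentation, that the sample image listens only on container port 80. `reachableHostPorts` is a hypothetical helper.

```typescript
// Hypothetical helper: given docker-style "host:container" port mappings and
// the container ports a process listens on, return the reachable host ports.
function reachableHostPorts(mappings: string[], listeningContainerPorts: number[]): number[] {
  const reachable: number[] = [];
  for (const mapping of mappings) {
    const [hostPart, containerPart] = mapping.split(':');
    const hostPort = Number(hostPart);
    const containerPort = Number(containerPart);
    if (!Number.isInteger(hostPort) || !Number.isInteger(containerPort)) {
      throw new Error(`Unsupported port mapping: ${mapping}`);
    }
    // The host port works only when its target container port has a listener.
    if (listeningContainerPorts.indexOf(containerPort) !== -1) {
      reachable.push(hostPort);
    }
  }
  return reachable;
}
```

Under that assumption, the mappings 80:80 and 90:90 leave host port 90 unreachable, while 80:80 and 90:80 make both host ports reachable, which matches the curl results above.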

mergify bot pushed a commit that referenced this issue Dec 16, 2024
### Issue # (if applicable)

Closes #32496
Relates to #32469

### Reason for this change


- Invalid user data on AL2023

### Description of changes


- ECS `machineImageType` supports AL2023

### Description of how you validated changes


Unit + Integration test

#### Instance can join cluster
![image](https://github.com/user-attachments/assets/aca9ffb3-b993-49f7-a432-eff0cd380952)
#### iptables command success
![image](https://github.com/user-attachments/assets/78edaee1-edc2-4ef8-9a9d-5e67e624c594)

### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@samson-keung
Contributor

Hi all, this is a duplicate of #28518. For easier tracking, I will close this issue. Please continue discussion on the other issue.

Upon further investigation of the issue, we have identified challenges in supporting the canContainersAccessInstanceRole=false option; hence, the option will be deprecated in the future. More details in #32609.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 24, 2024