The context for this issue is well described in the Gitter aws-cdk channel over here, along with the tests performed by @skinny85, @reisingerf and myself:
https://gitter.im/awslabs/aws-cdk?at=5e54579d9aeef6523217b25f
Given the following snippet, as described in the Gitter thread:
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', {
  isDefault: true,
});
const batch_instance_role = new iam.Role(this, 'BatchInstanceRole', {
  roleName: 'UmccriseBatchInstanceRole',
  assumedBy: new iam.CompositePrincipal(
    new iam.ServicePrincipal('ec2.amazonaws.com'),
    new iam.ServicePrincipal('ecs.amazonaws.com'),
  ),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2RoleforSSM'),
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2ContainerServiceforEC2Role')
  ],
});
const spotfleet_role = new iam.Role(this, 'AmazonEC2SpotFleetRole', {
  assumedBy: new iam.ServicePrincipal('spotfleet.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2SpotFleetTaggingRole'),
  ],
});
const batch_service_role = new iam.Role(this, 'BatchServiceRole', {
  assumedBy: new iam.ServicePrincipal('batch.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSBatchServiceRole'),
  ],
});
const batch_instance_profile = new iam.CfnInstanceProfile(this, 'BatchInstanceProfile', {
  instanceProfileName: 'UmccriseBatchInstanceProfile',
  roles: [batch_instance_role.roleName],
});
const launch_template = new ec2.CfnLaunchTemplate(this, 'LaunchTemplate', {
  launchTemplateData: {
    userData: core.Fn.base64(`
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
echo Hello
--==MYBOUNDARY==--
`),
  },
  launchTemplateName: 'UmccriseBatchComputeLaunchTemplate',
});
new batch.CfnComputeEnvironment(this, 'UmccriseBatchComputeEnv', {
  type: 'MANAGED',
  serviceRole: batch_service_role.roleArn,
  computeResources: {
    type: 'SPOT',
    maxvCpus: 128,
    minvCpus: 0,
    desiredvCpus: 0,
    imageId: 'ami-05c621ca32de56e7a',
    launchTemplate: {
      launchTemplateId: launch_template.ref,
      version: launch_template.attrLatestVersionNumber,
    },
    spotIamFleetRole: spotfleet_role.roleArn,
    instanceRole: batch_instance_profile.instanceProfileName!,
    instanceTypes: ['optimal'],
    subnets: [vpc.publicSubnets[0].subnetId],
    securityGroupIds: ['sg-0a5cf974'],
    tags: { 'Creator': 'Batch' },
  }
});
For more context, there's this other working example too:
https://github.com/awslabs/aws-batch-helpers/issues/5#issue-425133706
This:
Operation failed, ComputeEnvironment went INVALID with error: CLIENT_ERROR - Launch Template UserData is not MIME multipart format
Coupled with the deploy time error (in Python, ask @skinny85 for the TypeScript counterpart):
6/10 | 10:01:20 AM | UPDATE_FAILED | AWS::Batch::ComputeEnvironment | UmccriseBatchComputeEnv Operation failed, ComputeEnvironment went INVALID with error: CLIENT_ERROR - Launch Template UserData is not MIME multipart format
/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7838:49
\_ Kernel._wrapSandboxCode (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:8298:20)
\_ Kernel._create (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7838:26)
\_ Kernel.create (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7585:21)
\_ KernelHost.processRequest (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7372:28)
\_ KernelHost.run (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7312:14)
\_ Immediate._onImmediate (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7315:37)
\_ processImmediate (internal/timers.js:456:21)
This is a :bug: Bug Report
Here's the original CDK-Python snippet that fails in the same way and triggered the Gitter discussion:
There seems to be something funny going on with new line handling.
It was brought to my attention, by an AWS support engineer, that there shouldn't be any empty lines in user data scripts. However, whenever I try to create a multi-line string in Python it seems to add empty new lines when writing the CF template.
I am not sure that is the real issue, but it is unexpected and looks wrong.
For example:
user_data_script = """
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"
--//
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
/bin/echo "Hello World" >> /tmp/testfile.txt
--//
"""
with 'userData': core.Fn.base64(user_data_script), would end up being:
LaunchTemplateData:
UserData:
Fn::Base64: >
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"
--//
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
/bin/echo "Hello World" >> /tmp/testfile.txt
--//
LaunchTemplateName: UmccriseBatchComputeLaunchTemplate
A 'userData': core.Fn.base64(core.Fn.join(delimiter='\n', list_of_values=['#!/bin/bash', 'echo Hello'])) has the same issue of adding empty lines.
Also, tests without core.Fn.base64 seem to have that issue:
Input: 'userData': '#!/bin/bash\necho FOO'
Output:
LaunchTemplateData:
UserData: >-
#!/bin/bash
echo FOO
LaunchTemplateName: UmccriseBatchComputeLaunchTemplate
Comment: Unexpected new line
Input: 'userData': '#!/bin/bash\recho FOO'
Output:
LaunchTemplateData:
UserData: "#!/bin/bash\recho FOO"
LaunchTemplateName: UmccriseBatchComputeLaunchTemplate
Comment: Carriage return not recognised?
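One way to rule out a rendering artifact is to look at the string itself rather than at its YAML form. In YAML folded block scalars (`>` and `>-`), a single line break is folded into a space and a blank line stands for a newline, so blank lines in the synthesized YAML may simply be how `\n` is displayed. The sketch below walks a template dict and prints the repr of each UserData value. The embedded template and the helper name `find_user_data` are illustrative assumptions; in practice the dict could be loaded from the JSON template that `cdk synth` typically writes under `cdk.out`.

```python
def find_user_data(node):
    """Recursively collect every value stored under a 'UserData' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "UserData":
                found.append(value)
            else:
                found.extend(find_user_data(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_user_data(item))
    return found


# Hypothetical template fragment, standing in for a file under cdk.out.
template = {
    "Resources": {
        "LaunchTemplate": {
            "Type": "AWS::EC2::LaunchTemplate",
            "Properties": {
                "LaunchTemplateData": {
                    "UserData": {"Fn::Base64": "#!/bin/bash\necho FOO"}
                }
            },
        }
    }
}

for value in find_user_data(template):
    # repr() makes every newline explicit, so a real blank line would show as "\n\n".
    print(repr(value))
```

If the repr shows single `\n` characters only, the extra blank lines exist in the YAML rendering and not in the string handed to CloudFormation.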
@brainstorm I believe the problem is that when you declare a multi-line string in Python like so:
user_data = `
my-script
`
What you actually get is '\nmy-script\n'. This causes the first line in the script to be empty, which violates the User Data Formats.
Try replacing your declaration with:
user_data = `MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
echo Hello
--==MYBOUNDARY==--`
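In Python a multi-line literal is written with triple quotes rather than backticks, and the leading-newline effect described above can be checked directly. A minimal sketch, with illustrative variable names:

```python
# A triple-quoted literal that opens on its own line starts with "\n".
with_leading_newline = """
#!/bin/bash
echo Hello
"""

# A backslash right after the opening quotes suppresses that first newline.
without_leading_newline = """\
#!/bin/bash
echo Hello
"""

print(repr(with_leading_newline))
print(repr(without_leading_newline))

# Stripping leading newlines from the first form gives the second form.
print(with_leading_newline.lstrip("\n") == without_leading_newline)
```

Either the backslash or an explicit `lstrip("\n")` keeps the first line of the user data from being empty.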
Thanks for the suggestion!
However, I'm afraid I don't understand. If I run your user_data code I get:
Traceback (most recent call last):
File "app.py", line 4, in <module>
from stacks.batch import BatchStack
File "/Users/freisinger/Devel/projects/github/UMCCR/infrastructure/cdk/apps/umccrise/stacks/batch.py", line 22
user_data = `MIME-Version: 1.0
^
SyntaxError: invalid syntax
Subprocess exited with error 1
I did use triple quotes for my multiline user data string, but I still end up with extra empty lines.
We also use an array of strings with core.Fn.join, with the same result of added empty lines.
See GitHub link above for a flavour of our attempts...
After much playing around I've managed to find a workaround for a Python deployment.
Reading in the user data from a file is better than using a multi-line string.
Part 1: Read in the user data
with open("user_data/user_data.txt", 'r') as user_data_h:
    user_data = user_data_h.read()
Part 2: Assign as a UserData object with the custom method
user_init = ec2.UserData.custom(user_data)
Part 3: Add to launch_template_data dict
The render magically gets rid of the lines attribute in the stack, and the base64 re-encodes it as appropriate.
launch_template_data = {
    "UserData": core.Fn.base64(user_init.render())
}
Part 4: Initialise the launch template
launch_template = ec2.CfnLaunchTemplate(self, "LaunchTemplate", launch_template_name="UmccriseBatchComputeLaunchTemplateDev")
Part 5: Override the launch template property
Adding the user data in the previous step under the kwarg launch_template_data doesn't seem to work, so we override the property using add_property_override instead.
launch_template.add_property_override("LaunchTemplateData", launch_template_data)
Part 6: Validate
Our launch template should look like this after running cdk synth:
LaunchTemplate:
Type: AWS::EC2::LaunchTemplate
Properties:
LaunchTemplateData:
UserData:
Fn::Base64: >-
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
echo Hello
--==MYBOUNDARY==--
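When the user data lives in a file, a small guard can catch stray leading newlines or Windows line endings before the content reaches the template. This is a sketch only: `load_user_data` is a hypothetical helper, not part of the CDK, and the check that the first line starts with `MIME-Version:` or `Content-Type:` is an assumption about what the file should contain.

```python
import os
import tempfile


def load_user_data(path):
    """Read a user data file, normalise line endings, drop leading newlines."""
    # newline="" disables newline translation so CR/CRLF are seen as written.
    with open(path, "r", newline="") as handle:
        text = handle.read()
    # Convert CRLF and lone CR to LF, then remove newlines before the first header.
    text = text.replace("\r\n", "\n").replace("\r", "\n").lstrip("\n")
    first_line = text.split("\n", 1)[0]
    if not first_line.startswith(("MIME-Version:", "Content-Type:")):
        raise ValueError(f"unexpected first line in user data: {first_line!r}")
    return text


# Usage with a throwaway file standing in for user_data/user_data.txt.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "user_data.txt")
    with open(sample_path, "w", newline="") as handle:
        handle.write("\r\nMIME-Version: 1.0\r\nContent-Type: text/plain\r\n")
    cleaned = load_user_data(sample_path)
    print(repr(cleaned))
```

The returned string could then be passed to `ec2.UserData.custom(...)` as in Part 2 above.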
@reisingerf wrote:
Thanks for the suggestion!
However, I'm afraid I don't understand. If I run your user_data code I get:
Traceback (most recent call last):
File "app.py", line 4, in <module>
from stacks.batch import BatchStack
File "/Users/freisinger/Devel/projects/github/UMCCR/infrastructure/cdk/apps/umccrise/stacks/batch.py", line 22
user_data = `MIME-Version: 1.0
^
SyntaxError: invalid syntax
Subprocess exited with error 1
Sorry, I got mixed up with typescript/python multiline declarations :)
@reisingerf wrote:
I did use triple quotes for my multiline user data string, but I still end up with extra empty lines.
We also use an array of strings with core.Fn.join, with the same result of added empty lines.
See GitHub link above for a flavour of our attempts...
I'm still pretty convinced it's the multi-line declaration problem. Here is a working (validated) snippet, based on the code @brainstorm posted.
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', {
  isDefault: true,
});
const batch_service_role = new iam.Role(this, 'BatchServiceRole', {
  assumedBy: new iam.ServicePrincipal('batch.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSBatchServiceRole'),
  ],
});
const launch_template = new ec2.CfnLaunchTemplate(this, 'LaunchTemplate', {
  launchTemplateData: {
    userData: cdk.Fn.base64(`MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
echo Hello
--==MYBOUNDARY==--`),
  },
  launchTemplateName: 'UmccriseBatchComputeLaunchTemplate',
});
const batch_instance_role = new iam.Role(this, 'BatchInstanceRole', {
  roleName: 'UmccriseBatchInstanceRole',
  assumedBy: new iam.CompositePrincipal(
    new iam.ServicePrincipal('ec2.amazonaws.com'),
    new iam.ServicePrincipal('ecs.amazonaws.com'),
  ),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2RoleforSSM'),
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2ContainerServiceforEC2Role')
  ],
});
const batch_instance_profile = new iam.CfnInstanceProfile(this, 'BatchInstanceProfile', {
  instanceProfileName: 'UmccriseBatchInstanceProfile',
  roles: [batch_instance_role.roleName],
});
const spotfleet_role = new iam.Role(this, 'AmazonEC2SpotFleetRole', {
  assumedBy: new iam.ServicePrincipal('spotfleet.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2SpotFleetTaggingRole'),
  ],
});
new batch.CfnComputeEnvironment(this, "Env", {
  type: "MANAGED",
  serviceRole: batch_service_role.roleArn,
  computeResources: {
    type: 'SPOT',
    maxvCpus: 128,
    minvCpus: 0,
    launchTemplate: {
      launchTemplateId: launch_template.ref,
      version: launch_template.attrLatestVersionNumber,
    },
    instanceRole: batch_instance_profile.instanceProfileName!,
    instanceTypes: ['optimal'],
    subnets: [vpc.publicSubnets[0].subnetId],
    spotIamFleetRole: spotfleet_role.roleArn,
  }
});
What was missing from my earlier suggestion is the indentation. Notice:
const userData = `MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
echo Hello
--==MYBOUNDARY==--`
The Python counterpart, using triple quotes, should work the same way.
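A Python version of that literal might look like the sketch below. The backslash after the opening quotes keeps the string from starting with a newline, and the blank lines follow the usual MIME convention of separating headers from the body of each part. `looks_like_mime_multipart` is a hypothetical helper built on the standard library `email` parser; it is only a rough local sanity check and is not the validation AWS Batch itself performs.

```python
from email import message_from_string

# Backslash after the opening quotes: the string starts at "MIME-Version".
# Blank lines separate the MIME headers from the body of each part.
user_data = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
echo Hello
--==MYBOUNDARY==--"""


def looks_like_mime_multipart(text):
    """Rough local check: no leading whitespace and parses as multipart."""
    if text[:1].isspace():
        return False
    message = message_from_string(text)
    return message.is_multipart() and len(message.get_payload()) >= 1


print(looks_like_mime_multipart(user_data))
print(looks_like_mime_multipart("\n" + user_data))
```

The string can then be wrapped with `core.Fn.base64(user_data)` as in the snippets above. Note that the standard library parser is lenient, so passing this check does not guarantee that Batch will accept the user data.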
Regarding your previous attempts you mentioned, here is my theory on why they didn't work:
https://github.com/umccr/infrastructure/blob/495404bbfb0b3b6bf50d9640b2bc012f851c3600/cdk/apps/umccrise/stacks/batch.py#L21 contains a \n in the beginning.
https://github.com/umccr/infrastructure/blob/495404bbfb0b3b6bf50d9640b2bc012f851c3600/cdk/apps/umccrise/stacks/batch.py#L21 is actually missing some blank lines in the middle (not 100% sure where exactly).
https://github.com/umccr/infrastructure/blob/495404bbfb0b3b6bf50d9640b2bc012f851c3600/cdk/apps/umccrise/stacks/batch.py#L10 - This should actually work! I don't see exactly how you used it in the commented code... Was it just core.Fn.base64(user_data_script)?
@iliapolo thanks for looking into this!
I started from scratch and using your example I finally got the user data deployed. I don't know why the previous attempts were unsuccessful (partially due to the empty first line for sure, but we've tried so much this couldn't have been the only reason).
I also noticed (after quite some frustration) that updates to my user data seemed to be deployed but did not actually change the user data run by the starting instances. Looking at the AWS console, I noticed that for each change to the LaunchTemplate a new version was created, but the instances always used the default version, which was the first and oldest one.
Any idea how I would go about using the latest version by default?
Actually it seems there are two LaunchTemplates created. The one I defined in my CDK stack and another one that by the looks of it seems to be a combination of mine and some other one (setting some ECS variables).
I am not sure which one is used when the batch instances are booted, but it does not pick up any changes to my template.
@reisingerf Notice that the example above uses:
launchTemplate: {
  launchTemplateId: launch_template.ref,
  version: launch_template.attrLatestVersionNumber,
},
This uses the latest version at the time of deployment (granted, the name is a bit confusing).
So when you first deployed, it set the launch template version of the compute environment to a fixed number.
Since compute environments don't support modifying this, I imagine CloudFormation doesn't send the update request (though it would have been nice to get an error in this case). Therefore your compute environment will always use the initial version.
To fix, you can use:
launchTemplate: {
  launchTemplateId: launch_template.ref,
  version: "$Latest", // Notice that $Default is also supported to use the default launch template version.
},
Regarding the two templates, yes, this is the expected behavior.
One template is the one you created and are maintaining. The other one is created by Batch and indeed adds some variables; this is, by the way, the reason your user data has to be a MIME multi-part archive. Eventually the launch template that is used is the one Batch created, but it should always contain your configuration as well, so you don't have to worry about it.
Let me know if this resolved the issue.
@iliapolo thanks a lot for the explanations! Much appreciated!
I figured that the Batch internal use of UserData would enforce the _MIME multi-part archive_ UserData format. I guess that's what caused me some headache in the beginning when I was trying to get a simple Hello world bash user data script to work.
As for the LaunchTemplate version. I found the relevant paragraph in the docs:
AWS Batch does not support updating a compute environment with a new launch template version. If you update your launch template, you must create a new compute environment with the new template for the changes to take effect
Also, you said the second LaunchTemplate is created by Batch (I assume at ComputeEnvironment creation time) and incorporates my own LaunchTemplate. I guess that's the reason why Batch does not support template updates.
If so, then I don't quite understand your suggested fix. If I specify a version number (or $Latest), I would still have to recreate the ComputeEnv, as it would incorporate that version into its own LaunchTemplate at creation time (as a static copy), right?
Or are you saying that if I specify $Latest as version in the CfnComputeEnvironment and that version changes, CDK will detect the change and automatically recreate the ComputeEnvironment?
And how would $Latest then differ from launch_template.attrLatestVersionNumber?
@reisingerf You are right.
I mistakenly assumed that $Latest is used as a pointer and that Batch makes whatever changes are needed to its own launch template.
I now understand this is not the case, and $Latest indeed does not differ from launch_template.attrLatestVersionNumber.
I even tried updating the managed template myself, in the hope that $Latest perhaps refers to its own managed latest, but that didn't work either.
You mentioned that:
I also noticed (after quite some frustration) that updates to my user data seemed to be deployed but did not actually change the user data run by the starting instances.
If you update the launch template from the CDK app, it should have also caused a re-creation of the compute environment because launch_template.attrLatestVersionNumber now evaluates to a different value, and according to this, CloudFormation would replace the compute environment and the changes should apply.
Can you double check the compute environment was indeed replaced? Note that if the environment is used by some queue (which it usually is), the replacement will fail and actually result in two environments, one pointing to the old template version, and one to the new, with the queue still connected to the old one.
It looks like the experience of updating a launch template isn't tight enough and has a few problems, on both the CloudFormation and the CDK side. I'll try to think about how we can improve on that (a feature request from you would be appreciated as well :)).
In the meantime, the safest and most streamlined approach (albeit somewhat slow) would be to create a batch.CfnJobQueue in the CDK app and run cdk destroy && cdk deploy each time you change the launch template.
@iliapolo thanks again and understood.
My experiences match what you say. I had the issue with ending up with two compute envs and resorted to exactly your suggestion of destroy and deploy.
Any preferred way of creating that feature request? Shall I open a new ticket on this repo or try to request changes to CloudFormation via a support request?
@reisingerf good question. Since the root cause is actually batch not supporting launch template version updates, I think the best approach would be to submit a feature request in the AWS Batch Developer Forum.
In addition, you can create a general issue in this repo where we can discuss possible approaches for the CDK to help mitigate that.
Once you do that, can you please link those issues from here and close this one?
Thanks!
Done.
Forum link: https://forums.aws.amazon.com/thread.jspa?threadID=318580