Aws-cdk: [batch] aws_batch.JobDefinition to keep previous revisions Active on update

Created on 17 Jul 2020 · 5 Comments · Source: aws/aws-cdk

I looked for existing issues or bugs covering this but could not find any, and I am not sure whether this is a bug or a feature request. Most of the projects I work on involve managing AWS Batch job definitions and state machines to orchestrate large batches of Docker containers at once. Each revision of a job definition points to a particular tag of an ECR image for backwards compatibility. In my CDK stack, any edit to my JobDefinition construct registers a new job definition revision but deregisters the previous revisions. Is there a way to prevent CDK from deregistering the previous revisions? I can implement this in boto3, but I would like to try a native CDK approach first.

Reproduction Steps

Initial cdk deploy which creates revision 1 pointing to image:0.0.0

self._job_def = JobDefinition(self, 'JobDef',
    job_definition_name="job-definition-name",
    container=JobDefinitionContainer(
        image=ecr_image,  # using tag 0.0.0
    ))

Second cdk deploy which creates revision 2 properly pointing to image:0.0.1, but deregisters revision 1

self._job_def = JobDefinition(self, 'JobDef',
    job_definition_name="job-definition-name",
    container=JobDefinitionContainer(
        image=ecr_image,  # using tag 0.0.1
    ))

Error Log

No errors, but there is an unwanted deregister step.

Environment

  • CLI Version : 1.48.0 (build 6080fa8)
  • Framework Version: 1.51.0
  • Node.js Version: node - 12.16.1, npm 6.13.4
  • OS : Mac OS Mojave 10.14.6
  • Language (Version): Python 3.8.3

Other

JobDefinition CF

The link above points to the CloudFormation resource type page for JobDefinition. The only property I could find whose update requires replacement is the job definition name. However, the job definition name is always the same in my example.


This is :bug: Bug Report

@aws-cdk/aws-batch effort/medium feature-request needs-cfn p1

All 5 comments

I experimented with the underlying CfnJobDefinition, which I located through node.children, and changed its removal policy to RETAIN. I wanted to add a discovery that I thought might pertain to this.

cdk deploy

```python
self._job_def = JobDefinition(self, 'JobDef',
    job_definition_name="job-definition-name",
    container=JobDefinitionContainer(
        image=ecr_image,  # using tag 0.0.1
    ))

# a little hacky, but it worked
for child in self._job_def.node.children:
    # search for the underlying CfnJobDefinition
    if (hasattr(child, 'apply_removal_policy')
            and hasattr(child, 'cfn_resource_type')
            and child.cfn_resource_type == 'AWS::Batch::JobDefinition'):
        child.apply_removal_policy(policy=core.RemovalPolicy.RETAIN)
```

This reproduces the behavior described in the issue: it creates a revision 1 pointing to tag 0.0.1 and deregisters the previous revision. However, after I run

```bash
cdk destroy
```

The batch job definition is retained. When I run again

cdk deploy

It adds a new revision 2 to the pre-existing JobDefinition pointing to tag 0.0.1, but does not remove the previous revision. I am not sure how CDK works in the backend, but it is almost as if updating uses a different removal policy than destroying. I'd love to hear an expert's analysis of it.

Hi @jabrennem - Thanks for reporting this. Just wanted to reach out and let you know I'll be looking into it this week and will share my conclusions.

Hi @jabrennem - Sorry for the long delay.

It does seem that the CloudFormation behavior for JobDefinition is to create a new revision for every change that doesn't require a replacement, and to delete all other managed revisions. When a JobDefinition is deployed, CloudFormation looks up the latest revision and creates a new one on top of it, leaving any unmanaged revisions as is.

By managed, I mean revisions that were created by CloudFormation. This is why, when you destroy the stack with RETAIN, you effectively turn the previously managed revision into an unmanaged one, which is therefore ignored on subsequent deploys. The same applies if you create a revision from the console or any other tool: those revisions will remain when you destroy and deploy the stack.

CloudFormation doesn't expose any configuration pertaining to revisions, so I'm afraid there is no way to change this behavior.
CDK itself has no additional backend logic that interacts with the AWS Batch API.
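
One way to observe the managed/unmanaged behavior described above is to list a job definition's revisions by status before and after a deploy. The following is a minimal sketch and not code from this thread: `revisions_by_status` is a hypothetical helper, and it assumes `batch_client` is a boto3 Batch client (for example `boto3.client("batch")`) whose `describe_job_definitions` call is filtered with `jobDefinitionName` and `status` and paginated with `nextToken`.

```python
def revisions_by_status(batch_client, job_definition_name):
    """Return {"ACTIVE": [...], "INACTIVE": [...]} revision numbers.

    Hypothetical helper; `batch_client` is assumed to be a boto3 Batch client.
    """
    result = {}
    for status in ("ACTIVE", "INACTIVE"):
        revisions = []
        kwargs = {"jobDefinitionName": job_definition_name, "status": status}
        while True:
            page = batch_client.describe_job_definitions(**kwargs)
            revisions.extend(d["revision"] for d in page.get("jobDefinitions", []))
            token = page.get("nextToken")
            if not token:
                break
            # Follow pagination until the service stops returning a token.
            kwargs["nextToken"] = token
        result[status] = sorted(revisions)
    return result
```

Calling it with the name used in this issue, `revisions_by_status(boto3.client("batch"), "job-definition-name")`, before and after `cdk deploy` should show which revisions a deploy moved from ACTIVE to INACTIVE.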

So my suggestion would be as follows:

  1. Create a support ticket on the AWS Forum describing this behavior.
  2. Create a CloudFormation Coverage Issue requesting specific support for JobDefinition revisions. Maybe a new resource: JobDefinitionRevision.
  3. If possible, create a new JobDefinition in CDK for every new image tag.

It would be good if you could detail your exact use case and the flow that requires retaining old revisions. I imagine you have jobs running with the old revision during deployment of new ones? Can you share the full code?

To note, we have a similar issue: We run batch jobs from step functions that can take a long time prior to starting the batch job. When our continuous deployment deploys a new version of the step function, this step function will in many cases point to a new revision of the batch job definition. However, any currently running step function that hasn't reached the batch job yet will fail because it points to a now inactive job definition revision.

We are going to hack around this by re-activating the last few revisions of the batch job definition after deployment, but that's obviously less than ideal.

Thank you @iliapolo for looking into this.

I cannot share any code, but your description has helped a lot. I will submit a CloudFormation Coverage Issue to hopefully gain more control over the batch job revisions. I like the idea of a JobDefinitionRevision resource.

For right now, creating a separate batch job definition for each new image:tag will work, and at least it is still a native CDK implementation.
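
The per-tag workaround could be sketched as follows. This is an illustration under stated assumptions, not code from the thread: `job_definition_name_for_tag` is a hypothetical helper, and it relies on AWS Batch job definition names being limited to 128 characters made up of letters, digits, hyphens, and underscores, so characters such as the dots in `0.0.1` are replaced with hyphens.

```python
import re


def job_definition_name_for_tag(base_name, image_tag):
    """Derive a distinct job definition name for an image tag.

    Hypothetical helper: replaces characters that are not letters, digits,
    hyphens, or underscores, and enforces the 128-character limit.
    """
    safe_tag = re.sub(r"[^A-Za-z0-9_-]", "-", image_tag)
    name = f"{base_name}-{safe_tag}"
    if len(name) > 128:
        raise ValueError(f"job definition name too long: {name!r}")
    return name
```

In the stack, the result could serve as both the construct ID and `job_definition_name`, for example `JobDefinition(self, name, job_definition_name=name, container=...)` inside a loop over the tags that must stay available. Keeping earlier tags in that list leaves their constructs in the stack; dropping a tag's construct would delete its job definition on the next deploy. Note that tags differing only in replaced characters (such as `0.0.1` and `0-0-1`) would map to the same name.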
