Aws-cdk: Better way to patch resources at the CloudFormation level to workaround gaps (escape hatch)

Created on 20 Aug 2018 · 37Comments · Source: aws/aws-cdk

Currently, the only way to workaround gaps in the AWS Construct Library is to overload toCloudFormation at the Stack level and patch the synthesized output:

class MyStack extends cdk.Stack {
    // ...
    toCloudFormation() {
      const cf = super.toCloudFormation();
      cf.Resources.Pg4352TopicPg4352QueueSubscriptionDD55615C.Properties.Protocol = 'bla';
      return cf;
    }
}

Requirements

Allow override (add/remove/modify) of the L1 properties of resources created by L2 constructs.
Allow accessing to the strongly-typed surface area of L1 (not just raw key/values)
Allow modifying resource metadata and other L1 resource options such as DependsOn (e.g. #777)
Allow modifying L1 properties that do not have strong-type representation (i.e. properties that are not defined in the CFN spec yet)
Allow adding resources that are not in the CFN spec (already supporting by directly instantiating cdk.Resource, but we should document)
Allow changing individual values inside a complex L1 property type
It should be possible to explicitly _delete_ values.
It should be possible to set a value to an empty object, array or null as those are sometimes needed by CloudFormation (e.g)

Non functional:

It should be possible to specify overrides "close" to where the L2 is defined.
Overrides should be applied on top of anything L2 does, such as mutations, validations, etc (which can happen at any time before synthesis, and after validation) - this means that overrides should be applied during synthesis of the resource and not before.
Overrides should not require that the L2 will "support" them, so it will truly be an "escape hatch" for people to be able to work around missing capabilities or APIs
Raw overrides should also be able to bypass any L1 validations, in case the CFN spec is wrong or not up-to-date
It should be possible to assign tokens (and stringified tokens) as override values.
Record override info in our analytics system (AWS::CDK::Metadata resource) to help prioritize L2 work and identify gaps (#785)

Design

Based on the requirements above, here's a design proposal.

Accessing L1 resources

To get a handle on an L1 resource behind an L2, users will use the construct.findChild(id) or tryFindChild(id) and downcast it to the specific L1.

For example, to access the L1 bucket resource given an L2 bucket:

const bucketResource = bucket.findChild('Resource') as s3.cloudformation.BucketResource;

By convention, the ID of the "main" resource of L2s is Resource, so in most cases, it should be straightforward to find the child.

In more complex scenarios, users will have to consult the L2 code in order to learn the ID (for example, asg.findChild('LaunchConfig') will return autoScaling.cloudformation.LaunchConfigurationResource.

Perhaps we can improve our convention such that the IDs of the children that are not the main resource will be 1:1 with the CloudFormation resource type name, so instead of LaunchConfig we should use LaunchConfiguration.

This approach allows users to access L1 resources freely, and obtain a strongly-typed L1 type from them without requiring that the L2 layer will "expose" those resources explicitly.

findChild will fail if the child could not be located, which means that if the underlying L2 changes the IDs or structure for some reason, synthesis will fail.

It is also possible to use construct.children for more advanced queries. For example, we can look for a child that has a certain CloudFormation resource type:

const bucketResource = 
    bucket.children.find(c => (c as cdk.Resource).resourceType === 'AWS::S3::Bucket') 
    as s3.cloudformation.BucketResource;

From that point, users are interacting with the strong-typed L1 resources (which extend cdk.Resource), so we will look into how to extend their surface area to support the various requirements.

Resource Options

cdk.Resource already has a few facilities for setting resource options such as Metadata, DependsOn, etc:

const bucketResource = bucket.findChild('Resource') as s3.cloudformation.BucketResource;

bucketResource.options.metadata = { MetadataKey: 'MetadataValue' };
bucketResource.options.updatePolicy = {
    autoScalingRollingUpdate: {
        pauseTime: '390'
    }
};

bucketResource.addDependency(otherBucket.findChild('Resource') as cdk.Resource);

This will synthesize to:

{
    "Type": "AWS::S3::Bucket",
    "DependsOn": [ "Other34654A52" ],
    "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
            "PauseTime": "390"
        }
    },
    "Metadata": {
        "MetadataKey": "MetadataValue"
    }
}

Raw Overrides

At the lowest level, we want to allow users to "patch" the synthesized resource definition and bypass any validations at the L1/L2 layers.

To that end, we will add a method resource.addOverride(path, value) which will allow applying arbitrary overrides to a synthesized resource. addOverride will be able to override anything under the resource definition (including resource options such as DependsOn, etc). We will also add resource.addPropertyOverride(propertyPath, value) as sugar for property overrides.

For example:

bucketResource.addOverride('Transform', 'Boom');
bucketResource.addPropertyOverride('VersioningConfiguration.Status', 'NewStatus');

Will synthesize to:

{
    "Type": "AWS::S3::Bucket",
    "Properties": {
        "VersioningConfiguration": {
            "Status": "NewStatus"
        }
    },
    "Transform": "Boom"
}

It should be possible to also assign null as an override value:

bucketResource.addPropertyOverride('Nullify.Me', null);

Synthesizes to:

{
    "Type": "AWS::S3::Bucket",
    "Properties": {
        "Nullify": {
            "Me": null
        }
    }
}

It should also possible to clear a value using an override:

bucketResource.addDeletionOverride('Delete.Me');

Overrides may freely use tokens (or stringified tokens) for values, and those will be resolved during synthesis:

bucketResource.addPropertyOverride('OtherBucketArn', otherBucket.bucketArn);

Will synthesize to:

{
    "Type": "AWS::S3::Bucket",
    "Properties": {
        "OtherBucketArn": {
            "Fn::GetAtt": [
                "Other34654A52",
                "Arn"
            ]
        }
    }
}

Strong-type property overrides

Users should also be able to define overrides for L1 resource properties via the generated property types. To that end, each generated resource will expose another property propertyOverrides of type XxxProps.

So, s3.cloudformation.BucketResource#propertyOverrides will have the type s3.cloudformation.BucketResourceProps, which allow users to define overrides for bucket resource properties as follows:

bucketResource.propertyOverrides.corsConfiguration = {
    corsRules: [
        {
            allowedMethods: [ 'GET' ],
            allowedOrigins: [ '*' ]
        }
    ]
};

Notes:

propertyOverrides are _merged_ into the resource on top of the values defined during initialization.
This means that in order to __delete__ a value, you will have to set it to null in the overrides.
Overrides will not go through L1 validation.

Defining raw resources

It is also possible to directly use cdk.Resource to define arbitrary CloudFormation resources:

new cdk.Resource(this, 'MyResource', {
  type: 'AWS::Unknown::Resource',
  properties: {
    Foo: 'bar'
  }
});

Source

rix0rrr

All 37 comments

@rix0rrr proposed:

Right now, the App crawls the entire construct tree, calls toCloudFormation where it exists and aggregates the result.

This lack of implementation hiding gives users no ability to take existing constructs but tweak them slightly, as the only CloudFormation generation happens at the very lowest level. As came up in #603, since every L2 is already wrapping another resource, in this very simplest construct tree:

  |
  +--- AWS::ELB::LoadBalancer

Let's say a user wanted to take the output of the current ClassicLoadBalancer and mostly keep it but add an attribute.

In a normal OO system, you would subclass ClassicLoadBalancer, override toCloudFormation and change the parent's output slightly before returning it.

In our system, you can't do this since ClassicLoadBalancer is not involved in the synthesis process at all, and consumers have no control over the instance of AWS::ELB::LoadBalancer that is constructed INSIDE the ClassicLoadBalancer.

I would argue it would be beneficial if we could achieve the former, since it conforms to expectations and it also provides a more usable "escape hatch" mechanism (that customers are asking for) than simply overriding Stack.toCloudFormation().

A simple way of achieving this would be to have App.toCloudFormation() stop recursing into children as soon as it finds a construct that implements toCloudFormation(), and leave it up to the highest-level resource in the tree to recurse further and synthesize all of its children (optionally twiddling the output). This makes it so that implementing ClassicLoadBalancer.toCloudFormation() actually has the opportunity to collect all of its children, modify the output, then return that.

@eladb, opinions?

eladb on 28 Aug 2018

I thiiiink this might be part of the appmodel refactor?

rix0rrr on 29 Aug 2018

@rix0rrr not sure I see how this is related to the app model. It's mostly about devising decent ergonomics to patch synthesized resource properties to some arbitrary values, no?

eladb on 29 Aug 2018

The synthesis implementation we use right now looks somewhat like this:

class Stack extends Construct {
    public toCloudFormation() {
        walk(this.children, c => {
            if (c instanceof StackElement) {
                return c.toCloudFormation();
            }
        });
    }
}

I.e., there's a single class that walks the entire tree, and for every child checks if it looks like something that can produce CloudFormation and if so calls it.

The fact that a single piece of code walks the entire tree and calls the producers, without giving intervening classes the ability to intercept this, is exactly what's making the "escape hatch" problem hard, and is making it so that people have to override Stack.toCloudFormation().

What I'm arguing is that the synthesis implementation should always have been like this:

class Construct {
    public toCloudFormation() {
        return merge(this.children.map(c => c.toCloudFormation());
    }
}

So that every Construct in the tree has the ability to modify the returned values from their children, and we have a natural escape hatch mechanism.

We couldn't do this before, because we wanted Constructs to be more general than CloudFormation, but we COULD do this now by abstracting away from CloudFormation, and calling it toAppModel:

class Construct {
    public toAppModel() {
        return merge(this.children.map(c => c.toAppModel());
    }
}

rix0rrr on 29 Aug 2018

For example, I think CloudFormation resources can emit the following:

const SomeResource {
     public toAppModel() {
        return {
            CloudFormationResource {
                [this:logicalId]: {
                    Type: 'AWS::IAM::Oei'
                }
            }
        };
    }
}

And then Stack can collect up all keys under CloudFormationResource and emit a CloudFormationStack. Anything that's not recognized at the end is an error.

rix0rrr on 30 Aug 2018

Some thoughts:

As I user, I want to use the L1 constructs for my "escape hatch", not manipulate the generated CF template. L1 constructs have an API similar to the L2 and "feel" more natural to work with having learned the L2 API. Other factors:
- I have a need to set dependencies between L1 and L2 constructs; while I can see how this would be possible at "conversion" time, it feels more natural to do at the construct level.
- I don't want to have to create a derived class just to set an unsupported property or specify an extra dependency; the extra boilerplate will quickly get tiresome.
- But I would still prefer to specify my modifications in the context where the construct being modified is declared, rather than having them in a separate "conversion" method.

ccurrie-amzn on 20 Sep 2018

👍1

Moved to issue description

eladb on 26 Sep 2018

I would add this (might be included in your head but I'd like to see it spelled out):

Allow changing individual values inside a complex L1 property type.

Allow adding resources that are not in the CFN spec

Might be true, but this doesn't need a change, right? People are free to instantiate cdk.Resource.

rix0rrr on 26 Sep 2018

There may be questions about the interactions between overrides and runtime validations. I reckon overrides should happen strictly after synth-time validations, too (because the lists of valid values & the likes may be outdated, too).

RomainMuller on 26 Sep 2018

@rix0rrr updated based on your inputs.
@RomainMuller this is what I meant by "Overrides should be applied on top of anything L2 does". I will clarify

eladb on 26 Sep 2018

Well - some of the validation happens at the L1 level...

RomainMuller on 26 Sep 2018

@RomainMuller good point. So you are suggesting that any "patches" should also bypass any L1 validations, in case the spec is wrong or not up-to-date

eladb on 26 Sep 2018

Moved to issue description

eladb on 26 Sep 2018

LGTM!

rix0rrr on 26 Sep 2018

Does it make sense to expose the full list of children (in addition to findChild(id)) so that customers don't have to "know" the resource id conventions? Then if they wanted a certain type they could do something like:

construct.children.filter( it => it instanceof s3.cloudformation.BucketResource);

That way I don't risk the ID changing out from under me as you hint that it's something that might change.

If we think "get by type" is going to be a particularly common use car you it may also be useful to add getFirstChildOfType<T>(...):T

kiiadi on 26 Sep 2018

A caveat I see as well is that if the "primary" resource of an L2 changes (aka used to be type Foo and now is Bar), the down-cast will still compile (because you're forcing the type checker's hand anyway), but you might get surprising results (hopefully, that don't synthesize or deploy, but if you're merely changing resource options, then it will happily proceed).

Also, are we positive the find & down-cast solution can be used in all the jsii front-end languages? In particular, I worry that we may be occasionally return proxy-objects for an interface reference to Java, and that the forced cast will throw ClassCastException.

RomainMuller on 26 Sep 2018

@kiiadi good point, this is actually possible (instanceof is usually not a good practice in JavaScript, but one can use resourceType):

bucket.children.find(c => (c as cdk.Resource).resourceType === 'AWS::S3::Bucket') 
    as s3.cloudformation.BucketResource;

@RomainMuller down-casting is a basic object-oriented capability and I want to believe it's going to be supported by all the languages we bind to. In strongly-typed languages like Java, if the actual resource object has a different type, the down cast will fail at runtime. In JavaScript, we might get some weird behavior, for sure. Can you elaborate on your concern with interface proxies? Example?

eladb on 26 Sep 2018

Names:

| 1 | addOverride(p, v) | addPropertyOverride(pp, v) | propertyOverrides.X
|---|---------------|-------------------------|------------------------------------
| 2 | override(p,v) | overrideProperty(p,v) | propertyOverrides.X
| 3 | patch(p,v) | patchProperty(pp,v) | overrides.X |
| 4 | | setProperty(pp,v) | bucketOverrides.X |

Obvsiouly any combination thereof is also an option.

eladb on 26 Sep 2018

@eladb - Regarding the interface proxies, I reckon it's never an issue when dealing with Resource classes (since they are... classes, y'know!).

Regarding the name proposals... To me, patch implies you're going to provide it in the for of add, remove and replace operations, and your examples seem to imply you're not going to get them this way... So I would vote against patch specifically here.

The setProperty option also looks wrong to me, because it looks way too "normal" (and you probably want those overrides to not look like normal, instead to look like "here be dragons 🐲" / "I know what I'm doing here").

RomainMuller on 26 Sep 2018

Ha, let me be contrarian. I like patch because it's very clear what you're doing: fixing up some deficiency.

Also, I generally prefer shorter words to longer words for verbs in programming, therefore patch (1 syllable) is better than override (3 syllables).

rix0rrr on 26 Sep 2018

This means that in order to delete a value, you will have to set it to null in the overrides.

Null is a valid and reasonable value for some properties; I've run into issue trying to replicate what the console does in CDK and been blocked from using null where it appeared in my console test output. I would instead like to see an explicit property removal API.

ccurrie-amzn on 26 Sep 2018

Good point about property removal...

Regardless, curious about your use case for using null in CloudFormation property values. Can you point me to an example?

eladb on 26 Sep 2018

It occurred in APIGateway, when I used the console to enable CORS. When the result was examined using aws apigateway get-method, methodIntegration.integrationResponses had

responseTemplates: {
    "application/json": null
}

You can see this in the documentation as well: https://docs.aws.amazon.com/cli/latest/reference/apigateway/get-method.html

I'm not able to replicate this in CDK, because apigateway.generated.d.ts defines IntegrationResponseProperty.responseTemplates as containing non-nullable values.

ccurrie-amzn on 26 Sep 2018

@ccurrie-amzn updated requirements and design (now moved to issue description) to reflect the requirement to allow nulls and deletion of values. Thoughts?

eladb on 26 Sep 2018

I think it looks good enough for getting started. Hands on experience will be more instructive at this point. 😄

ccurrie-amzn on 26 Sep 2018

I agree with @ccurrie-amzn, on paper my pain points look covered by the content in the description.

moofish32 on 26 Sep 2018

Thanks for your inputs guys! Looking forward for your feedback from actual use soon!

eladb on 26 Sep 2018

About when to apply L1 schema validation...

We are going to support two layers of overrides:

Strongly-typed property structure, e.g.

bucketResource.propertyOverrides.corsConfiguration = {
    corsRules: [
        {
            allowedMethods: [ 'GET' ],
            allowedOrigins: [ '*' ]
        }
    ]
};

Weakly-typed "patches", e.g.

bucketResource.addPropertyOverride('VersioningConfiguration.Status', 'NewStatus');

The second option will allow _anything_ to be assigned, and will bypass all L1 schema validations. The question is: do you think it makes sense for the first type of overrides to go through schema validations or not.

Initially we were saying that it should bypass schema validation and allow anything to be assigned, but when diving a little deeper, some of the schema validation actually happens at compile time (e.g. requires/optional props), so does it make sense to opt-out of some of the validation in this usage?

For the cases where the CFN schema is wrong (and thus requires an override), users could just go to the 2nd option (addPropertyOverride) and opt-out of any type/schema checking.

eladb on 26 Sep 2018

Personally I like validation where possible. I think the default should be to validate the typed overrides, but I can see having the option to turn it off if you need to. You might consider tracking how often opt-outs happen in your usage metadata.

ccurrie-amzn on 26 Sep 2018

👍1

@ccurrie-amzn great ideas!

Tracking overrides via CDK::Metadata might be a great way to prioritize our L2 work.
Turning off validation at the resource level 👍 - we can do it regardless of overrides (another escape hatch mechanism)... Adding to requirements...

eladb on 26 Sep 2018

I like the idea of tracking overrides in Metadata to target gaps. I probably lean more towards validation always for the typed properties and if that fails users have to go to option 2. We hopefully need very few use cases of option 2.

moofish32 on 26 Sep 2018

@ccurrie-amzn see #782 regarding "turning off" validation. TL;DR, it's non-trivial, so I am punting until we have evidence that it is actually needed/valuable.

eladb on 26 Sep 2018

👍1

Are these escape hatches officially/nicely documented anywhere on the main docs site? I stumbled across this issue by accident and wasn't aware of it prior, but it's a super useful feature that should be easily discoverable!