Aws-cdk: [aws-lambda-nodejs] Bundling performance issues

Created on 10 Sep 2020  ·  27Comments  ·  Source: aws/aws-cdk

:question: General Issue

The Question

In my stack I've created several lamdas using the new NodejsFunction from @aws-cdk/aws-lambda-nodejs. Functionally everything works fine, code in bundled with Parcel and the stack deploys without issues. There are however two (Parcel related?) performance issues:

  1. Bundling seems quite slow, my lambda has a few dependencies and it takes about 20 seconds to bundle it.
  2. When performing multiple tests for the same stack, tests are slow as the lambda bundling isn't cached.

I'm wondering if this is the expected performance or if I'm missing something? I did some tests with webpack to manually bundle the Lambdas, this is about 3 times faster than webpack. I'm also wondering about the caching feature in Parcel, shouldn't the second performance issue be tackled it?

How to reproduce

I've created a simple sample app to show the issue: https://github.com/jaapvanblaaderen/cdk-lambda-bundling-performance

The app contains a single stack with one lambda that's referenced in two tests. In the test output you can observe that bundling an empty lambda function takes about 8 seconds (initially slow) and for the second test it's still around 6-7 seconds (no/bad caching?).

Bundling asset simple-stack/HandlerLambda/Code/Stage...

 RUNS  lib/simple-stack.test.ts
✨ Built in 8.01s
Bundling asset simple-stack/HandlerLambda/Code/Stage...

 RUNS  lib/simple-stack.test.ts
✨ Built in 6.66s
 PASS  lib/simple-stack.test.ts (21.056 s)
  Simple Stack
    ✓ Should have memory size set to 512MB (9505 ms)
    ✓ Should use Node 12 as runtime (7459 ms)

Test Suites: 1 passed, 1 total
Tests:       2 passed, 2 total
Snapshots:   0 total
Time:        22.211 s
Ran all test suites.

Environment

  • CDK CLI Version: 1.62
  • Module Version: 1.62
  • Node.js Version: v14.4.0
  • OS: OSX Catalina
  • Language (Version): TypeScript 4.0.2
@aws-cdaws-lambda-nodejs bug efformedium in-progress p1

Most helpful comment

@floydspace we are going to replace parcel with esbuild while still maintaining local bundling + fallback to Docker approach for the moment. People installing esbuild will have very fast bundling performances.

Let me first draft a PR for this then we can work on it together.

All 27 comments

@jaapvanblaaderen currently investigating this... looks indeed like there's an issue with the cache.

@jogold Ok cool, just let me know if I can help testing something if needed.

So... it's complicated and has to do with the content of cdk.out changing (we are bundling in there in a temporary directory) which invalidates part of the cache...

@jaapvanblaaderen we'll see how #10763 improves the cache performance. Let me know your experience when it's merged and released.

Thanks @jogold, I'll let you know.

@jogold I've done some testing with the latest 1.68 release, which should include the changes in https://github.com/aws/aws-cdk/pull/10763 if I'm correct. I've tested with the exact same test app as mentioned in the issue description and took the average of four test runs (running npm run test).

Results with parcel 2.0.0-beta.1

  • CDK 1.62: 22.3 seconds
  • CDK 1.68: 31.7 seconds

Not very promising results as it apparently got 10 seconds slower. I tested a few other versions as well, 1.62-1.66 are similar in performance and from 1.67 and upwards it's around 10 seconds slower.

Now the interesting part... I noticed that there are quite some security vulnerabilities with the currently supported version of Parcel, not really a big issue, but decided to upgrade it using npm audit fix. This upgrades parcel to ^2.0.0-nightly.425. Before I could re-test, I had to tweak some stuff in https://github.com/aws/aws-cdk/blob/master/packages/@aws-cdk/aws-lambda-nodejs/lib/bundlers.ts to make the build work: Change the PARCEL_VERSION and remove the --no-autoinstall option.

Results with parcel 2.0.0-nightly.425

  • CDK 1.62: 5.4 seconds 🎉
  • CDK 1.68: 22.7 seconds

That 5.4 seconds looks promising 🙂 . I hope this info helps to improve the situation? I guess you can compare CDK 1.66 and 1.67 to see why the performance (from CDK perspective) got worse? Then there is Parcel, I find it a bit weird that the latest 2.x release is quite old especially as they are still in beta and apparently making quite good (performance) improvements.

Also note that there is already a (dependabot PR) for updating parcel in CDK: https://github.com/aws/aws-cdk/pull/10953

You can solve the slow tests problem by mocking out Parcel bundling with this module - https://www.npmjs.com/package/cdk-lambda-nodejs-mock

You can solve the slow tests problem by mocking out Parcel bundling with this module - https://www.npmjs.com/package/cdk-lambda-nodejs-mock

@alan-cooney This indeed works great for running tests, thanks! 👍 Is there also an open GitHub repo linked to this module?

Hey @jaapvanblaaderen . I have implemented a new cdk construct aws-lambda-nodejs-esbuild

See comparison results on my machine:
@aws-cdk/aws-lambda-nodejs:

Bundling asset simple-stack/HandlerLambda/Code/Stage...

 RUNS  lib/simple-stack.test.ts
✨ Built in 11.88s

 PASS  lib/simple-stack.test.ts (84.678 s)s/_qq218h940s33jg73wss4k5w0000gn/T/cdk.outeeruBW/bundling-temp-dc2901b128dbc7ef10d25ae6a7a15a98d5e3ac81c654c3ad726474eee5ec  Simple Stack
    ✓ Should have memory size set to 512MB (13909 ms)
    ✓ Should use Node 12 as runtime (67285 ms)

Test Suites: 1 passed, 1 total
Tests:       2 passed, 2 total
Snapshots:   0 total
Time:        86.375 s, estimated 133 s
Ran all test suites.

vs
aws-lambda-nodejs-esbuild:

 PASS  lib/simple-stack.test.ts
  Simple Stack
    ✓ Should have memory size set to 512MB (127 ms)
    ✓ Should use Node 12 as runtime (50 ms)

Test Suites: 1 passed, 1 total
Tests:       2 passed, 2 total
Snapshots:   0 total
Time:        2.045 s, estimated 4 s
Ran all test suites.

I forked your sample, so you can see how to migrate

Just saying.

I have implemented a new cdk construct aws-lambda-nodejs-esbuild

Very interesting!!

@floydspace This is great work! I will definitely have a look at this.

The benchmarks look really promising regarding the https://github.com/floydspace/aws-lambda-nodejs-esbuild package. As a regular user of AWS CDK + NodejsFunction constructs, I vote strongly on switching to the esbuild package internally, instead of the current Parcel implementation.

A side note: we've had good results with using Webpack back in the days before the NodejsFunction construct existed. Are there any plans for evaluating that one as a possible technology to back the construct?

@eladb what's your opinion here?

I've made some tests with esbuild and it's incredibly fast. This looks like the perfect tool for aws-lambda-nodejs and would solve some of the tweaking we do to make Parcel work (like playing with the package.json).

If we go in this direction would you agree to include esbuild as a real (bundled) dependency in aws-lambda-nodejs? It has no dependencies, it only installs its binary with a postinstall script. Moreover, it offers a synchronous Javascript API (with TS typings). This means that we could completely drop the Docker + tryBundle() implementation. (Docker could still be used when installing packages that are not bundled though)

@floydspace would you be ready to contribute for this?

Hey @jogold , sure would love to.
I have a couple of questions tho:

  • Can I just post the code from my package as is as a first version?
  • Why to use docker when installing not bundled packages, we can just use the user's environment. To be honest, I use the forked and adapted for cdk implementation from serverless-webpack/lib/packExternalModules.js and it's covered and works perfectly, (though I need to adapt tests as well)
  • Why to use docker if we could just spawn parcel cli as a child process to make it synchronous? It's basically what the esbuild synchronous API does. Maybe I miss some important detail in how cdk ecosystem works

Thank you.

Can I just post the code from my package as is as a first version?

Let's wait for @eladb's feedback

Why to use docker when installing not bundled packages, we can just use the user's environment. To be honest, I use the forked and adapted for cdk implementation from serverless-webpack/lib/packExternalModules.js and it's covered and works perfectly, (though I need to adapt tests as well)

For modules using native dependencies you could want to install in a Lambda compatible environment. This could be an option of course.

Why to use docker if we could just spawn parcel cli as a child process to make it synchronous? It's basically what the esbuild synchronous API does. Maybe I miss some important detail in how cdk ecosystem works

We started to use docker because we didn't want to include parcel as a direct dependency in aws-lambda-nodejs. It has too many dependencies. See https://github.com/aws/aws-cdk/issues/6340.

Sounds promising but I would still be reluctant to bundle it with the CDK. Going towards 2.0 the CDK will be a single module, which means our bar for bundling deps is going to be very high.

Can we still employ the same docker fallback approach here? What are the downsides?

Seems like a reasonable approach. To let the developers opt for Docker based or esbuild based compilation. However, let's consider these facts:

  1. 99% of users of the NodejsFunction can use Docker and Parcel based bundling currently interchangeable. The only exception are those that require bundling in a Lambda compatible Layer so that some sort of system dependencies can be embedded. Honestly, I don't have a practical example of this 1% but we should assume there are such use cases.
  2. Docker-based bundling is inherently slower compared to either Parcel or esbuild, due to the fact that there is the Docker overhead. This is especially true in MacOS based environments. Like 10x slower. Also, by using Docker-based bundling, Docker is introduced as a dependency of the project, instead of the standard Node + NPM, which CDK already requires anyway.

Based on these facts, I consider the "meaningful default" to be an option that is FASTER in 99% of the use cases and requires fewer system dependencies (Docker). This option currently is either Parcel or esbuild.

Long story short, it's about finding the right balance of keeping package dependencies low, and at the same time - keeping the development experience of consumers of the NodejsFunction construct satisfactory.

Just curious: If the NodejsFunction construct declares Parcel or esbuild as a dependency on a fixed version, isn't it enough to mitigate risks for AWS CDK?

Can we still employ the same docker fallback approach here? What are the downsides?

Yes but one of the problems we have with Docker here is that we need to mount the project root (in order to have access to the top level node_modules folder, this is a different situation compared to aws-lambda-python for example). This results in poor performances (almost unusable on macOS). And since installing parcel locally greatly improves performances without any constraints I think everyone is (or should be) using local bundling. And then the default becomes the exception....

Implementing a local only solution would result in much cleaner code also. We could maybe go back to having an "undeclared peer dependency".

Note: local bundling could still use Docker (as an option) when installing unbundled dependencies that should be built in a Lambda compatible environment.

Why exactly do we need access to the top-level node_modules? I normally have my own package.json in the lambda subfolder and I think bundling should work in this folder without the CDK project, shouldn't it?

That's a "classical" setup yes, but not a necessary one. In our organization, instead, we create Lambdas alongside/next-to the CDK constructs. So there's a single package.json and it's at the root of the CDK app. This makes it easy to centralize third party version upgrades of dozens of Lambdas.

I normally have my own package.json in the lambda subfolder

How do you easily install such a setup? Nesting package.json is usually an anti-pattern. There's no reason not to declare all your projects dependencies in one place (along with tooling configurations etc.), especially when bundling. The only exception are monorepos (still there's no nesting) but in this case dependencies are hoisted to the top-level node_modules folder so you need access to it when bundling.

I see that with bundling this might indeed not be necessary. Sometimes I zip the folder as Lambda asset and then I need it contained in a subfolder. Installing is done using the normal npm ci command. I am setting up my project with projen so this is called for all my folders automatically.

How do I specify Lambda-specific deps in one subproject and not in another?

I am open to any best-practices and happy to improve my setup.

@alan-cooney This indeed works great for running tests, thanks! 👍 Is there also an open GitHub repo linked to this module?

Currently it's on my company's monorepo, but I can split it out if useful? It essentially just replaces all NodejsFunction classes with the standard Function from @aws-cdk/aws-lambda, and sets the code as Code.fromInline('return;') rather than bundling the actual lambda.

@hoegertn

Sometimes I zip the folder as Lambda asset and then I need it contained in a subfolder.

Got it, in this case you have no other solution (but you're creating a zip that contains much more than what's needed by your function).

That's exactly the problem we're trying to solve with the NodejsFunction construct (zero-config, bundling of exactly what's needed). Once we solve the performance issue there won't be any reason for nested package.json files anymore I think.

Installing is done using the normal npm ci command. I am setting up my project with projen so this is called for all my folders automatically.

This means that you're sometimes downloading and installing the same package multiple times. It's also complicated when you want to update packages.

I see your arguments. Performance is one issue, packaging additional files is sometimes an issue, and different dependency versions are sometimes a wanted thing.

But I would love to get rid of all this.

@floydspace we are going to replace parcel with esbuild while still maintaining local bundling + fallback to Docker approach for the moment. People installing esbuild will have very fast bundling performances.

Let me first draft a PR for this then we can work on it together.

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

juhofriman picture juhofriman  ·  33Comments

eladb picture eladb  ·  52Comments

clareliguori picture clareliguori  ·  30Comments

thibaut-singlefile picture thibaut-singlefile  ·  27Comments

eladb picture eladb  ·  33Comments