Graphql-code-generator: Incremental/faster builds

Created on 26 Mar 2019 · 8 Comments · Source: dotansimha/graphql-code-generator

Is your feature request related to a problem? Please describe.

The overarching issue is keeping the generated files up to date with their sources while avoiding problems like diff churn. One solution is to run the generation step as a postinstall script, but in larger code bases this becomes a nuisance of its own, because it slows down package installation.

Describe the solution you'd like

The generator should have a flag to keep a cache of the source files and only re-build if the source files have changed.

Describe alternatives you've considered

A generic utility (not specific to this tool) to track changes in files.

Another alternative is keeping the generated files in the source, but this is undesirable for a number of well-known reasons.

core enhancement

All 8 comments

I've experimented with an approach that scans the last-modified times of all sources, serializes them into a string, creates an MD5 digest of that string, and caches the digest in a temporary location. On the next run, the cached digest is compared with the current one. Doing this for around 3k source files in Node takes about 2s, while the generation takes about 14s, so it's a reasonable performance win. The generation is also run if the target files are missing.
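A minimal sketch of the digest approach described above, using only Node built-ins. The function names (`mtimeDigest`, `needsRebuild`, `recordBuild`) and the default cache path are assumptions made for illustration; they are not part of graphql-code-generator or of the script mentioned in this comment.

```typescript
import * as crypto from "crypto";
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical default cache location in the OS temp directory.
const defaultCacheFile = path.join(os.tmpdir(), "codegen-mtime-digest");

// MD5 digest over the sorted "path:mtime" entries.
// Only stat() metadata is read, never the file contents.
function mtimeDigest(files: string[]): string {
  const entries = [...files]
    .sort()
    .map((file) => `${file}:${fs.statSync(file).mtimeMs}`);
  return crypto.createHash("md5").update(entries.join("\n")).digest("hex");
}

// True when generation should run: a target is missing,
// no digest has been cached yet, or the digest has changed.
function needsRebuild(
  sources: string[],
  targets: string[],
  cacheFile: string = defaultCacheFile
): boolean {
  if (!targets.every((target) => fs.existsSync(target))) return true;
  const cached = fs.existsSync(cacheFile)
    ? fs.readFileSync(cacheFile, "utf8")
    : null;
  return cached !== mtimeDigest(sources);
}

// Store the current digest after a successful generation run.
function recordBuild(
  sources: string[],
  cacheFile: string = defaultCacheFile
): void {
  fs.mkdirSync(path.dirname(cacheFile), { recursive: true });
  fs.writeFileSync(cacheFile, mtimeDigest(sources));
}
```

A wrapper would call `needsRebuild` first, run the generator only when it returns `true`, and then call `recordBuild`. Because the digest covers any change in mtime, a file whose timestamp moves backwards also triggers a rebuild.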

A question that arises is whether this can be implemented as a plugin.

For reference, ESLint implements a feature like this (albeit more complex) with a --cache flag, and it has a separate flag for configuring the cache location.

May I propose a simpler, timestamp-based approach? Similar to what TypeScript 3.0 does with --build mode. The basic idea is: if all output files are newer than all input files (and no output files are missing), skip the build.

  • Pros: no fiddling with a cache directory; no need to implement incremental state saving (therefore, a lot simpler to implement); faster than checksumming every input file
  • Cons: not actually incremental (i.e. when anything is touched, builds don't get any faster); doesn't work if an input file gets fully rolled back, including its mtime; doesn't get triggered when input files are only deleted

In practice, this tends to work well, as most tools won't rewind the mtime (this is a principle of Git, for instance), and I'd wager it's not that common for a developer to simply delete an input file without touching anything else. In any case, both can be solved simply by either cleaning the output files or running the tool without the --skip-if-unmodified flag, or whatever other name it might have.
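The timestamp check described above could look roughly like the following sketch. The function name `isUpToDate` is hypothetical, and the input and output lists are assumed to have been resolved already (for example, by expanding the globs from the config).

```typescript
import * as fs from "fs";

// Skip the build only if every output exists and the oldest output
// is strictly newer than the newest input.
function isUpToDate(inputs: string[], outputs: string[]): boolean {
  if (outputs.length === 0) return false;
  if (!outputs.every((file) => fs.existsSync(file))) return false;

  let newestInput = -Infinity;
  for (const file of inputs) {
    newestInput = Math.max(newestInput, fs.statSync(file).mtimeMs);
  }

  let oldestOutput = Infinity;
  for (const file of outputs) {
    oldestOutput = Math.min(oldestOutput, fs.statSync(file).mtimeMs);
  }

  return oldestOutput > newestInput;
}
```

As the cons listed above note, this check does not notice an input that was deleted or had its mtime rolled back, since neither makes any remaining input newer than the outputs.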

I've implemented this in a quick wrapper script (https://gist.github.com/mernen/4b52651d6a979b0c95545204d7a60f5d/raw/76a4de7931ecc5d11d28ba5d5c31e0f862f27c0b/graphql-codegen-if-modified.js). It's not good for general use, as it makes a lot of assumptions about the project structure, but it could be useful to others with proper tweaks.

I had a look at the source code, but it seemed like implementing this requires more familiarity with the project than I currently have. My first impression is that the best approach would be to introduce a step before "Generate outputs", which would run the globs to collect the input files, so that the generation step could be entirely skipped if the flag is used.

I didn't know that Git wouldn't roll back modification timestamps, which was my main reason for using a cache. Your approach would be simpler and work just as well in normal use cases, but it leaves more room for edge cases, so I'd personally prefer the slightly more robust approach with a cache.

Also, the approach I've taken doesn't checksum the contents of the files, just the serialized list of timestamps, otherwise it wouldn't be as fast.

@mernen About the script you posted: the input files should include the schema and document files from the config as well as package.json, not just the schema files.
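To illustrate the point about input files, here is a hedged sketch that gathers the patterns to watch from a config object. The `CodegenConfigSketch` interface is a deliberately simplified, hypothetical shape that covers only string and string-array values for `schema` and `documents`; a real config can hold other forms (such as remote schema URLs), which a timestamp check cannot cover.

```typescript
// Hypothetical, simplified config shape: only the fields used here.
interface CodegenConfigSketch {
  schema?: string | string[];
  documents?: string | string[];
}

// Normalize an optional string-or-array field to an array.
function toArray(value?: string | string[]): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Collect every path or glob pattern whose change should trigger a rebuild:
// schema files, document files, and package.json.
function collectInputPatterns(config: CodegenConfigSketch): string[] {
  const patterns = [
    ...toArray(config.schema),
    ...toArray(config.documents),
    "package.json",
  ];
  return Array.from(new Set(patterns)); // drop duplicates, keep first-seen order
}
```

The returned patterns would still need to be expanded into concrete file paths before their timestamps can be compared.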

Edit: using a cache also would enable making the build incremental in the future.

@slikts can you please try the latest version and see if it's improved? We made some changes recently, and the codegen should now run faster.
These aren't incremental builds, but performance should be better.

I'd like to raise an issue distinct from, but related to, build times: reloads.

In the absence of some incremental build process, changing a file that contains a gql tag will cause two hot-module replacements. That is, a jsx or tsx file gets built, which prompts a hot reload, and then the generated GraphQL code is rebuilt (unchanged!), which prompts a second hot reload. This can be irritating when you see the flicker of the first reload and try to interact with your application, only to be interrupted by the second reload.

If codegen is fast enough to avoid incremental builds, it may be possible to address this related issue with far less machinery.
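One low-machinery way to avoid the second reload, offered here as an assumption rather than something described in this thread, is to leave the generated file untouched when its new content is identical to what is already on disk. File watchers then see no change. The helper name `writeIfChanged` is hypothetical.

```typescript
import * as fs from "fs";

// Write `content` to `file` only when it differs from what is on disk.
// Returns true if the file was written, false if it was left alone.
function writeIfChanged(file: string, content: string): boolean {
  if (fs.existsSync(file) && fs.readFileSync(file, "utf8") === content) {
    return false; // identical output: keep the file and its mtime as they are
  }
  fs.writeFileSync(file, content);
  return true;
}
```

This does not make generation itself any faster; it only prevents an unchanged result from looking like a change to tools that watch the output file.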

Oh, my apologies! I missed that this just landed: https://github.com/dotansimha/graphql-code-generator/pull/2359 - Thanks!

Closing for now. @slikts, feel free to open if you think we can improve it more.
