Gatsby: Really slow build times when adding locales in Contentful

Created on 18 Dec 2019 · 13 Comments · Source: gatsbyjs/gatsby

Description

We have a Gatsby + Contentful project with 7 locales. The build process for this project is slowing to a crawl as we add content in the CMS.
I created a local plugin that prints the number of nodes by type generated during a Gatsby build. I was able to determine that each added locale acts as a multiplier for the number of nodes generated in Gatsby. So, for example, if there are 1000 assets in Contentful you get 7000 Gatsby nodes (for 7 locales) _despite several assets not being available in all locales_.

Steps to reproduce

There isn't a clear process to reproduce this without having access to a Contentful space that can have multiple locales. If you do, then the following gatsby-node.js hook can help debug the steep increase in nodes as you enable more locales:

// gatsby-node.js
module.exports = {
  // Called at the end of the bootstrap process.
  onPostBootstrap: args => {
    const nodes = args.getNodes();

    // Tally nodes by their internal type.
    const counts = {};
    for (const node of nodes) {
      const type = node.internal.type;
      counts[type] = (counts[type] || 0) + 1;
    }

    console.table({ ...counts, Total: nodes.length });
  }
}

Expected result

I'm not sure there is a broad-stroke approach that avoids duplicating unused nodes for a given locale across all Contentful content types. For assets at least, it is entirely possible (and, I guess, valid) to avoid generating ContentfulAsset nodes where an asset is unavailable in a given locale, e.g. by checking that the file field of a ContentfulAsset node is not null.
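
A minimal sketch of the asset check suggested above, written as a standalone function rather than as a patch to gatsby-source-contentful. The input shape (an asset whose `fields.file` maps locale codes to file objects) and the name `localizedAssetNodes` are assumptions for illustration, not the plugin's actual internals, and locale fallbacks are ignored:

```javascript
// Hypothetical helper: emit one node per locale only where the asset has a file.
// Assumes asset.fields.file looks like { "en-US": {...}, "de-DE": {...} }.
function localizedAssetNodes(asset, locales) {
  const files = (asset.fields && asset.fields.file) || {};
  return locales
    .filter(locale => files[locale] != null) // skip locales with no file
    .map(locale => ({
      contentful_id: asset.id,
      node_locale: locale,
      file: files[locale],
    }));
}

// Made-up sample data: the asset only has a file in en-US.
const sampleAsset = {
  id: "asset-1",
  fields: { file: { "en-US": { url: "//example.invalid/a.png" } } },
};
const assetNodes = localizedAssetNodes(sampleAsset, ["en-US", "de-DE", "fr-FR"]);
console.log(assetNodes.length); // 1
```

With three locales enabled, the sample asset yields one node instead of three.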

Actual result

With 6 locales:

  success source and transform nodes - 41.782s
+ success building schema - 280.739s
  success createPages - 11.840s
  success createPagesStatefully - 0.081s
  success onPreExtractQueries - 0.003s
  success update schema - 0.292s
  success extract queries from components - 2.789s
  success write out requires - 0.134s
  success write out redirect data - 0.003s
  success Build manifest and related icons - 0.087s
  success onPostBootstrap - 3.946s
+ info bootstrap finished - 363.509 s

+ Total nodes: 55669

Breakdown of node counts by type: with-6-locales.txt

With 7 locales:

  success source and transform nodes - 50.208s
- success building schema - 632.450s
  success createPages - 20.045s
  success createPagesStatefully - 0.093s
  success onPreExtractQueries - 0.005s
  success update schema - 0.368s
  success extract queries from components - 3.301s
  success write out requires - 0.191s
  success write out redirect data - 0.003s
  success Build manifest and related icons - 0.077s
  success onPostBootstrap - 5.728s
- info bootstrap finished - 735.300 s

- Total nodes: 64671

Breakdown of node counts by type: with-7-locales.txt

Environment

Contentful stats:

  • 3019 entries
  • 1769 assets
  • 41 content types
  • 6 locales (we want to add a 7th)

Gatsby environment:

  System:
    OS: macOS Mojave 10.14.6
    CPU: (12) x64 Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    Shell: 5.7.1 - /usr/local/bin/zsh
  Binaries:
    Node: 12.13.0 - /var/folders/11/zcvk9j4j7rd_dj0t5hcv9wc80000gp/T/yarn--1576691369326-0.6597870373283115/node
    Yarn: 1.21.1 - /var/folders/11/zcvk9j4j7rd_dj0t5hcv9wc80000gp/T/yarn--1576691369326-0.6597870373283115/yarn
    npm: 6.13.2 - /usr/local/bin/npm
  Languages:
    Python: 2.7.16 - /usr/bin/python
  Browsers:
    Chrome: 79.0.3945.88
    Firefox: 71.0
    Safari: 13.0.4
  npmPackages:
    gatsby: 2.18.13 => 2.18.13 
    gatsby-cli: ^2.8.19 => 2.8.19 
    gatsby-image: ^2.2.37 => 2.2.37 
    gatsby-plugin-catch-links: ^2.1.21 => 2.1.21 
    gatsby-plugin-feed: ^2.3.25 => 2.3.25 
    gatsby-plugin-manifest: ^2.2.33 => 2.2.33 
    gatsby-plugin-react-helmet: ^3.1.18 => 3.1.18 
    gatsby-plugin-sharp: ^2.3.9 => 2.3.9 
    gatsby-plugin-sitemap: ^2.2.24 => 2.2.24 
    gatsby-plugin-svgr: ^2.0.2 => 2.0.2 
    gatsby-plugin-typescript: ^2.1.23 => 2.1.23 
    gatsby-plugin-webpack-bundle-analyzer: ^1.0.5 => 1.0.5 
    gatsby-remark-prismjs: ^3.3.27 => 3.3.27 
    gatsby-source-contentful: ^2.1.71 => 2.1.71 
    gatsby-source-filesystem: ^2.1.42 => 2.1.42 
    gatsby-transformer-remark: ^2.6.43 => 2.6.43 
    gatsby-transformer-sharp: ^2.3.9 => 2.3.9 
  npmGlobalPackages:
    gatsby-cli: 2.7.15

All 13 comments

@pvdz / @pieh / @Khaledgarbaya if you end up looking at this issue and want access to our Contentful space, I can look at arranging this since you all represent Gatsby and Contentful employees.

Just from checking logs:
This doesn't look like a Contentful or gatsby-source-contentful issue. We can tell from the source and transform nodes step, which did take longer but grew in a reasonable, almost linear fashion. So most likely @Khaledgarbaya is off the hook here ;)

The big jump in time is in the building schema step, which is pure Gatsby core: it is the part where we "infer" the schema from the available data.

You can work around this costly inference process by using https://www.gatsbyjs.org/packages/gatsby-plugin-schema-snapshot/. It writes a schema snapshot and reuses it to generate the schema on subsequent runs, which avoids the inference cost.

A good config would probably be something like this:

// gatsby-config.js
module.exports = {
  plugins: [
    {
      resolve: `gatsby-plugin-schema-snapshot`,
      options: {
        path: `schema.gql`,
        include: {
          plugins: [`gatsby-source-contentful`],
        },
        update: process.env.GATSBY_UPDATE_SCHEMA_SNAPSHOT,
      },
    },
  ],
}

Just note: this will "freeze" the schema. If you add, remove or modify fields in Contentful, you will have to delete the generated schema.gql file to regenerate the snapshot.

Let me know if that works

We would still like to check why schema generation goes from 280s to 632s (a 125% increase) when only ~16% more nodes are added, so if you are able to grant us access that would be great (I'm just not sure we would be able to dig into it soon, with the holiday season almost upon us).
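
For reference, the two percentages can be recomputed from the build logs in the issue description. This is only an arithmetic sketch; the helper name `percentIncrease` is made up for illustration:

```javascript
// Figures copied from the 6-locale and 7-locale build logs above.
const schemaSeconds = { six: 280.739, seven: 632.45 };
const totalNodes = { six: 55669, seven: 64671 };

// Relative increase as a percentage, rounded to one decimal place.
function percentIncrease(before, after) {
  return Math.round(((after - before) / before) * 1000) / 10;
}

const schemaIncrease = percentIncrease(schemaSeconds.six, schemaSeconds.seven);
const nodeIncrease = percentIncrease(totalNodes.six, totalNodes.seven);
console.log({ schemaIncrease, nodeIncrease });
```

The schema step grows by roughly 125% while the node count grows by roughly 16%, so the schema cost is clearly not linear in the number of nodes.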

Another idea: what kind of things do you pass as context when you programmatically create pages in gatsby-node?

We found that passing large objects there has a detrimental effect on the performance of the schema building step.
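
A rough sketch of what a small page context could look like, assuming the page template can query everything else it needs by node id. The helper name `toPageArgs` and the `slug` field are hypothetical; `path`, `component` and `context` are the arguments Gatsby's `createPage` action takes:

```javascript
// Hypothetical helper: build createPage arguments with a minimal context.
// Only the node id is passed; the page query can fetch the rest by id.
function toPageArgs(node, component) {
  return {
    path: `/${node.node_locale}/${node.slug}`, // `slug` is an assumed field
    component,
    context: { id: node.id },
  };
}

// Made-up sample node carrying a large field that should NOT end up in context.
const sampleNode = {
  id: "node-1",
  node_locale: "en-US",
  slug: "hello",
  body: "x".repeat(10000),
};
const pageArgs = toPageArgs(sampleNode, "/site/src/templates/article.js");
console.log(pageArgs.path, Object.keys(pageArgs.context));

// In gatsby-node.js this would be used roughly as:
//   exports.createPages = ({ actions }) => { actions.createPage(toPageArgs(node, template)) }
```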

@pieh thanks for the tips, I’ll give the plugin a go. I tried disabling inference on page context last week and it didn’t move the numbers much. I’ll try it again to double check. What’s the best way to share a delivery token with you? (Can be done after the holiday period.)

@pieh / @pvdz I just added the plugin you mentioned and got a significant speed boost! For 6 locales, it's now 11.408s down from ~280s. I decided to survey the schema snapshot to understand why there are so many types and I found this interesting anomaly:

$ cat schema-snapshot.gql | egrep '^type ' | wc -l
   29507
$ cat schema-snapshot.gql | egrep -i '^type.*RichTextNode' | wc -l
   29328

So much work is done on creating _redundant_ types for Contentful RichText fields! These all follow this format:

 type ContentfulSlicesModule implements Node @derivedTypes @dontInfer {
   internalTitle: String
   title: String
   ctaText: String
   ctaLink: String
   secondaryCtaText: String
   secondaryCtaLink: String
   layout: String
   theme: String
   imageDesktop: ContentfulAsset @link(by: "id", from: "imageDesktop___NODE")
   imageMobile: ContentfulAsset @link(by: "id", from: "imageMobile___NODE")
   video: ContentfulExternalVideo @link(by: "id", from: "video___NODE")
+  body: contentfulSlicesModuleBodyRichTextNode @link(by: "id", from: "body___NODE")
   spaceId: String
   contentful_id: String
   createdAt: Date @dateformat
   updatedAt: Date @dateformat
   sys: ContentfulSlicesModuleSys
   node_locale: String
 }

+ type contentfulSlicesModuleBodyRichTextNode implements Node @derivedTypes @dontInfer {
+   content: [contentfulSlicesModuleBodyRichTextNodeContent]
+   nodeType: String
+   body: String
+   fields: contentfulSlicesModuleBodyRichTextNodeFields
+ }
+ 
+ type contentfulSlicesModuleBodyRichTextNodeContent @derivedTypes {
+   content: [contentfulSlicesModuleBodyRichTextNodeContentContent]
+   nodeType: String
+ }
+ 
+ type contentfulSlicesModuleBodyRichTextNodeContentContent {
+   value: String
+   nodeType: String
+ }
+ 
+ type contentfulSlicesModuleBodyRichTextNodeFields @derivedTypes {
+   readingTime: contentfulSlicesModuleBodyRichTextNodeFieldsReadingTime
+ }
+ 
+ type contentfulSlicesModuleBodyRichTextNodeFieldsReadingTime {
+   text: String
+   minutes: Float
+   time: Int
+   words: Int
+ }

There's got to be a way to infer one rich text type to use on all rich text fields in Contentful content types, right?

If I'm reading it right, that's difficult. One node isn't another and without an explicit schema we don't know whether one node is the same as others without going through the whole thing explicitly. And that's super expensive, unfortunately.

The best thing we can do is try to automate the triaging we've done in this issue so that users can come up with this answer themselves, when it starts to matter.

11s down from 5 minutes is huge and just shows how important this is.

When you say 7000 nodes, does that mean 7000 pages? And is it 1000x7 or is that 7000x7?

Out of curiosity, could you paste the entire output from console (before / after) for a cold build? gatsby clean; gatsby build

The best thing we can do is try to automate the triaging we've done in this issue so that users can come up with this answer themselves, when it starts to matter.

Yup, we should implement an automated suggestion to do this when inference becomes a bottleneck. There are some issues (not breaking, but annoying) with the schema snapshot plugin though: when I tried it with our using-contentful example, I got tons of warnings like:

warn The type `contentfulSchemaTestTextLongTextNode` does not explicitly define the field `childMarkdownRemark`.
On types with the `@dontInfer` directive, or with the `infer` extension set to `false`, automatically adding fields
for children types is deprecated.
In Gatsby v3, only children fields explicitly set with the `childOf` extension will be added.

And we need to address it: we really shouldn't produce a snapshot that then relies on deprecated behaviour.

I can imagine @disintegrator will see a lot of those with his snapshot, which has over 29 thousand type definitions. With using-contentful I get 18 warnings like that, and the snapshot is only 1560 lines.

@pvdz thanks for the feedback :) Just a couple of comments:

If I'm reading it right, that's difficult. One node isn't another and without an explicit schema we don't know whether one node is the same as others without going through the whole thing explicitly. And that's super expensive, unfortunately.

What I was suggesting is that gatsby-source-contentful introduce an explicit ContentfulRichText type marked @dontInfer using createSchemaCustomization and assign it to all rich text fields it encounters in the content. My guess is that the schema inference code would then no longer need to walk down those fields in Contentful nodes and output such a large number of redundant types. Alternatively, the source plugin could use the same type for all rich text nodes it encounters. This corresponds to the following change in normalize.js:

function prepareRichTextNode(node, key, content, createNodeId) {
  const str = stringify(content);
  const richTextNode = Object.assign({}, content, {
    id: createNodeId(`${node.id}${key}RichTextNode`),
    parent: node.id,
    children: [],
    [key]: str,
    internal: {
-     type: _.camelCase(`${node.internal.type} ${key} RichTextNode`),
+     type: makeTypeName(`RichTextField`),
      mediaType: `text/richtext`,
      content: str,
      contentDigest: digest(str)
    }
  });
  node.children = node.children.concat([richTextNode.id]);
  return richTextNode;
}

Creating a new type using (_.camelCase(`${node.internal.type} ${key} RichTextNode`)) seems unnecessary. All rich text nodes share the same fields and field types regardless of parent node.

I'd be happy to submit a PR for either of those two approaches.
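
For the first approach, a minimal sketch of what a shared explicit type could look like, using the `createSchemaCustomization` API and the `createTypes` action. The field list is an assumption: `nodeType` appears in the snapshot above, while `raw` is a hypothetical fixed name standing in for the per-field `[key]` string. This is not what the plugin currently ships:

```javascript
// Assumed SDL for one shared rich text type; the name follows the proposal above.
const richTextTypeDefs = `
  type ContentfulRichText implements Node @dontInfer {
    nodeType: String
    raw: String
  }
`;

// Registers the explicit type so inference can skip nodes of this type.
function createSchemaCustomization({ actions }) {
  actions.createTypes(richTextTypeDefs);
}

// In a plugin's gatsby-node.js this would be wired up as:
//   exports.createSchemaCustomization = createSchemaCustomization;

// Stand-in for Gatsby's actions object, only to show the call being made.
const registeredTypeDefs = [];
createSchemaCustomization({
  actions: { createTypes: defs => registeredTypeDefs.push(defs) },
});
console.log(registeredTypeDefs.length); // 1
```

Because the type is marked `@dontInfer`, its fields come only from this definition, so the nested `...RichTextNodeContentContent` style types would not be derived for it.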

When you say 7000 nodes, does that mean 7000 pages? And is it 1000x7 or is that 7000x7?

We only build out 2163 pages in both the 6-locale and 7-locale tests. You can check the breakdown of nodes by type in the files I attached in the issue description. In other words, just adding a new locale in Contentful does not result in more pages in our build but it slows down the earlier parts of gatsby tremendously: the source nodes and build schema stages.

A quick example of what's happening… Say we have:

  • 1000 article entries and articles don't have any localized fields
  • 6 locales enabled in Contentful

In Gatsby, we see 6000 ContentfulArticle nodes created, and for every locale except the default one the nodes are empty, i.e. their fields are all null.
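
To put a number on those empty nodes, the debug hook from the issue description could be extended along these lines. This is only a sketch: the list of metadata fields to ignore is an assumption based on the fields visible in the schema snapshot earlier in the thread, and the helper names are made up:

```javascript
// Fields assumed to be bookkeeping rather than content.
const META_FIELDS = new Set([
  "id", "parent", "children", "internal",
  "spaceId", "contentful_id", "createdAt", "updatedAt", "sys", "node_locale",
]);

// A node counts as empty when every non-metadata field is null or undefined.
function isEmptyNode(node) {
  return Object.keys(node)
    .filter(key => !META_FIELDS.has(key))
    .every(key => node[key] == null);
}

// Count empty nodes per locale.
function countEmptyByLocale(nodes) {
  const counts = {};
  for (const node of nodes) {
    if (!isEmptyNode(node)) continue;
    const locale = node.node_locale || "unknown";
    counts[locale] = (counts[locale] || 0) + 1;
  }
  return counts;
}

// Made-up sample: one article with content in en-US only.
const sampleNodes = [
  { id: "1", node_locale: "en-US", title: "Hello", body: "World" },
  { id: "2", node_locale: "de-DE", title: null, body: null },
  { id: "3", node_locale: "fr-FR", title: null, body: null },
];
const emptyCounts = countEmptyByLocale(sampleNodes);
console.table(emptyCounts);
```

Inside `onPostBootstrap`, passing `args.getNodes()` to `countEmptyByLocale` would show how many nodes each non-default locale adds without carrying any content.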

Out of curiosity, could you paste the entire output from console (before / after) for a cold build? gatsby clean; gatsby build

Here's the full output of running the same project with the existing 6 locales and then enabling a 7th locale in Contentful and rerunning (both runs with cleaned cache).

full-build-6-locales.txt
full-build-7-locales.txt

@pieh related issue: #19674 (tldr; we don't print childOf directive which is probably a bug)

@pvdz @pieh hope y'all have had a nice break :)
I wanted to find out if there is any action I need to take on this issue. Should I be submitting a PR for my suggestion in https://github.com/gatsbyjs/gatsby/issues/20197#issuecomment-567572462? Is there currently a PR related to this issue that I can track?
If using gatsby-plugin-schema-snapshot is the accepted solution (as in, it is not a temporary workaround) then I can close this issue.

I'm currently trying to get to the bottom of # 20338 (deliberately not linking because it's not related to this issue). Might take a few days to wrap that up.

Meanwhile, I welcome any PR that improves build times without affecting anything else. So if you can improve the situation on the RichTextField without regressing the general perf then by all means please do :D

(additionally I predict that gatsby-plugin-schema-snapshot is going to play an important role for sites at scale so any improvement to that plugin is super welcome)

@pvdz thanks for the guidance. I'm closing this issue in favour of opening another one more focused on rich text type generation.
