Gatsby: [gatsby-source-contentful] downloadLocal broken by gatsby-source-filesystem

Created on 15 Apr 2020 · 18 Comments · Source: gatsbyjs/gatsby

Description

#20843 introduced a timeout for createRemoteFileNode. I'm almost certain this breaks localFile for Contentful projects with roughly 15 or more assets.

I pinned the transitive dependency on gatsby-source-filesystem to 2.1.47 (the release right before #20843), and the issue was fixed.

Steps to reproduce

Use gatsby-source-contentful with downloadLocal enabled. If gatsby develop takes more than 30 seconds, createRemoteFileNode silently times out. The build completes, but most localFile fields in GraphiQL are null.

Expected result

localFile fields are populated.

Actual result

localFile fields are null.

Other Notes

I think all of the createRemoteFileNode calls are actually completing, but the timeout has some nasty side effect.

I'd love to see this reverted, as I have to resort to the very hacky npm-force-resolutions.

Environment

System:
OS: Linux 4.4 Ubuntu 18.04.4 LTS (Bionic Beaver)
CPU: (8) x64 Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Shell: 4.4.20 - /bin/bash
Binaries:
Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
Yarn: 1.22.1 - /usr/bin/yarn
npm: 6.13.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
Python: 2.7.17 - /usr/bin/python
npmPackages:
gatsby: ^2.17.4 => 2.20.20
gatsby-image: ^2.2.30 => 2.3.2
gatsby-plugin-brotli: ^1.3.1 => 1.3.1
gatsby-plugin-emotion: ^4.1.18 => 4.2.1
gatsby-plugin-manifest: ^2.2.41 => 2.3.3
gatsby-plugin-netlify: ^2.1.32 => 2.2.1
gatsby-plugin-postcss: ^2.1.16 => 2.2.1
gatsby-plugin-prefetch-google-fonts: 1.4.3 => 1.4.3
gatsby-plugin-react-helmet: ^3.1.13 => 3.2.2
gatsby-plugin-react-svg: ^3.0.0 => 3.0.0
gatsby-plugin-remove-fingerprints: 0.0.2 => 0.0.2
gatsby-plugin-resolve-src: ^2.0.0 => 2.0.0
gatsby-plugin-sharp: ^2.2.32 => 2.5.4
gatsby-source-contentful: ^2.1.73 => 2.2.7
gatsby-transformer-remote-filesystem: ^0.2.0 => 0.2.0
gatsby-transformer-sharp: ^2.3.0 => 2.4.4
npmGlobalPackages:
gatsby-cli: 2.11.8

not stale Contentful bug

Most helpful comment

Got the same result here as well: success Downloading remote files - 30.130s - 56/93 3.09/s
Missing random localFile data of some images. Download gets cut off right at the 30 seconds mark.

I did some further digging:

It seems the timeout is created in requestRemoteNode.

https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L152-L157

All the request promises are created at the same time, but the actual requests are only processed in order. This causes the requests at the back of the queue to time out and fail.
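To make the timing argument concrete, here is a rough simulation under simplifying assumptions: fixed per-file download times, a FIFO queue with a fixed concurrency limit, and a timeout that acts as a single deadline. It models the behavior described above, not the actual internals of gatsby-source-filesystem or got, and `simulateQueue` is a hypothetical helper. It compares a timeout clock that starts when the promise is created with one that starts when the download actually begins.

```javascript
// Hypothetical model: count how many queued downloads finish before their timeout.
// durationsMs: time each download needs once it actually starts.
// timerStartsAtEnqueue: true  -> timeout clock starts when the promise is created (t = 0)
//                       false -> timeout clock starts when the download begins
function simulateQueue(durationsMs, concurrency, timeoutMs, timerStartsAtEnqueue) {
  // Time at which each worker slot becomes free again.
  const slotFreeAt = new Array(concurrency).fill(0);
  let succeeded = 0;

  for (const duration of durationsMs) {
    // FIFO: the next job goes to whichever slot frees up first.
    let slot = 0;
    for (let i = 1; i < concurrency; i++) {
      if (slotFreeAt[i] < slotFreeAt[slot]) slot = i;
    }

    const start = slotFreeAt[slot];
    const finish = start + duration;
    const timerOrigin = timerStartsAtEnqueue ? 0 : start;

    if (finish - timerOrigin <= timeoutMs) {
      succeeded++;
      slotFreeAt[slot] = finish;
    } else {
      // Timed out: the slot is released when the deadline passes
      // (immediately, if the deadline had already passed before the job started).
      slotFreeAt[slot] = Math.max(start, timerOrigin + timeoutMs);
    }
  }
  return succeeded;
}
```

For example, with 93 files that each take 2 seconds and 4 concurrent slots, `simulateQueue(new Array(93).fill(2000), 4, 30000, true)` returns 60, while passing `false` returns 93: when the clock starts at enqueue time, every file still waiting in the queue at the 30-second mark fails.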

The timeout error is not handled in download-contentful-assets and therefore fails silently.

https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-contentful/src/download-contentful-assets.js#L88-L90

When actually logging the error you get the following trace:

failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
...
...
failed to process http://images.ctfassets.net/xxx.jpeg
TimeoutError: Timeout awaiting 'request' for 30000ms
success Downloading remote files - 30.618s - 39/93 3.04/s

Because the error is not handled, the result is still reported as a success even though the website will not run properly due to the missing data. This error needs to be handled appropriately.
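A minimal sketch of what handling the error could look like, assuming a hypothetical `downloadAsset(url)` function that stands in for the real createRemoteFileNode call and rejects on failure. It uses `Promise.allSettled` (available since Node 12.9) so every download is attempted, then logs each failure and throws instead of letting the step report success with missing data.

```javascript
// Hypothetical helper: download every asset, then fail loudly if any download failed.
// downloadAsset(url) is assumed to return a promise that rejects on error
// (e.g. a TimeoutError); reporter defaults to console so the sketch runs standalone.
async function downloadAllOrReport(urls, downloadAsset, reporter = console) {
  const results = await Promise.allSettled(urls.map((url) => downloadAsset(url)));

  // Collect the URL and reason for every rejected download.
  const failures = [];
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      failures.push({ url: urls[i], error: result.reason });
    }
  });

  if (failures.length > 0) {
    for (const { url, error } of failures) {
      reporter.error(`failed to process ${url}: ${error.message}`);
    }
    // Surface the problem instead of silently continuing with null localFile fields.
    throw new Error(`${failures.length} of ${urls.length} assets failed to download`);
  }

  return results.map((result) => result.value);
}
```

Inside a Gatsby plugin you would more likely pass Gatsby's `reporter` and stop the build with `reporter.panic`; the important part is that the rejection is no longer swallowed.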

As for the requests failing, the issue seems to be that too many of them are fired off at the same time but only get loaded in order.

https://github.com/gatsbyjs/gatsby/blob/ae306827f3b0a96234e1b0d141748ad1cf6b932d/packages/gatsby-source-filesystem/src/create-remote-file-node.js#L76

It seems gatsby-source-filesystem assumes you can download 200 files concurrently, but this might not work with the Contentful API? I don't know what the limit is here. This is, however, adjustable via an environment variable.

Setting the following config seems to fix the timeout issue for me:

_gatsby-config.js_

process.env.GATSBY_CONCURRENT_DOWNLOAD = 1

New output:

success Downloading remote files - 137.409s - 93/93 0.68/s

As for the timeout, 30 seconds is a good default, but it might not be enough for larger files (or a slow internet connection). Perhaps this needs to be adjustable, possibly also through an environment variable?
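One way the timeout could be made adjustable, sketched as a small helper that reads a positive integer from an environment variable and falls back to a default. `readPositiveIntEnv` and the `GATSBY_DOWNLOAD_TIMEOUT` variable name are hypothetical; `GATSBY_CONCURRENT_DOWNLOAD` and its default of 200 come from the discussion above. Values assigned to `process.env` are stored as strings, so `process.env.GATSBY_CONCURRENT_DOWNLOAD = 1` arrives as `"1"` and has to be parsed.

```javascript
// Hypothetical helper: read a positive integer (e.g. a timeout in ms or a
// concurrency limit) from an environment variable, falling back to a default
// when the variable is unset, empty, non-numeric, or not positive.
function readPositiveIntEnv(name, fallback) {
  const raw = process.env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number.parseInt(raw, 10);
  return Number.isInteger(value) && value > 0 ? value : fallback;
}

// GATSBY_DOWNLOAD_TIMEOUT is a hypothetical name; 30000 ms matches the current timeout.
const DOWNLOAD_TIMEOUT_MS = readPositiveIntEnv("GATSBY_DOWNLOAD_TIMEOUT", 30000);
// GATSBY_CONCURRENT_DOWNLOAD is the variable mentioned above; 200 is the default described there.
const CONCURRENT_DOWNLOADS = readPositiveIntEnv("GATSBY_CONCURRENT_DOWNLOAD", 200);
```

Falling back on invalid input, rather than throwing, keeps a typo in the environment from breaking the build, at the cost of silently ignoring it.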

@mjmaurer Maybe reopen this issue, as more people seem to be encountering this problem?

All 18 comments

Hi. I'm having trouble reproducing this. I created a test site with 100 (5-10MB) image assets. I added downloadLocal: true and ran gatsby develop. Downloading remote files took 48.9s. After loading, localFile was set on all of the assets. How long does the "Downloading remote files" stage take? Can you post the full build logs? Ideally, are you able to share a repro?

I'm at a loss for words. I removed the force-resolution, did a clean install of gatsby-source-contentful, and finished off with a gatsby clean, and suddenly everything is working as expected.

Thanks for the work reproducing this and for that PR. I definitely ran multiple clean npm installs yesterday, as well as multiple clean gatsby builds. In the end, I'm not sure what was happening.

Hi,

Is there a specific solution to this problem? I have ~274 assets in my space, and a bunch of them are failing.

What I'm noticing is that, whenever I reach ~220-230 files, it times out.

Like this:

Fetch Contentful data: 986.556ms
⠴ source and transform nodes
[======================= ] 13.563 s 231/274 84% Downloading remote files

Then it _completes_ and later it fails.

Running into this as well. I have 217 assets in Contentful (including some larger video files, all below 50MB), and it claims to have completed the download. However, when I query, I'm getting this:

[screenshot]

The file exists on Contentful but not on my local server. There's no notice or warning that anything is failing; it just silently doesn't complete in the background.

Getting this in the terminal: success Downloading remote files - 30.672s - 174/217 7.07/s

Quick update: I removed the plugin, installed it again, cleaned, etc., and got this: success Downloading remote files - 30.591s - 164/217 7.09/s. So it looks like there's definitely something with the timeout: it gets just past 30.5s and decides to fail.

Yeah, I ran into this again as well. I think it was my internet connection plus a large file. But it does seem like an issue that if one file fails, everything fails silently.

@mjmaurer I manually bumped up the two TIMEOUT numbers in create-remote-file-node.js (in gatsby-source-filesystem) from 30s to 30 minutes, and it's successfully pulling all the files (including large videos). It's 100% a gross hack and not sustainable, but at least it gets me over the hump for now.

I've looked into this a bit, and from what I can see there are a few issues at play here:

  1. The gatsby-source-contentful plugin swallows exceptions from createRemoteFileNode. These will most likely be networking errors if the downloads time out, or something else unexpected such as a TCP connection being reset. gatsby-source-contentful then assumes the file was downloaded successfully when it wasn't, and errors crop up later in the build when null references are hit.

  2. The 30s timeout that got is configured with. This can be hit if you're downloading a large asset and don't have the bandwidth to complete the download within 30s. The easiest way to reproduce this is to throttle your network connection and run a build; on macOS, I used the Network Link Conditioner. _Note: The asset size limit in Contentful is 1GB_

  3. The default number of concurrent downloads in create-remote-file-node. The default is 200, and this seems to cause all sorts of problems for me when running a local build against a large Contentful space. With more downloads happening concurrently, a timeout is more likely for any individual file, and I'm also seeing the occasional connection reset before a timeout happens. This is likely less of an issue if you've got a high-bandwidth connection to Contentful's asset CDN (aka CloudFront), but I wonder whether it's a sensible default from a reliability standpoint. Maybe this could be determined more intelligently, e.g. by performing some kind of exponential backoff when network errors are encountered.
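The exponential backoff idea from point 3 could look roughly like the following sketch: retry a failed download, waiting `baseDelayMs`, then twice that, then four times that, and so on. `withRetry` and its option names are hypothetical; a real implementation would probably add jitter and only retry errors that are plausibly transient (timeouts, connection resets).

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call attempt(n) until it resolves, waiting
// baseDelayMs * 2^n after the n-th failure. Rethrows the last error once
// `retries` extra attempts have been used up.
async function withRetry(attempt, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let n = 0; n <= retries; n++) {
    try {
      return await attempt(n);
    } catch (error) {
      lastError = error;
      if (n < retries) {
        await sleep(baseDelayMs * 2 ** n);
      }
    }
  }
  throw lastError;
}
```

Each retry calls `attempt` again, so any per-request timeout inside it starts fresh for that attempt rather than carrying over from the failed one.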

I'm going to start working on a PR to fix point 1 immediately. I don't think the Contentful source plugin should ever swallow errors. I would love to get someone's thoughts on points 2 & 3. Happy to work on these as well.

related:

  • #22010 (The retry in got is broken)
  • #15783 (Had a lot of issues when using createRemoteFileNode for source videos)

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It's been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

I was about to use downloadLocal today and ran into a different issue, reproduced on a clean install.

Seemingly the plugin options are not passed down correctly. In the source plugin, pluginConfig.get(downloadLocal) always returns false (the same applies to forceFullSync), so the download never starts.

I'll dig a bit deeper if I have the time.

System:
OS: macOS 10.15.5
CPU: (16) x64 Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
Shell: 5.7.1 - /bin/zsh
Binaries:
Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
npm: 6.14.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
Python: 2.7.16 - /usr/bin/python
Browsers:
Chrome: 83.0.4103.116
Safari: 13.1.1
npmPackages:
gatsby: ^2.23.12 => 2.23.12
gatsby-image: ^2.4.9 => 2.4.9
gatsby-plugin-manifest: ^2.4.14 => 2.4.14
gatsby-plugin-offline: ^3.2.13 => 3.2.13
gatsby-plugin-react-helmet: ^3.3.6 => 3.3.6
gatsby-plugin-sharp: ^2.6.14 => 2.6.14
gatsby-source-contentful: ^2.3.24 => 2.3.24
gatsby-source-filesystem: ^2.3.14 => 2.3.14
gatsby-transformer-sharp: ^2.5.7 => 2.5.7
npmGlobalPackages:
gatsby-cli: 2.12.52

@jayhostan could you please check if this is still the case with gatsby-source-contentful@next? Thanks :)

Hey @axe312ger, yes, the issue is present on next as well.

System:
OS: macOS 10.15.5
CPU: (16) x64 Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
Shell: 5.7.1 - /bin/zsh
Binaries:
Node: 12.16.1 - ~/.nvm/versions/node/v12.16.1/bin/node
npm: 6.14.4 - ~/.nvm/versions/node/v12.16.1/bin/npm
Languages:
Python: 2.7.16 - /usr/bin/python
Browsers:
Chrome: 83.0.4103.116
Safari: 13.1.1
npmPackages:
gatsby: ^2.23.12 => 2.24.3
gatsby-image: ^2.4.9 => 2.4.13
gatsby-plugin-cdn-files: 0.0.3 => 0.0.3
gatsby-plugin-manifest: ^2.4.14 => 2.4.18
gatsby-plugin-offline: ^3.2.13 => 3.2.18
gatsby-plugin-react-helmet: ^3.3.6 => 3.3.10
gatsby-plugin-remote-images: ^2.2.0 => 2.2.0
gatsby-plugin-sharp: ^2.6.14 => 2.6.19
gatsby-source-contentful: ^3.0.0-contentful-next.35 => 3.0.0-contentful-next.35
gatsby-source-filesystem: 2.2.0 => 2.2.0
gatsby-source-remote-file: ^0.2.0 => 0.2.0
gatsby-transformer-remote-filesystem: ^1.0.0 => 1.0.0
gatsby-transformer-sharp: ^2.5.7 => 2.5.11
npmGlobalPackages:
gatsby-cli: 2.12.52

@jayhostan it works fine for me on my local machine (with downloadLocal: true, it downloads the files). Are you sure you're passing the options correctly in gatsby-config.js? 🙈

@axe312ger
[Screenshot 2020-07-17 at 14 19 31]
gatsby-node in the source plugin:
[Screenshot 2020-07-17 at 14 20 10]
[Screenshot 2020-07-17 at 14 20 41]

Oh well, my bad, sorry. I'd passed it in the wrong place. I need a rubber duck.
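For anyone making the same mistake: a plugin only receives the options that sit inside the `options` object of its entry in gatsby-config.js. A minimal sketch; `spaceId`, `accessToken` and `downloadLocal` are gatsby-source-contentful options, while the `CONTENTFUL_SPACE_ID` and `CONTENTFUL_ACCESS_TOKEN` environment variable names are placeholders for your own credentials.

```javascript
// gatsby-config.js
// CONTENTFUL_SPACE_ID / CONTENTFUL_ACCESS_TOKEN are placeholder variable names.
const config = {
  plugins: [
    {
      resolve: "gatsby-source-contentful",
      // Everything the plugin reads (including downloadLocal) must live under `options`.
      options: {
        spaceId: process.env.CONTENTFUL_SPACE_ID,
        accessToken: process.env.CONTENTFUL_ACCESS_TOKEN,
        downloadLocal: true,
      },
    },
  ],
};

module.exports = config;
```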

I'm still getting this issue. When I have bad service, my assets don't download at all. Could we make the timeout an option in the plugin's config? This makes it impossible to build in dev mode.

I am very open to a PR, but it should contain:

  • a config option to set the download timeout
  • retry logic for when a download fails
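As a starting point for the timeout part, here is a hedged sketch of a per-attempt timeout wrapper whose clock starts when the download is actually started rather than when it is queued. `withTimeout` is a hypothetical helper; it only rejects the returned promise and does not abort the underlying request, which a real implementation would also need to do (for example through the HTTP client's own timeout or cancellation support).

```javascript
// Hypothetical helper: call start() now and reject if its promise has not
// settled within `ms`. Because start() is invoked here, the clock measures
// the download itself, not the time it spent waiting in a queue.
function withTimeout(start, ms) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timeout after ${ms}ms`)), ms);
    Promise.resolve()
      .then(start) // synchronous throws inside start() also become rejections
      .then(
        (value) => {
          clearTimeout(timer);
          resolve(value);
        },
        (error) => {
          clearTimeout(timer);
          reject(error);
        }
      );
  });
}
```

A configurable timeout value could then be passed as `ms`, and each retry attempt could wrap its download in a fresh `withTimeout` call.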