Occasionally, during our Gatsby build, right after the line Starting to fetch data from Drupal, we see about 20-25 images reported as Failed to process remote content.
In the snippet below, I've changed the exact filenames but kept the spaces, dashes, and file types so you can get a sense of the kind of files that occasionally fail.
Note that some of the files that fail to process are large (4 MB) but others are tiny (80 KB). We've seen PNGs, GIFs, and JPGs fail so far.
Is there a maximum file-size limit that Gatsby will be able to process for remote content? Or is there a file-naming issue with any of the files below (e.g. do we need to avoid dashes, underscores, or spaces)?
Any assistance would be appreciated!
Starting to fetch data from Drupal
Failed to process remote content http://site-url/sites/default/files/inline-images/Name_Of_File_S.jpg
Failed to process remote content http://site-url/sites/default/files/2019-03/name-file_0.png
Failed to process remote content http://site-url/sites/default/files/2019-03/Test%20File%20With-Dash_cropped.jpg
Failed to process remote content http://site-url/sites/default/files/nameoffile.gif
Failed to process remote content http://site-url/sites/default/files/Name%20of%20file_2018_1.jpg
Failed to process remote content http://site-url/sites/default/files/NameOfFile.gif
Failed to process remote content http://site-url/sites/default/files/TEST_file2_colour.jpg
Failed to process remote content http://site-url/sites/default/files/Nameoffile.gif
Failed to process remote content http://site-url/sites/default/files/Test_test-file_test-1440x960.jpg
Failed to process remote content http://site-url/sites/default/files/Test%name%20file_2018%20resized%2050.jpg
Failed to process remote content http://site-url/sites/default/files/Test.jpg
Failed to process remote content http://site-url/sites/default/files/Test.png
Failed to process remote content http://site-url/sites/default/files/Test_File_with_Underscores.jpg
Failed to process remote content http://site-url/sites/default/files/testfile_0.png
Failed to process remote content http://site-url/sites/default/files/Test%20File_RETOUCH_S.jpg
Failed to process remote content http://site-url/sites/default/files/testing.gif
Failed to process remote content http://site-url/sites/default/files/only-dashes-and-numbers-292636697.jpg
Failed to process remote content http://site-url/sites/default/files/TestFile.png
Failed to process remote content http://site-url/sites/default/files/Testchartupdated.jpg
Failed to process remote content http://site-url/sites/default/files/TEST%20File_RETOUCH_S.jpg
Failed to process remote content http://site-url/sites/default/files/test_file.png
Failed to process remote content http://site-url/sites/default/files/Test%20of%20Spaces%20File%20Test-176_.jpg
Failed to process remote content http://site-url/sites/default/files/testfile_0.png
success source and transform nodes - 11.138 s
npmPackages:
gatsby: ^2.0.76 => 2.0.117
gatsby-image: ^2.0.20 => 2.0.29
gatsby-plugin-favicon: ^3.1.5 => 3.1.5
gatsby-plugin-google-analytics: ^2.0.9 => 2.0.13
gatsby-plugin-manifest: ^2.0.9 => 2.0.17
gatsby-plugin-offline: ^2.0.24 => 2.0.24
gatsby-plugin-react-helmet: ^3.0.2 => 3.0.6
gatsby-plugin-sharp: ^2.0.14 => 2.0.20
gatsby-source-drupal: ^3.0.15 => 3.0.23
gatsby-source-filesystem: ^2.0.23 => 2.0.23
gatsby-transformer-sharp: ^2.1.8 => 2.1.13
npmGlobalPackages:
gatsby-cli: 2.4.8
Duplicate of #12280; closing in favour of that issue.
Thanks for quick responses @coreyward and @wardpeet!
I agree #12280 is related, however I don't think this is a duplicate issue. (i.e. From what I gather, issue #12280 appears to be questioning the logic behind showing the console warning instead of just rejecting it.)
Is it possible to re-open this issue? It's still not clear to me why images that exist on our source Drupal site are randomly resulting in this error. Could it be the filenames? Filesize?
We are seeing the same issue happening with gatsby-source-wordpress. About 50% of the time our builds fail with seemingly no rhyme or reason and no code changes. Any advice would be greatly appreciated.
Downloading remote files [=====================---------] 3066/4314 97.0 secs 71%
Failed to process remote content https://blog.site.com/wp-content/uploads/2016/03/image.jpg
source and transform nodes
Failed to process remote content https://blog.site.com/wp-content/uploads/2016/03/image.jpg
Downloading remote files [======================--------] 3137/4314 99.6 secs 73%
Failed to process remote content https://blog.site.com/wp-content/uploads/2016/03/image.jpg
source and transform nodes
Failed to process remote content https://blog.site.com/wp-content/uploads/2016/03/image.jpg
Downloading remote files [======================--------] 3153/4314 99.9 secs 73%
Failed to process remote content https://blog.site.com/wp-content/uploads/2016/03/image.jpg
Could we get access to one of these repos? Feel free to DM me or reach out at [email protected]
Hi @wardpeet , thanks for the response! Unfortunately I can't provide access to the repo (it's not a personal one).
In any case, it's not a show-stopper anymore since the project is complete. However, if you do solve the issue, I'd love to know the cause for the next time I use Gatsby in a project.
If we cannot get access to the repo how are we going to fix this issue?
The Answer!
The issue is in the "Download all files" logic. The downloads are all started in parallel, which causes buffer and garbage-collection issues from the mass of unresolved promises on the call stack when a site has a substantial number of pages and media entities.
Iterating with lodash map or each does not wait for a promise to resolve before moving on to the next iteration.
Simply refactor the code to run serially, using a for...of loop with await, and voila!
// Download all files
for (const node of nodes) {
  let fileNode
  let fileUrl
  let url
  let auth
  // Only attempt to download file if the node is of the correct type
  if (node.internal.type === `files` || node.internal.type === `file__file`) {
    fileUrl = node.url
    // Support JSON API 2.x file URI format https://www.drupal.org/node/2982209
    if (typeof node.uri === `object`) {
      fileUrl = node.uri.url
    }
    // Resolve w/ baseUrl if node.uri isn't absolute.
    url = new URL(fileUrl, baseUrl)
    // If we have basicAuth credentials, add them to the request.
    auth =
      typeof basicAuth === `object`
        ? {
            htaccess_user: basicAuth.username,
            htaccess_pass: basicAuth.password,
          }
        : {}
    // Attempt to get the remote file
    try {
      fileNode = await createRemoteFileNode({
        url: url.href,
        store,
        cache,
        createNode,
        createNodeId,
        parentNodeId: node.id,
        auth,
      })
      // Set the localFile___NODE
      node.localFile___NODE = fileNode.id
    } catch (e) {
      // Log but do not act upon
      console.error(e)
    }
  }
}
I have tested this code in my own project's gatsby-node.js and it works every time.
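To make the difference concrete, here is a minimal sketch that runs without Gatsby. The names fakeDownload, withForEach and withForOf are made up for illustration, and fakeDownload is only a timer standing in for a real download. It shows that forEach returns before any async callback has finished, while for...of with await finishes each item before starting the next.

```javascript
// Hypothetical stand-in for a download: resolves after `ms` milliseconds
// and records its completion in `log`.
const fakeDownload = (name, ms, log) =>
  new Promise(resolve =>
    setTimeout(() => {
      log.push(`done ${name}`)
      resolve(name)
    }, ms)
  )

// forEach ignores the promise returned by the async callback, so every
// download is started at once and this function returns before any finish.
async function withForEach(names, log) {
  names.forEach(async name => {
    await fakeDownload(name, 10, log)
  })
  log.push(`loop finished`)
}

// for...of inside an async function awaits each download in turn.
async function withForOf(names, log) {
  for (const name of names) {
    await fakeDownload(name, 10, log)
  }
  log.push(`loop finished`)
}
```

With withForEach the log contains `loop finished` before any `done ...` entry; with withForOf the `done ...` entries come first, in order.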
Background
Just for a bit of background about my usage of this plugin: I am creating my company's website using Gatsby and Drupal. The website has projects, news, blogs, etc., so it is quite substantial in terms of pages and media entities.
The current code works wonders for small projects but as soon as you hit a certain limit, that's when you get these issues caused by masses of unresolved promises in the call-stack.
Other thoughts
It might be worth refactoring the code (like above) that requests the data from the JSON API endpoints, so that the plugin does not DDoS the Drupal backend with vast numbers of Axios requests when the CMS holds a vast amount of content. This could also prevent future buffer issues when collating the Drupal data as websites grow.
FYI: I am happy to refactor the code if I am given access.
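A middle ground between fully parallel and fully serial is to cap how many requests are in flight at once. The sketch below is only an illustration of that idea, not code from the plugin: mapWithLimit, limit and worker are hypothetical names, and worker would wrap whatever request or download call you are making.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// `mapWithLimit`, `limit` and `worker` are illustrative names, not plugin options.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0

  // Each runner pulls the next unclaimed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index], index)
    }
  }

  // Start up to `limit` runners and wait for all of them to drain the list.
  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    () => runner()
  )
  await Promise.all(runners)
  return results
}
```

If worker rejects, the returned promise rejects with that error, so wrap the call in try/catch inside worker if individual failures should only be logged.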
Hey @bakeruk, I'm having this exact same issue, only on gatsby-source-wordpress. Any idea how I might use your solution on a WordPress-based system? It's funny, my site was working perfectly until I added about 8 new posts, each with around 5-10 images; now the whole build fails because it "can't process remote content". This is all with Netlify, with WordPress hosted on another server. But it also fails if I run gatsby develop.
12:09:00 AM: Build ready to start
12:09:02 AM: build-image version: d5d16c91ca3e1e5a990086daa8a1d5bd8564d12a
12:09:02 AM: build-image tag: v3.2.2
12:09:02 AM: buildbot version: 93c10be3dc42bccef2b5600a7e10ec1d4a1c7051
12:09:02 AM: Fetching cached dependencies
12:09:03 AM: Starting to download cache of 255.1KB
12:09:03 AM: Finished downloading cache in 156.191156ms
12:09:03 AM: Starting to extract cache
12:09:03 AM: Failed to fetch cache, continuing with build
12:09:03 AM: Starting to prepare the repo for build
12:09:03 AM: No cached dependencies found. Cloning fresh repo
12:09:03 AM: git clone https://github.com/jacobsilver2/mary-gatsby
12:09:04 AM: Preparing Git Reference refs/heads/master
12:09:05 AM: Starting build script
12:09:05 AM: Installing dependencies
12:09:06 AM: v10.15.3 is already installed.
12:09:07 AM: Now using node v10.15.3 (npm v6.4.1)
12:09:07 AM: Attempting ruby version 2.6.2, read from environment
12:09:09 AM: Using ruby version 2.6.2
12:09:09 AM: Using PHP version 5.6
12:09:09 AM: Started restoring cached node modules
12:09:09 AM: Finished restoring cached node modules
12:09:09 AM: Installing NPM modules using NPM version 6.4.1
12:09:46 AM: > [email protected] install /opt/build/repo/node_modules/sharp
12:09:46 AM: > (node install/libvips && node install/dll-copy && prebuild-install) || (node-gyp rebuild && node install/dll-copy)
12:09:46 AM: info
12:09:46 AM: sharp
12:09:46 AM: Downloading https://github.com/lovell/sharp-libvips/releases/download/v8.7.4/libvips-8.7.4-linux-x64.tar.gz
12:09:50 AM: > [email protected] postinstall /opt/build/repo/node_modules/gatsby-telemetry
12:09:50 AM: > node src/postinstall.js
12:09:51 AM: > [email protected] postinstall /opt/build/repo/node_modules/cwebp-bin
12:09:51 AM: > node lib/install.js
12:09:51 AM: ✔ cwebp pre-build test passed successfully
12:09:51 AM: > [email protected] postinstall /opt/build/repo/node_modules/mozjpeg
12:09:51 AM: > node lib/install.js
12:09:52 AM: ✔ mozjpeg pre-build test passed successfully
12:09:52 AM: > [email protected] postinstall /opt/build/repo/node_modules/pngquant-bin
12:09:52 AM: > node lib/install.js
12:09:52 AM: ✔ pngquant pre-build test passed successfully
12:09:53 AM: > [email protected] postinstall /opt/build/repo/node_modules/styled-components
12:09:53 AM: > node ./scripts/postinstall.js || exit 0
12:09:53 AM: Use styled-components at work? Consider supporting our development efforts at https://opencollective.com/styled-components
12:09:55 AM: npm
12:09:55 AM: WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules/fsevents):
12:09:55 AM: npm
12:09:55 AM: WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})
12:09:55 AM: added 1828 packages from 1151 contributors and audited 28509 packages in 44.812s
12:09:55 AM: found 2 vulnerabilities (1 low, 1 moderate)
12:09:55 AM: run npm audit fix to fix them, or npm audit for details
12:09:55 AM: NPM modules installed
12:09:56 AM: Started restoring cached go cache
12:09:56 AM: Finished restoring cached go cache
12:09:56 AM: unset GOOS;
12:09:56 AM: unset GOARCH;
12:09:56 AM: export GOROOT='/opt/buildhome/.gimme/versions/go1.12.linux.amd64';
12:09:56 AM: export PATH="/opt/buildhome/.gimme/versions/go1.12.linux.amd64/bin:${PATH}";
12:09:56 AM: go version >&2;
12:09:56 AM: export GIMME_ENV='/opt/buildhome/.gimme/env/go1.12.linux.amd64.env';
12:09:56 AM: go version go1.12 linux/amd64
12:09:56 AM: Installing missing commands
12:09:56 AM: Verify run directory
12:09:56 AM: Executing user command: gatsby build
12:09:59 AM: success open and validate gatsby-configs - 0.007 s
12:09:59 AM: success load plugins - 0.616 s
12:09:59 AM: success onPreInit - 0.011 s
12:09:59 AM: success delete html and css files from previous builds - 0.009 s
12:09:59 AM: success initialize cache - 0.011 s
12:09:59 AM: success copy gatsby files - 0.031 s
12:09:59 AM: success onPreBootstrap - 0.011 s
12:10:00 AM: -> wordpress__acf_options fetched : 1
12:10:01 AM: -> wordpress__wp/v2_acf fetched : 1
12:10:02 AM: -> wordpress__wp/v2_options fetched : 1
12:10:02 AM: -> wordpress__POST fetched : 20
12:10:03 AM: -> wordpress__PAGE fetched : 0
12:10:05 AM: -> wordpress__wp_media fetched : 174
12:10:06 AM: -> wordpress__wp_slide fetched : 3
12:10:06 AM: -> wordpress__wp_taxonomies fetched : 1
12:10:07 AM: -> wordpress__CATEGORY fetched : 3
12:10:07 AM: -> wordpress__TAG fetched : 0
12:10:08 AM: -> wordpress__wp_users fetched : 1
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/18.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/17.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/16.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/15.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/14.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/13.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/12.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/11.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/10.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/9.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/8.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/7.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/6-1.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/5-2.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/4-1.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/3-1.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/2-2.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/1-2.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/8.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/7.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/6.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/5-1.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/4-1.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/3.gif
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/2-1.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/1-1.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/moodycroppedfeatured.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/4.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/1.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/3.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/2.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/worldPopFeatured.jpg
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/5.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/4.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/3.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/2.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/1.png
12:10:38 AM: warning Failed to process remote content https://maryswebsite.net/wp-content/uploads/2019/05/beirutfeatured.png
12:10:38 AM: success source and transform nodes - 39.017 s
12:10:39 AM: success building schema - 0.675 s
12:10:39 AM: success createPages - 0.139 s
12:10:39 AM: success createPagesStatefully - 0.044 s
12:10:39 AM: success onPreExtractQueries - 0.003 s
12:10:39 AM: success update schema - 0.049 s
12:10:39 AM: success extract queries from components - 0.246 s
12:10:41 AM: success run static queries - 1.726 s - 8/8 4.64 queries/second
12:10:42 AM: success run page queries - 0.999 s - 26/26 26.05 queries/second
12:10:42 AM: success write out page data - 0.090 s
12:10:42 AM: success write out redirect data - 0.009 s
12:10:56 AM: success Build manifest and related icons - 0.267 s
12:10:56 AM: success onPostBootstrap - 0.269 s
12:10:56 AM: info bootstrap finished - 60.832 s
12:11:12 AM: success Building production JavaScript and CSS bundles - 15.545 s
12:11:14 AM: error Building static HTML failed for path "/work/contes-feeriques"
12:11:14 AM: See our docs page on debugging HTML builds for help https://gatsby.dev/debug-html
12:11:14 AM: 61 |
12:11:14 AM: 62 | {fourItems.map(({ node }) => (
12:11:14 AM: > 63 |
12:11:14 AM: | ^
12:11:14 AM: 64 | ))}
12:11:14 AM: 65 |
12:11:14 AM: 66 |
12:11:14 AM:
12:11:14 AM: WebpackError: TypeError: Cannot read property 'childImageSharp' of null
12:11:14 AM:
12:11:14 AM: - footer.js:63
12:11:14 AM: lib/src/components/footer.js:63:95
12:11:14 AM:
12:11:14 AM:
12:11:14 AM: - footer.js:62 render
12:11:14 AM: lib/src/components/footer.js:62:24
12:11:14 AM:
12:11:14 AM: - gatsby-browser-entry.js:24 Object.children
12:11:14 AM: lib/.cache/gatsby-browser-entry.js:24:11
12:11:14 AM:
12:11:14 AM: - bootstrap:33 a.render
12:11:14 AM: lib/webpack/bootstrap:33:1
12:11:14 AM: failed during stage 'building site': Build script returned non-zero exit code: 1
12:11:14 AM:
12:11:14 AM: - bootstrap:30 a.read
12:11:14 AM: lib/webpack/bootstrap:30:1
12:11:14 AM:
12:11:14 AM: Shutting down logging, 30 messages pending
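The final error in this log (Cannot read property 'childImageSharp' of null) is presumably a consequence of the failed downloads: a node whose image could not be fetched ends up with a null file reference, and the component then dereferences it. Until the download issue itself is fixed, one hedged workaround is to filter such nodes out before rendering. The field path featured_media.localFile below is an assumption about the query shape, and withProcessedImage is a hypothetical helper name.

```javascript
// Keep only edges whose image was actually downloaded and processed.
// `featured_media.localFile` is an assumed field path; adjust it to your query.
function withProcessedImage(edges) {
  return edges.filter(({ node }) => {
    const media = node.featured_media
    return Boolean(media && media.localFile && media.localFile.childImageSharp)
  })
}
```

In a component this would be used as withProcessedImage(fourItems).map(...), so the build no longer crashes on a missing image, although the affected items are silently skipped.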
I have found this issue to appear when I'm far away from my wifi AP - when I'm right next to it, things are fine - when I'm on the other side of the house: bingo.
Hmm, that's interesting, as I'm currently traveling and dealing with slow internet. I will try once I'm home and have a faster connection. Though the problem is happening on Netlify as well as on my local machine, so I'm not sure that is the issue.
I'm on a symmetric 1 Gbps line at the office and still see the issues mentioned above.
The above WiFi related comments illustrate the same issues. If promises were resolved correctly in the call-stack, regardless of the download speeds, the process of remote file retrieval would be successful.
I'll try and put together a PR if I have time this week to fix this.
@jacobsilver2 It could be the same issue. Try copying the whole gatsby-source-wordpress code from the plugin's gatsby-node.js file in node_modules and amending the code that downloads the files (similar to mine above) to use a for loop that waits for each promise to resolve before moving on to the next item. You can put that code in your own project's main gatsby-node.js file.
PR #13943 created
@KyleAMathews Is there any chance that you could take a look at this PR for me and let me know your thoughts etc?
Thank you @bakeruk ! I'll see if I can give this a shot. Thanks for pointing me in the right direction. Much appreciated, and hopefully this can get sorted out soon.
Hi, I am having the same issue with remote file downloads, but I am building my own plugin rather than using an existing one. It works fine with about 40 images, but once I started downloading over 300 images, it failed at random images. It seems to me that createRemoteFileNode does not download each image immediately; it queues the images up and only then starts downloading them from the queue. Anyway, here is my code.
try {
  const pages = node.content.pages
  pages.forEach(async (page, index) => {
    let blocks = page.blocks
    blocks.forEach(async (block, index) => {
      if (block.image.url) {
        let s3URL = block.image.url
        console.log('\r\nDownloading block image...\r\n')
        const fileNode = await createRemoteFileNode({
          url: s3URL,
          store,
          cache,
          createNode,
          createNodeId,
        })
        /* not going to create any node field yet, since downloading is not working...
        createNodeField({
          node,
          name: 'localImage___NODE',
          value: "true",
        })*/
      }
    })
  })
} catch (err) {
  //console.log(err)
}
The reason I am doing two foreach loops is - I have multiple pages in each article, and each page has multiple images.
Thanks.
Try swapping out your forEach loops for a for...in loop.
forEach does not wait for the promise returned by an async callback to resolve before it moves on to the next item in the loop.
For example:
for (const i in pages) {
  if (pages[i]) {
    const page = pages[i];
    ...
  }
}
Also try adding a try/catch around the createRemoteFileNode call to see whether any errors are going uncaught.
try {
  const fileNode = await createRemoteFileNode({
    url: s3URL,
    store,
    cache,
    createNode,
    createNodeId,
  });
} catch (err) {
  console.log(err);
  // Optional depending how you want to proceed
  throw err;
}
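Putting the two suggestions together for the nested pages/blocks case, a sequential version might look roughly like the sketch below. It is an illustration only: downloadBlockImages is a hypothetical helper, download stands in for a call such as createRemoteFileNode({ url, store, cache, createNode, createNodeId }) and is passed in so the loop can be run without Gatsby, and imageLocalId is a made-up field name.

```javascript
// Download every block image one at a time and collect the URLs that failed.
// `download` is injected: in a plugin it would wrap createRemoteFileNode.
async function downloadBlockImages(pages, download) {
  const failed = []
  for (const page of pages) {
    for (const block of page.blocks) {
      // Skip blocks that have no image URL.
      if (!block.image || !block.image.url) continue
      try {
        // The next download only starts after this one has settled.
        const fileNode = await download(block.image.url)
        block.imageLocalId = fileNode.id // hypothetical field name
      } catch (err) {
        // Record the failure and carry on with the remaining images.
        failed.push(block.image.url)
      }
    }
  }
  return failed
}
```

Returning the list of failed URLs makes it possible to report them once at the end, or to retry just those, instead of losing the errors inside the loop.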
Thanks for your code. Now the code is working much better. I am still seeing some images fail at random, and I think that's because my images are on Amazon S3, and S3 may be throttling so many downloads at the same time. Mostly, while downloading was still in progress, I saw it pause at "source and transform nodes", and then the next image failed. Not sure how to handle that yet, but it may be just S3.
By the way, I know it's off topic, but let me slip a question in here. Since I have so many images, how do I create the sub-nodes under the related original node?
I am trying to download remote images, then create a sub-node under the original image node (which has the absolute image URL), then use GraphQL and Gatsby's Image tag.
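If the remaining failures really are caused by the remote host throttling requests (that is only a guess here), one common way to handle it is to retry a failed download a few times with a growing delay. The helper below is a generic sketch: withRetry, retries and delayMs are hypothetical names, and task would be a function that calls createRemoteFileNode.

```javascript
// Resolve after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Call `task` until it succeeds, waiting longer after each failure.
// `retries` is the number of extra attempts after the first one.
async function withRetry(task, { retries = 3, delayMs = 500 } = {}) {
  let lastError
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task()
    } catch (err) {
      lastError = err
      // Exponential backoff: delayMs, 2 * delayMs, 4 * delayMs, ...
      if (attempt < retries) await sleep(delayMs * 2 ** attempt)
    }
  }
  // Every attempt failed, so surface the last error to the caller.
  throw lastError
}
```

A call would then look like await withRetry(() => createRemoteFileNode({ url: s3URL, store, cache, createNode, createNodeId })), kept inside the existing try/catch.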
Thank you!
My GraphQL is:
{
  allPostData(limit: 2) {
    edges {
      node {
        fields {
          localImage90_0_0 { # <= that is messy
            publicURL
          }
          localImage90_0_1 { # <= that is messy again, can't keep adding like this
            publicURL
          }
        }
        title
        description
        content {
          author {
            firstname
            lastname
          }
          pages {
            index
            sub_heading
            blocks {
              index
              text
              image { # <= can we create sub node here under "image" using createRemoteFileNode?
                caption
                url # <= this is absolute url to s3 image
              }
            }
          }
        }
      }
    }
  }
}
Actually, I just got it now. I needed to use parentNodeId in createRemoteFileNode. Then, when the downloaded file node is created, I just created another sub node.
Thank you.
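// Declare fileNode outside the try block so it is visible afterwards
let fileNode
try {
  fileNode = await createRemoteFileNode({
    url: s3URL,
    store,
    cache,
    createNode,
    createNodeId,
    parentNodeId: block.id,
  })
} catch (err) {
  console.log("error createRemoteFileNode " + err)
}
// Only link the local file if the download succeeded
if (fileNode != null) {
  block.imageLocal___NODE = fileNode.id
}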
Hey @bakeruk, sorry to bother you again, but I'm having no luck with this. I don't have tons of experience messing around with this sort of thing, so I'm not even sure exactly where in the code I need to insert the for loop. It seems to me the gatsby-source-wordpress files are significantly different from gatsby-source-drupal. If you get a second, would you mind taking a look? My guess is that the logic is either in gatsby-node.js or normalize.js, either around line 45 in gatsby-node.js or line 429 in normalize.js. I'm attaching both files. If you don't have time to take a look, no worries, thought I'd ask. Thanks!
//gatsby-node.js
"use strict";
var _interopRequireDefault = require("@babel/runtime/helpers/interopRequireDefault");
var _asyncToGenerator2 = _interopRequireDefault(require("@babel/runtime/helpers/asyncToGenerator"));
const fetch = require(`./fetch`);
const normalize = require(`./normalize`);
const normalizeBaseUrl = require(`./normalize-base-url`);
const typePrefix = `wordpress__`;
const refactoredEntityTypes = {
post: `${typePrefix}POST`,
page: `${typePrefix}PAGE`,
tag: `${typePrefix}TAG`,
category: `${typePrefix}CATEGORY`
/* If true, will output many console logs. */
};
let _verbose;
let _siteURL;
let _useACF = true;
let _acfOptionPageIds;
let _hostingWPCOM;
let _auth;
let _perPage;
let _concurrentRequests;
let _includedRoutes;
let _excludedRoutes;
let _normalizer;
exports.sourceNodes =
/*#__PURE__*/
function () {
var _ref = (0, _asyncToGenerator2.default)(function* ({
actions,
getNode,
store,
cache,
createNodeId,
createContentDigest
}, {
baseUrl,
protocol,
hostingWPCOM,
useACF = true,
acfOptionPageIds = [],
auth = {},
verboseOutput,
perPage = 100,
searchAndReplaceContentUrls = {},
concurrentRequests = 1,
includedRoutes = [`**`],
excludedRoutes = [],
normalizer
}) {
const createNode = actions.createNode,
touchNode = actions.touchNode;
const normalizedBaseUrl = normalizeBaseUrl(baseUrl);
_verbose = verboseOutput;
_siteURL = `${protocol}://${normalizedBaseUrl}`;
_useACF = useACF;
_acfOptionPageIds = acfOptionPageIds;
_hostingWPCOM = hostingWPCOM;
_auth = auth;
_perPage = perPage;
_concurrentRequests = concurrentRequests;
_includedRoutes = includedRoutes;
_excludedRoutes = excludedRoutes;
_normalizer = normalizer;
let entities = yield fetch({
baseUrl,
_verbose,
_siteURL,
_useACF,
_acfOptionPageIds,
_hostingWPCOM,
_auth,
_perPage,
_concurrentRequests,
_includedRoutes,
_excludedRoutes,
typePrefix,
refactoredEntityTypes
}); // Normalize data & create nodes
// Create fake wordpressId form element who done have any in the database
entities = normalize.generateFakeWordpressId(entities); // Remove ACF key if it's not an object, combine ACF Options
entities = normalize.normalizeACF(entities); // Combine ACF Option Data entities into one but split by IDs + options
entities = normalize.combineACF(entities); // Creates entities from object collections of entities
entities = normalize.normalizeEntities(entities); // Standardizes ids & cleans keys
entities = normalize.standardizeKeys(entities); // Converts to use only GMT dates
entities = normalize.standardizeDates(entities); // Lifts all "rendered" fields to top-level.
entities = normalize.liftRenderedField(entities); // Exclude entities of unknown shape
entities = normalize.excludeUnknownEntities(entities); // Creates Gatsby IDs for each entity
entities = normalize.createGatsbyIds(createNodeId, entities, _siteURL); // Creates links between authors and user entities
entities = normalize.mapAuthorsToUsers(entities); // Creates links between posts and tags/categories.
entities = normalize.mapPostsToTagsCategories(entities); // Creates links between tags/categories and taxonomies.
entities = normalize.mapTagsCategoriesToTaxonomies(entities); // Creates links from entities to media nodes
entities = normalize.mapEntitiesToMedia(entities); // Downloads media files and removes "sizes" data as useless in Gatsby context.
entities = yield normalize.downloadMediaFiles({
entities,
store,
cache,
createNode,
createNodeId,
touchNode,
getNode,
_auth
}); // Creates links between elements and parent element.
entities = normalize.mapElementsToParent(entities); // Search and replace Content Urls
entities = normalize.searchReplaceContentUrls({
entities,
searchAndReplaceContentUrls
});
entities = normalize.mapPolylangTranslations(entities);
entities = normalize.createUrlPathsFromLinks(entities); // apply custom normalizer
if (typeof _normalizer === `function`) {
entities = _normalizer({
entities,
store,
cache,
createNode,
createNodeId,
touchNode,
getNode,
typePrefix,
refactoredEntityTypes,
baseUrl,
protocol,
_siteURL,
hostingWPCOM,
useACF,
acfOptionPageIds,
auth,
verboseOutput,
perPage,
searchAndReplaceContentUrls,
concurrentRequests,
excludedRoutes
});
} // creates nodes for each entry
normalize.createNodesFromEntities({
entities,
createNode,
createContentDigest
});
return;
});
return function (_x, _x2) {
return _ref.apply(this, arguments);
};
}();
//normalize.js
"use strict";
var _interopRequireDefault = require("@babel/runtime/helpers/interopRequireDefault");
var _objectWithoutPropertiesLoose2 = _interopRequireDefault(require("@babel/runtime/helpers/objectWithoutPropertiesLoose"));
var _asyncToGenerator2 = _interopRequireDefault(require("@babel/runtime/helpers/asyncToGenerator"));
const deepMapKeys = require(`deep-map-keys`);
const _ = require(`lodash`);
const _require = require(`gatsby-source-filesystem`),
createRemoteFileNode = _require.createRemoteFileNode;
const _require2 = require(`url`),
URL = _require2.URL;
const colorized = require(`./output-color`);
const conflictFieldPrefix = `wordpress_`; // restrictedNodeFields from here https://www.gatsbyjs.org/docs/node-interface/
const restrictedNodeFields = [`id`, `children`, `parent`, `fields`, `internal`];
function getValidKey({
key,
verbose = false
}) {
let nkey = String(key);
const NAME_RX = /^[_a-zA-Z][_a-zA-Z0-9]*$/;
let changed = false; // Replace invalid characters
if (!NAME_RX.test(nkey)) {
changed = true;
nkey = nkey.replace(/-|__|:|\.|\s/g, `_`);
} // Prefix if first character isn't a letter.
if (!NAME_RX.test(nkey.slice(0, 1))) {
changed = true;
nkey = `${conflictFieldPrefix}${nkey}`;
}
if (restrictedNodeFields.includes(nkey)) {
changed = true;
nkey = `${conflictFieldPrefix}${nkey}`.replace(/-|__|:|\.|\s/g, `_`);
}
if (changed && verbose) console.log(colorized.out(`Object with key "${key}" breaks GraphQL naming convention. Renamed to "${nkey}"`, colorized.color.Font.FgRed));
return nkey;
}
exports.getValidKey = getValidKey; // Remove the ACF key from the response when it's not an object
const normalizeACF = entities => entities.map(e => {
if (!_.isPlainObject(e[`acf`])) {
delete e[`acf`];
}
return e;
});
exports.normalizeACF = normalizeACF; // Combine all ACF Option page data
exports.combineACF = function (entities) {
let acfOptionData = {}; // Map each ACF Options object keys/data to single object
_.forEach(entities.filter(e => e.__type === `wordpress__acf_options`), e => {
if (e[`acf`]) {
acfOptionData[e.__acfOptionPageId || `options`] = {};
Object.keys(e[`acf`]).map(k => acfOptionData[e.__acfOptionPageId || `options`][k] = e[`acf`][k]);
}
}); // Remove previous ACF Options objects (if any)
_.pullAll(entities, entities.filter(e => e.__type === `wordpress__acf_options`)); // Create single ACF Options object
entities.push({
acf: acfOptionData || false,
__type: `wordpress__acf_options`
});
return entities;
}; // Create wordpress_id if the entity don't have one
exports.generateFakeWordpressId = entities => entities.map(e => {
if (e.__type === `wordpress__yoast_redirects`) {
e.wordpress_id = `${e.origin}-${e.url}-${e.type}`;
}
return e;
}); // Create entities from the few the WordPress API returns as an object for presumably
// legacy reasons.
const normalizeEntities = entities => {
const mapType = e => Object.keys(e).filter(key => key !== `__type`).map(key => {
return Object.assign({
id: key
}, e[key], {
__type: e.__type
});
});
return entities.reduce((acc, e) => {
switch (e.__type) {
case `wordpress__wp_types`:
return acc.concat(mapType(e));
case `wordpress__wp_api_menus_menu_locations`:
return acc.concat(mapType(e));
case `wordpress__wp_statuses`:
return acc.concat(mapType(e));
case `wordpress__wp_taxonomies`:
return acc.concat(mapType(e));
case `wordpress__acf_options`:
return acc.concat(mapType(e));
default:
return acc.concat(e);
}
}, []);
};
exports.normalizeEntities = normalizeEntities; // Standardize ids + make sure keys are valid.
exports.standardizeKeys = entities => entities.map(e => deepMapKeys(e, key => key === `ID` ? getValidKey({
key: `id`
}) : getValidKey({
key
}))); // Standardize dates on ISO 8601 version.
exports.standardizeDates = entities => entities.map(e => {
Object.keys(e).forEach(key => {
if (e[`${key}_gmt`]) {
e[key] = new Date(e[`${key}_gmt`] + `z`).toJSON();
delete e[`${key}_gmt`];
}
});
return e;
}); // Lift "rendered" fields to top-level
exports.liftRenderedField = entities => entities.map(e => {
Object.keys(e).forEach(key => {
const value = e[key];
if (_.isObject(value) && _.isString(value.rendered)) {
e[key] = value.rendered;
}
});
return e;
}); // Exclude entities of unknown shape
// Assume all entities contain a wordpress_id,
// except for whitelisted type wp_settings and the site_metadata
exports.excludeUnknownEntities = entities => entities.filter(e => e.wordpress_id || e.__type === `wordpress__wp_settings` || e.__type === `wordpress__site_metadata`); // Excluding entities without ID, or WP Settings
// Create node ID from known entities
// excludeUnknownEntities whitelisted types don't contain a wordpress_id
// we create the node ID based upon type if the wordpress_id doesn't exist
exports.createGatsbyIds = (createNodeId, entities, _siteURL) => entities.map(e => {
if (e.wordpress_id) {
e.id = createNodeId(`${e.__type}-${e.wordpress_id.toString()}-${_siteURL}`);
} else {
e.id = createNodeId(`${e.__type}-${_siteURL}`);
}
return e;
}); // Build foreign reference map.
exports.mapTypes = entities => {
const groups = _.groupBy(entities, e => e.__type);
for (let groupId in groups) {
groups[groupId] = groups[groupId].map(e => {
return {
wordpress_id: e.wordpress_id,
id: e.id
};
});
}
return groups;
};
exports.mapAuthorsToUsers = entities => {
const users = entities.filter(e => e.__type === `wordpress__wp_users`);
return entities.map(e => {
if (users.length && e.author) {
// Find the user
const user = users.find(u => u.wordpress_id === e.author);
if (user) {
e.author___NODE = user.id; // Add a link to the user to the entity.
if (!user.all_authored_entities___NODE) {
user.all_authored_entities___NODE = [];
}
user.all_authored_entities___NODE.push(e.id);
if (!user[`authored_${e.__type}___NODE`]) {
user[`authored_${e.__type}___NODE`] = [];
}
user[`authored_${e.__type}___NODE`].push(e.id);
delete e.author;
}
}
return e;
});
};
exports.mapPostsToTagsCategories = entities => {
const categoryTypes = [`wordpress__wc_categories`, `wordpress__CATEGORY`];
const tagTypes = [`wordpress__TAG`, `wordpress__wc_tags`];
const tags = entities.filter(e => tagTypes.includes(e.__type));
const categories = entities.filter(e => categoryTypes.includes(e.__type));
return entities.map(e => {
// Replace tags & categories with links to their nodes.
let entityHasTags = e.tags && Array.isArray(e.tags) && e.tags.length;
if (tags.length && entityHasTags) {
e.tags___NODE = e.tags.map(t => tags.find(tObj => (Number.isInteger(t) ? t : t.wordpress_id) === tObj.wordpress_id).id);
delete e.tags;
}
let entityHasCategories = e.categories && Array.isArray(e.categories) && e.categories.length;
if (categories.length && entityHasCategories) {
e.categories___NODE = e.categories.map(c => categories.find(cObj => (Number.isInteger(c) ? c : c.wordpress_id) === cObj.wordpress_id).id);
delete e.categories;
}
return e;
});
}; // TODO generalize this for all taxonomy types.
exports.mapTagsCategoriesToTaxonomies = entities => entities.map(e => {
// Where should api_menus stuff link to?
if (e.taxonomy && e.__type !== `wordpress__wp_api_menus_menus`) {
// Replace taxonomy with a link to the taxonomy node.
const taxonomyNode = entities.find(t => t.wordpress_id === e.taxonomy);
if (taxonomyNode) {
e.taxonomy___NODE = taxonomyNode.id;
delete e.taxonomy;
}
}
return e;
});
exports.mapElementsToParent = entities => entities.map(e => {
if (e.wordpress_parent) {
// Create parent_element with a link to the parent node of type.
const parentElement = entities.find(t => t.wordpress_id === e.wordpress_parent && t.__type === e.__type);
if (parentElement) {
e.parent_element___NODE = parentElement.id;
}
}
return e;
});
exports.mapPolylangTranslations = entities => entities.map(entity => {
if (entity.polylang_translations) {
entity.polylang_translations___NODE = entity.polylang_translations.map(translation => entities.find(t => t.wordpress_id === translation.wordpress_id && entity.__type === t.__type).id);
delete entity.polylang_translations;
}
return entity;
});
exports.searchReplaceContentUrls = function ({
entities,
searchAndReplaceContentUrls
}) {
if (!_.isPlainObject(searchAndReplaceContentUrls) || !_.has(searchAndReplaceContentUrls, `sourceUrl`) || !_.has(searchAndReplaceContentUrls, `replacementUrl`) || typeof searchAndReplaceContentUrls.sourceUrl !== `string` || typeof searchAndReplaceContentUrls.replacementUrl !== `string`) {
return entities;
}
const sourceUrl = searchAndReplaceContentUrls.sourceUrl,
replacementUrl = searchAndReplaceContentUrls.replacementUrl;
const _blacklist = [`_links`, `__type`];
const blacklistProperties = function blacklistProperties(obj = {}, blacklist = []) {
for (var i = 0; i < blacklist.length; i++) {
delete obj[blacklist[i]];
}
return obj;
};
return entities.map(function (entity) {
const original = Object.assign({}, entity);
try {
var whiteList = blacklistProperties(entity, _blacklist);
var replaceable = JSON.stringify(whiteList);
var replaced = replaceable.replace(new RegExp(sourceUrl, `g`), replacementUrl);
var parsed = JSON.parse(replaced);
} catch (e) {
console.log(colorized.out(e.message, colorized.color.Font.FgRed));
return original;
}
return _.defaultsDeep(parsed, original);
});
};
exports.mapEntitiesToMedia = entities => {
const media = entities.filter(e => e.__type === `wordpress__wp_media`);
return entities.map(e => {
// Map featured_media to its media node
// Check if it's value of ACF Image field, that has 'Return value' set to
// 'Image Object' ( https://www.advancedcustomfields.com/resources/image/ )
const isPhotoObject = field => _.isObject(field) && field.wordpress_id && field.url && field.width && field.height ? true : false;
const isURL = value => _.isString(value) && value.startsWith(`http`);
const isMediaUrlAlreadyProcessed = key => key == `source_url`;
const isFeaturedMedia = (value, key) => (_.isNumber(value) || _.isBoolean(value)) && key === `featured_media`; // ACF Gallery and similarly shaped arrays
const isArrayOfPhotoObject = field => _.isArray(field) && field.length > 0 && isPhotoObject(field[0]);
const getMediaItemID = mediaItem => mediaItem ? mediaItem.id : null; // Try to get media node from value:
// - special case - check if key is featured_media and value is photo ID
// - check if value is media url
// - check if value is ACF Image Object
// - check if value is ACF Gallery
const getMediaFromValue = (value, key) => {
if (isFeaturedMedia(value, key)) {
return {
mediaNodeID: _.isNumber(value) ? getMediaItemID(media.find(m => m.wordpress_id === value)) : null,
deleteField: true
};
} else if (isURL(value) && !isMediaUrlAlreadyProcessed(key)) {
const mediaNodeID = getMediaItemID(media.find(m => m.source_url === value));
return {
mediaNodeID,
deleteField: !!mediaNodeID
};
} else if (isPhotoObject(value)) {
const mediaNodeID = getMediaItemID(media.find(m => m.source_url === value.url));
return {
mediaNodeID,
deleteField: !!mediaNodeID
};
} else if (isArrayOfPhotoObject(value)) {
return {
mediaNodeID: value.map(item => getMediaFromValue(item, key).mediaNodeID).filter(id => id !== null),
deleteField: true
};
}
return {
mediaNodeID: null,
deleteField: false
};
};
const replaceFieldsInObject = object => {
let deletedAllFields = true;
_.each(object, (value, key) => {
const _getMediaFromValue = getMediaFromValue(value, key),
mediaNodeID = _getMediaFromValue.mediaNodeID,
deleteField = _getMediaFromValue.deleteField;
if (mediaNodeID) {
object[`${key}___NODE`] = mediaNodeID;
}
if (deleteField) {
delete object[key]; // We found photo node (even if it has no image),
// We can end processing this path
return;
} else {
deletedAllFields = false;
}
if (_.isArray(value)) {
value.forEach(v => replaceFieldsInObject(v));
} else if (_.isObject(value)) {
replaceFieldsInObject(value);
}
}); // Deleting fields and replacing them with links to different nodes
// can cause build errors if the object ends up with only linked properties:
// https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/schema/infer-graphql-input-fields.js#L205
// Hacky workaround:
// Adding dummy field with concrete value (not link) fixes build
if (deletedAllFields && object && _.isObject(object)) {
object[`dummy`] = true;
}
};
replaceFieldsInObject(e);
return e;
});
}; // Downloads media files and removes "sizes" data as useless in Gatsby context.
exports.downloadMediaFiles =
/*#__PURE__*/
function () {
var _ref = (0, _asyncToGenerator2.default)(function* ({
entities,
store,
cache,
createNode,
createNodeId,
touchNode,
getNode,
_auth
}) {
return Promise.all(entities.map(
/*#__PURE__*/
function () {
var _ref2 = (0, _asyncToGenerator2.default)(function* (e) {
let fileNodeID;
if (e.__type === `wordpress__wp_media`) {
const mediaDataCacheKey = `wordpress-media-${e.wordpress_id}`;
const cacheMediaData = yield cache.get(mediaDataCacheKey); // If we have cached media data and it wasn't modified, reuse
// previously created file node to not try to redownload
if (cacheMediaData && e.modified === cacheMediaData.modified) {
const fileNode = getNode(cacheMediaData.fileNodeID); // check if node still exists in cache
// it could be removed if image was made private
if (fileNode) {
fileNodeID = cacheMediaData.fileNodeID;
touchNode({
nodeId: fileNodeID
});
}
} // If we don't have cached data, download the file
if (!fileNodeID) {
try {
const fileNode = yield createRemoteFileNode({
url: e.source_url,
store,
cache,
createNode,
createNodeId,
parentNodeId: e.id,
auth: _auth
});
if (fileNode) {
fileNodeID = fileNode.id;
yield cache.set(mediaDataCacheKey, {
fileNodeID,
modified: e.modified
});
}
} catch (e) {// Ignore
}
}
}
if (fileNodeID) {
e.localFile___NODE = fileNodeID;
delete e.media_details.sizes;
}
return e;
});
return function (_x2) {
return _ref2.apply(this, arguments);
};
}()));
});
return function (_x) {
return _ref.apply(this, arguments);
};
}();
const prepareACFChildNodes = (obj, entityId, topLevelIndex, type, children, childrenNodes, createContentDigest) => {
// Replace any child arrays with pointers to nodes
_.each(obj, (value, key) => {
if (_.isArray(value) && value[0] && value[0].acf_fc_layout) {
obj[`${key}___NODE`] = value.map((v, indexItem) => prepareACFChildNodes(v, `${entityId}_${indexItem}`, topLevelIndex, type + key, children, childrenNodes, createContentDigest).id);
delete obj[key];
}
});
const acfChildNode = Object.assign({}, obj, {
id: entityId + topLevelIndex + type,
parent: entityId,
children: [],
internal: {
type,
contentDigest: createContentDigest(obj)
}
});
children.push(acfChildNode.id); // We recursively handle children nodes first, so we need
// to make sure parent nodes will be before their children.
// So let's use unshift to put nodes in the beginning.
childrenNodes.unshift(acfChildNode);
return acfChildNode;
};
exports.createNodesFromEntities = ({
entities,
createNode,
createContentDigest
}) => {
entities.forEach(e => {
// Create subnodes for ACF Flexible layouts
let __type = e.__type,
entity = (0, _objectWithoutPropertiesLoose2.default)(e, ["__type"]); // eslint-disable-line no-unused-vars
let children = [];
let childrenNodes = [];
if (entity.acf) {
_.each(entity.acf, (value, key) => {
if (_.isArray(value) && value[0] && value[0].acf_fc_layout) {
entity.acf[`${key}_${entity.type}___NODE`] = entity.acf[key].map((f, i) => {
const type = `WordPressAcf_${f.acf_fc_layout}`;
delete f.acf_fc_layout;
const acfChildNode = prepareACFChildNodes(f, entity.id + i, key, type, children, childrenNodes, createContentDigest);
return acfChildNode.id;
});
delete entity.acf[key];
}
});
}
let node = Object.assign({}, entity, {
children,
parent: null,
internal: {
type: e.__type,
contentDigest: createContentDigest(entity)
}
});
createNode(node);
childrenNodes.forEach(node => {
createNode(node);
});
});
};
exports.createUrlPathsFromLinks = entities => entities.map(e => {
if (e.link && !e.path) {
try {
const link = new URL(e.link);
e.path = link.pathname;
} catch (error) {
e.path = e.link;
}
}
return e;
});
We published a new version that should fix this issue. Please let me know if we did. Please thank @bakeruk for his awesome work!
Fantastic! This fixed my issue. Thank you @bakeruk for the fix and @wardpeet for publishing the new version. Very pleased with the turnaround time on this.
@jacobsilver2 I have the same problem, but it is not solved for me. @wardpeet @bakeruk Could you share details of the change for WordPress, which still cannot process remote images? Thank you very much.
Yes, I am also still having this problem with WordPress!
Noted.
Please keep all comms in #14173 for now as this is a closed issue.
I have had a look and put the details of my findings regarding your issues in that ticket.
Hi, I am still having this problem with Gatsby-source-drupal. I either get error: connect ETIMEDOUT or cannot retrieve a random image file asset. Are you sure you fixed this?
Try changing the `concurrentFileRequests` option to 1 and let me know if you're still having issues.
Hi @bakeruk. I just realized that this has more to do with my server configuration than Gatsby. My IP address is currently blocked because of too many requests. Thanks for the quick response.
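To illustrate why a lower concurrency setting helps when a server rate-limits or blocks clients, here is a generic sketch of capping the number of in-flight requests. This is not the plugin's actual implementation; `mapWithConcurrency` and `worker` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: apply an async `worker` to every item,
// keeping at most `limit` calls in flight at any time.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been started yet

  // Each runner repeatedly claims the next unclaimed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start at most `limit` runners (at least one), then wait for all of them.
  const runnerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With `limit` set to 1 the downloads run strictly one after another, which is the behavior the suggestion above is aiming for; if any `worker` call rejects, the returned promise rejects as well.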
We are seeing the same issue happening with gatsby-source-wordpress. About 50% of the time our builds fail with seemingly no rhyme or reason and no code changes. Any advice would be greatly appreciated.