In a project of mine, I process quite a lot of external images at build time.
Something weird started happening one day. The same image would appear on one page, and then not on another.
The queried field for the processed image would return undefined. I use Gatsby Image by the way.
Here is a simplified version of how I create the remote files at build time:
const { createRemoteFileNode } = require("gatsby-source-filesystem");

exports.createSchemaCustomization = ({ actions }) => {
  const { createTypes } = actions;
  createTypes(`
    type mongodbVimcsRepositories implements Node {
      processed_images: [File]
    }
  `);
};

exports.onCreateNode = ({
  node,
  actions: { createNode },
  store,
  cache,
  createNodeId,
}) => {
  if (node.internal.type === "<type>") {
    node.image_urls.forEach(async url => {
      const fileNode = await createRemoteFileNode({
        url,
        parentNodeId: node.id,
        createNode,
        createNodeId,
        cache,
        store,
      });
      node.processed_images = [
        ...(node.processed_images || []),
        fileNode,
      ];
    });
  }
};

// page creation and more unrelated stuff below
To understand a little more what was happening, I started logging the URL of the image after the createRemoteFileNode promise was resolved.
I noticed some of them logged a little late in the build process, and coincidentally (or not), they were the images missing in the app on certain pages.
In the screenshot below, all images between the two red lines do not get sent in the result of the page query.

Note that all the images are queried through page queries and not static queries.
If it can be of any help, here is how I query the images on the page:
export const query = graphql`
  query(<params>) {
    repository: <type>(
      <query>
    ) {
      <other fields>
      images: processed_images {
        childImageSharp {
          fluid(maxWidth: 1280, quality: 80) {
            ...GatsbyImageSharpFluid
          }
        }
      }
    }
  }
`;
Not sure if this is an issue with Gatsby, or just how I create the remote nodes and the way I wait for the promises to resolve, or if it's an issue at all. Anyway, I would appreciate a little insight into this as I thought it was interesting.
I have found a way to work around this issue I've been having, but I'm not confident it's a good way to solve it.
What I did was accumulate the createRemoteFileNode promises in an array, and then await all of them in the onPostBootstrap hook. That way, I can be sure all the nodes have finished processing before moving on to the build phase.
exports.onPostBootstrap = async () => {
  await Promise.all(imagePromises);
};
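For anyone reading along, here is a rough, self-contained sketch of how that workaround fits together. It is not my actual gatsby-node.js: `fakeCreateRemoteFileNode` is a hypothetical stand-in for createRemoteFileNode so the snippet runs outside a Gatsby build, the `Array.isArray` guard replaces the node type check, and in a real gatsby-node.js the two hooks would be assigned to `exports`.

```javascript
// Hypothetical stand-in for createRemoteFileNode (gatsby-source-filesystem).
// It only simulates a slow download and resolves to a fake file node.
const fakeCreateRemoteFileNode = ({ url }) =>
  new Promise(resolve =>
    setTimeout(() => resolve({ id: `file-${url}`, url }), 10)
  );

// Every download started in onCreateNode is tracked here.
const imagePromises = [];

// Not async: it starts the downloads and returns immediately.
const onCreateNode = ({ node }) => {
  if (!Array.isArray(node.image_urls)) return;
  node.image_urls.forEach(url => {
    const promise = fakeCreateRemoteFileNode({
      url,
      parentNodeId: node.id,
    }).then(fileNode => {
      node.processed_images = [...(node.processed_images || []), fileNode];
      return fileNode;
    });
    imagePromises.push(promise);
  });
};

// Runs later in the build and waits for every pending download.
const onPostBootstrap = async () => {
  await Promise.all(imagePromises);
};
```

The point is only that nothing waits for the downloads until onPostBootstrap runs, which is why I'm not sure this is the right place to do the waiting.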
I will set up a basic repo if necessary, as it's not a super easy setup.
Expected result: Images are all processed before the queries start to execute.
Actual result: Some images finish processing late, resulting in missing images in some pages.
System:
OS: macOS 10.15.4
CPU: (12) x64 Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
Shell: 5.7.1 - /bin/zsh
Binaries:
Node: 14.5.0 - /usr/local/bin/node
Yarn: 1.22.4 - /usr/local/bin/yarn
npm: 6.14.5 - /usr/local/bin/npm
Languages:
Python: 2.7.16 - /usr/bin/python
Browsers:
Chrome: 84.0.4147.105
Edge: 84.0.522.49
Firefox: 78.0.2
Safari: 13.1
npmPackages:
gatsby: ^2.24.23 => 2.24.23
gatsby-cli: ^2.12.68 => 2.12.68
gatsby-image: ^2.4.14 => 2.4.14
gatsby-plugin-manifest: ^2.4.21 => 2.4.21
gatsby-plugin-offline: ^3.2.21 => 3.2.21
gatsby-plugin-react-helmet: ^3.3.1 => 3.3.10
gatsby-plugin-robots-txt: ^1.5.1 => 1.5.1
gatsby-plugin-sass: ^2.3.1 => 2.3.12
gatsby-plugin-sharp: ^2.6.24 => 2.6.24
gatsby-source-filesystem: ^2.3.23 => 2.3.23
gatsby-source-mongodb: ^2.3.6 => 2.3.10
gatsby-transformer-json: ^2.4.1 => 2.4.11
gatsby-transformer-sharp: ^2.5.12 => 2.5.12
npmGlobalPackages:
gatsby-cli: 2.6.5
Some images finish processing late, resulting in missing images in some pages.
Did you look for those missing images in the ./cache folder that the images were downloaded to? In my experience, when a remote download fails, one of two things happens. Either the image is still treated as done and gets processed anyway (it ends up "corrupted" because only part of the image data was retrieved, with no retry and no warning that the download did not complete), or you get what you describe: "undefined"/omitted results where the query should return valid URIs to processed images.
Those situations happened for me on poor-quality connections. It also turns out that Gatsby defaults to 200 concurrent connections for some reason, which exacerbates the problem. There are Gatsby environment variables that let you reduce the number of concurrent connections, which works well for me, and others for extending the timeout duration in case limiting concurrency is not enough.
@polarathene It happened very consistently on a medium to high speed connection, so I don't think that is quite the issue.
Like in my workaround, waiting for the node creation promise to resolve fixes the problem. This means that the image just didn't have the time to process, and not that it failed at doing so.
I'll research this a little more this week, and create a repo to reproduce it.
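To illustrate the timing I mean, here is a small standalone sketch that runs in plain Node without Gatsby. `fakeDownload` is a hypothetical stand-in for a slow remote download. It compares the forEach pattern from my hook, where nothing waits for the async callbacks, with an awaited Promise.all.

```javascript
// Hypothetical stand-in for a slow async task such as a remote download.
const fakeDownload = (url, ms = 10) =>
  new Promise(resolve => setTimeout(() => resolve(`file:${url}`), ms));

// forEach discards the promises returned by the async callback,
// so this function returns before any download has finished.
function collectWithForEach(urls) {
  const results = [];
  urls.forEach(async url => {
    results.push(await fakeDownload(url));
  });
  return results; // still empty when returned; filled in later
}

// map + Promise.all gives the caller something to wait on.
async function collectWithPromiseAll(urls) {
  return Promise.all(urls.map(url => fakeDownload(url)));
}

async function demo() {
  const urls = ["a.png", "b.png"];
  const early = collectWithForEach(urls);
  // Read the length right away, before any timer has fired.
  const earlyLength = early.length;
  const awaited = await collectWithPromiseAll(urls);
  console.log("forEach, right after the call:", earlyLength);
  console.log("Promise.all, after await:", awaited.length);
  return { earlyLength, awaited };
}

demo();
```

If the same thing happens in the build, any query that runs before the late callbacks finish would see a node without those images.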
Hi @reobin !
My initial thought is that onCreateNode must be async in your case, that is:
exports.onCreateNode = async ({
  node,
  actions: { createNode },
  store,
  cache,
  createNodeId,
}) => {
  // Hook contents
}
So please try that first. If that doesn't help, it would be incredibly helpful if you could create a minimal reproduction: a simplified example that makes the issue clear and obvious and shows how we can begin to debug it.
If you're up for it, we'd very much appreciate if you could provide a minimal reproduction and we'll be able to take another look.
Thanks for using Gatsby! 💜
Also here is a link to a relevant documentation section: https://www.gatsbyjs.org/docs/preprocessing-external-images/#gatsby-node
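As a rough illustration of the async hook suggested above, here is a minimal sketch. It is written to run on its own rather than copied from a working gatsby-node.js: `fakeCreateRemoteFileNode` is a hypothetical stand-in for createRemoteFileNode, the `Array.isArray` guard stands in for the node type check, and in a real project the hook would be assigned to `exports.onCreateNode` and would pass the remaining arguments (createNode, createNodeId, cache, store) through.

```javascript
// Hypothetical stand-in for createRemoteFileNode (gatsby-source-filesystem).
// It simulates a download and resolves to a fake file node.
const fakeCreateRemoteFileNode = ({ url }) =>
  new Promise(resolve =>
    setTimeout(() => resolve({ id: `file-${url}`, url }), 10)
  );

// Async hook: the returned promise settles only after every
// download for this node has finished.
const onCreateNode = async ({ node }) => {
  if (!Array.isArray(node.image_urls)) return;
  const fileNodes = await Promise.all(
    node.image_urls.map(url =>
      fakeCreateRemoteFileNode({ url, parentNodeId: node.id })
    )
  );
  // Promise.all keeps the results in the same order as image_urls.
  node.processed_images = fileNodes;
};
```

Because the hook returns a promise, whatever calls it has something to await, unlike a forEach with an async callback, where the inner promises are dropped.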
@vladar I haven't been able to reproduce this with a smaller repo, or simply fewer images.
Since I found a way to address it in my repo for now, let's close the issue and get back to it if something more concrete comes up.
Thanks!