Gatsby: Feature Request: Skip / Speedup Image Generation Processes

Created on 5 Jun 2020  ·  28 Comments  ·  Source: gatsbyjs/gatsby

Summary

I would like to skip the step of creating image thumbnails and create "fake" thumbnails instead.

Basic example

  • I set an option to skip thumbnail creation / image processing.
  • Gatsby creates "fake" image sets: I can still query all sharp image options from GraphQL, but every query returns the original image unchanged.

  • I don't remember whether an API for different image manipulation providers (image sharp, external services, lambda functions, Gatsby Cloud) was planned, but if so, a fake image service could be set up through it...

Motivation

  • I am running the gatsby develop command on the Gatsby doc site.
  • Sometimes gatsby clean has to be run; the thumbnails are then no longer cached and have to be recreated.
  • Most of the time I do not care about image quality while developing.

Times:

1943.298 s 2678/4226 63% Generating image thumbnails

success run page queries - 1329.928s - 4622/4622 3.48/s
success Generating image thumbnails - 2196.493s - 4226/4226 1.92/s

about 36min :(

feature or enhancement

All 28 comments

This would be so nice. It takes 4-5 minutes to start our dev environment when someone first gets started, when a dependency is updated, or when something gets messed up with the cache and gatsby clean is required.

Removing gatsby-remark-images from a project's list of plugins is an okay workaround; you'll get alt text in place of the images. Obviously not ideal, but I suppose helpful in a pinch.

Is the main issue with gatsby clean and wanting to preserve the image cache? Or just get the site up and running asap by skipping potentially expensive processes like image generation? I thought there was work on that for develop where it only generated on-demand instead of in bulk upfront.

When I was in China, this was an issue: downloading images from a remote resource outside of China was very slow for some reason, and the connection would sometimes drop, corrupting images. I think I used Docker to work around that and just mounted the image cache or something like that, which let me preserve work that otherwise took 10-30 minutes to redo.

a) preserve cache

Is the main issue with gatsby clean and wanting to preserve the image cache?

Preserving the image cache might help, but if something is out of sync it does not help much.

b) skip expensive process

Or just get the site up and running asap by skipping potentially expensive processes like image generation?

It is more like this: when you are working on components etc., there is no need to have optimized images. Maybe at a later step, when you have a QA stage before going to production, the optimized images would be needed.

BTW: I cannot find the issue describing the API change (jobs API?) that allows "outsourcing" the image processing to different services...

This would also be super useful for CI pipelines, where generating image thumbnails is often wasted work.

One approach might be to create a Gatsby source plugin that returns fake image data when a flag is set.

I suggest an API for jobs/image processing, because it has more benefits and is not limited to a few plugins.

It could then be used for expensive processes:

  • distributed computing for parallel tasks in the cloud / an internal server farm
  • fake/mock services

Services could then be swapped via settings, with no need to touch code.

This is a major problem for one project I'm working on. Every time I add or update an unrelated NPM package, Gatsby blows away the image cache and I can't work for 2 hours (yes, literally, and on an 8 core i9 CPU) while it regenerates 80k image thumbnails. It's brutal. Our Gatsby Cloud builds can do it in about 45 minutes by comparison.

For this use case, I can think of two approaches to mitigate this:

  1. Make the cache more resilient/reusable. Can we narrow down what changes necessitate blowing away the image cache?

  2. In 'develop' mode, could these be generated lazily at the point of need/access? Generating 80k images upfront that I almost certainly don't even need makes working with Gatsby locally incredibly painful.
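The second approach (generate at the point of need) can be sketched in a few lines. This is only an illustration of the idea, not how Gatsby implements or would implement it: `createLazyThumbnailer` and `processImage` are hypothetical names, and `processImage` stands in for whatever does the real resizing.

```javascript
// Hypothetical sketch: run the expensive resize only the first time a
// given (image, width) pair is requested, then reuse the result.
// `processImage` is a stand-in for the real resizing step, not a Gatsby API.
function createLazyThumbnailer(processImage) {
  const cache = new Map(); // key -> Promise resolving to the generated file

  return function getThumbnail(src, width) {
    const key = `${src}@${width}`;
    if (!cache.has(key)) {
      // Store the promise itself so concurrent requests share one job.
      cache.set(key, Promise.resolve(processImage(src, width)));
    }
    return cache.get(key);
  };
}
```

With this shape, 80k images cost nothing at startup; only the thumbnails a page actually asks for are ever generated.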

There was work about a year or so ago on having gatsby develop skip full image generation and only do processing on demand based on content you loaded in the browser. No idea what happened with that though.

Images are cached based on the parameters used to process them, afaik, so if those haven't changed, retaining the cached images would probably be fine.
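To make the "cached based on parameters" idea concrete, here is a rough sketch of a cache key that depends only on the image bytes and the processing arguments. It is an assumption for illustration, not Gatsby's actual cache key: `thumbnailCacheKey` is a made-up helper.

```javascript
const crypto = require("crypto");

// Hypothetical cache key: depends only on the image bytes and the
// processing arguments, not on package.json, yarn.lock, or gatsby-config.js.
function thumbnailCacheKey(imageBuffer, args) {
  // Sort argument names so { width, quality } and { quality, width } match.
  const sortedArgs = Object.keys(args)
    .sort()
    .map((name) => `${name}=${JSON.stringify(args[name])}`)
    .join("&");

  return crypto
    .createHash("sha256")
    .update(imageBuffer)
    .update("\n")
    .update(sortedArgs)
    .digest("hex");
}
```

A thumbnail stored under such a key would stay valid across dependency updates, because nothing in the key changes unless the image or its processing options change.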

One other solution would be to have a separate process/service handle the image processing that gatsby communicates to, then it can just provide the output that gatsby needs and copy the images over without having to pointlessly reprocess.

Especially sounds worthwhile to separate for a project with that amount of images and processing time.

gatsby-parallel-runner is available for setting up to offload image processing btw. Netlify has a blogpost about it, which pairs it with Google Cloud Platform. Gatsby Cloud leverages it as well afaik.

Speeding up image processing isn't the same as skipping image processing. The former is likely still going to cost you money for CPU cycles etc.

One thing I've found that works pretty well is to move images to an S3 bucket, using @robinmetral/gatsby-source-s3 to source images via GraphQL at build time, and then use an env flag in gatsby-config.js to choose between an empty S3 bucket or the one containing all the images. That allows you to skip image processing entirely, or reduce it to a smaller subset, etc.

There was work about a year or so ago on having gatsby develop skip full image generation and only do processing on demand based on content you loaded in the browser. No idea what happened with that though.

Here are questions about lazyImageGeneration:

  • #10964 refactor(gatsby-plugin-sharp): split single file into more maintainable chunks

gatsby-parallel-runner is available for setting up to offload image processing btw. Netlify has a blogpost about it, which pairs it with Google Cloud Platform. Gatsby Cloud leverages it as well afaik.

Also, this blog post is linked in the Netlify post: https://dev.to/biilmann/open-source-parallel-processing-for-gatsby-270d

Yes, that is what I meant by an image API:

  • #20835 feat(gatsby): enable external jobs with ipc

Maybe it is possible to have a gatsby-parallel-fake-runner which catches the image processing jobs and returns unchanged images, with sourcemaps or something like what image sharp returns, all pointing to the same image. Because the images are not processed, it should be fast.
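As a rough picture of what such a fake runner might do per job: copy the original file to every requested output path instead of resizing it. The job shape (`inputPath`, `outputs`) and the name `runFakeImageJob` are invented for this sketch; this is not the Gatsby jobs API or the gatsby-parallel-runner protocol.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical "fake" image job: instead of resizing, copy the original
// to every requested output path. The job shape ({ inputPath, outputs })
// is invented for this sketch and is not the Gatsby jobs API.
function runFakeImageJob(job) {
  return job.outputs.map((outputPath) => {
    fs.mkdirSync(path.dirname(outputPath), { recursive: true });
    fs.copyFileSync(job.inputPath, outputPath);
    return { outputPath, processed: false };
  });
}
```

Every "thumbnail" is then a byte-for-byte copy of the source, so the cost per image is a file copy rather than a decode, resize, and encode.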

This would be great. When working with an image-heavy site, whenever the cache gets cleared it means 30 minutes or more of hanging out and waiting for it to complete.

A fake runner would be great as an option here. I actually was trying to get the parallel runner to work for develop; even if it means a couple of cents in cloud costs to have Google functions make the images, that would be preferable to waiting most of the time.

This would be very, very helpful. When working on a large image-heavy site, most of the time I am working on stuff not related to images. But because most of the build time is spent generating images, the whole process is slowed down and it becomes harder and harder to work and test things in short cycles.

Being able to set a flag for gatsby to skip all image processing and just return linked images as is would be epic.

Looks like I created a duplicate here - https://github.com/gatsbyjs/gatsby/issues/25827

There's a bunch of options I considered for a project I work on.

I think there are two ideas that are worth pointing out:

1) Add a plugin option to gatsby-plugin-sharp to return the original image src without doing the time-consuming resizing when in development. This way my earlier mentioned GraphQL would still work, but every value would be the same unresized image src. It's a sort of bypass without breaking the application.

It could look something like this:

plugins: [
  {
    resolve: `gatsby-plugin-sharp`,
    options: {
      skipProcessing: process.env.NODE_ENV !== 'production'
    },
  },
]

2) As suggested by @polarathene: have a separate cache for images, so it doesn't get flushed with every update in package.json/yarn.lock and/or gatsby-config
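For idea 1, the passthrough could be pictured like this. It is a hedged sketch only: `fakeFluid` is a hypothetical helper, `skipProcessing` is the proposed option rather than an existing one, and the input shape (`publicURL`, `width`, `height`) is assumed. The returned field names mirror the usual fluid fragment (src, srcSet, sizes, aspectRatio).

```javascript
// Hypothetical passthrough: answers a `fluid` query with the original
// file only, so existing queries keep working without any resizing.
// Not part of gatsby-plugin-sharp; the input shape is assumed.
function fakeFluid(original, { maxWidth = 800 } = {}) {
  return {
    src: original.publicURL,
    // A one-entry srcSet that points at the unresized original.
    srcSet: `${original.publicURL} ${original.width}w`,
    sizes: `(max-width: ${maxWidth}px) 100vw, ${maxWidth}px`,
    aspectRatio: original.width / original.height,
  };
}
```

Components that consume a fluid object would still render, just with the full-size image in every slot.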

@josephmarkus good ideas

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

Generating image thumbnails is still the biggest pain for sites with a lot of images. 30 minute builds are becoming the new "my code is compiling" excuse for us developers to go get coffee. It defeats the purpose of the rapid iteration build previews that jam stack sites are known for.

Please make this happen. It is really painful to wait 20-30 min for image thumbnails to generate. In my case the number of generated images is bloated because of the fluid images. I resolved this by passing a single breakpoint to my GraphQL query where bigger images were not required. Maybe I should use a fixed image here:

childImageSharp {
    fluid(maxWidth: 800, quality: 65, srcSetBreakpoints: [ 800 ]) {
...

Also as I am using gatsby-transformer-remark I reduced the breakpoints there as well, from the default srcSetBreakpoints: [ 200, 340, 520, 890 ] to only 3 breakpoints:

        {
            resolve: `gatsby-transformer-remark`,
            options: {
                plugins: [
                    {
                        resolve: `gatsby-remark-images`,
                        options: {
                            quality: 60,
                            linkImagesToOriginal: false,
                            srcSetBreakpoints: [ 340, 520, 890 ]
                        },
                    },
                ],
            },
        },

This reduced the number of images being generated from 2400 to 1400, which is better.

I am wondering if I can ditch the gatsby-plugin-sharp plugin and use Cloudflare for the image optimization and resizing as the thumbnail generation is really annoying:
https://blog.cloudflare.com/announcing-cloudflare-image-resizing-simplifying-optimal-image-delivery/

Would give anything to have relative images in markdown without all the cruft that comes along with gatsby-remark-images.

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 20 days since the last update here.
If we missed this issue or if you want to keep it open, please reply here.
As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

not stale. please make this happen 😕

Please keep this issue open. I would really like to be able to skip the image processing step when developing.

I've been using a clumsy workaround by disabling gatsby-remark-images with a dev:fast script. This speeds things up a ton. It can come in handy if your work has nothing to do with images/page layout.

It reminds me what developing Gatsby sites used to feel like haha.

@muescha Skipping unnecessary work is a great idea!

However, it would be cool if the thumbnail data could still be visible on the page, e.g. generate the thumbnail once per image content and retain this cache (similar to Conditional Page Builds, the GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES flag).

Would you be open to changing the request to make builds for images incremental? So if there is a change to an image, it will be built once, but all historical image builds will be cached?

I think this is actually a problem with Gatsby's approach to image processing (e.g. the Gatsby mantra of "just reprocess, all the time" and "parallelize the work if it's a lot of work"; see the thread of the tweet below).

In my opinion, a single image should never be rebuilt, so long as the content never changes.

https://twitter.com/karlhorky/status/1238137746646093825

We're currently working on this. I'm closing this one in favor of this discussion:
https://github.com/gatsbyjs/gatsby/discussions/7348
