Hello,
Sorry if this question was already answered, but I wasn't able to find any relevant info.
Let's say I have an e-shop with products. I'd like to have all pages with products properly indexed by search engines.
All product data will be taken from a database (or GraphQL/REST API).
I want to trigger gatsby build when the product-related data changes. I guess this would be a perfect fit for a Lambda function.
The build pipeline would look like this:
product data change -> gatsby build inside lambda function -> put the output on static hosting (S3 in case of AWS) -> (optionally) invalidate CDN cache
Example gist I've found that does this on AWS: https://gist.github.com/digitalkaoz/94933c246ba67032a1507083e2605a30
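The pipeline above could be sketched roughly as follows. This is only an illustration of the ordering of the steps, not code from the gist: `rebuildAndDeploy` and its `build`, `upload`, and `invalidate` parameters are hypothetical names, and the real implementations (the CLI call, the S3 upload, the CDN invalidation) are assumed to be supplied by the caller.

```javascript
// Hypothetical orchestration of the pipeline described above.
// Each step is passed in, so the real implementations stay separate:
//   build      -> runs the Gatsby build and resolves to the output directory
//   upload     -> copies that directory to static hosting (e.g. S3)
//   invalidate -> optional, invalidates the CDN cache
async function rebuildAndDeploy({ build, upload, invalidate }) {
  const outputDir = await build()
  await upload(outputDir)
  if (invalidate) {
    await invalidate()
  }
  return { outputDir, invalidated: Boolean(invalidate) }
}
```

A Lambda handler triggered by the product data change would then call `rebuildAndDeploy` with its own step functions.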
So, my questions are:
1.) Does gatsby have any recommendations / documentation / best practices / blog posts about this kind of behavior?
2.) Is gatsby well suited for this kind of job? Or is it better to use server-side rendering (next.js) for such use-cases?
3.) Does Gatsby offer an official programmatic build API, or should I stick with the CLI?
4.) Are there any known limitations?
Thank you!
Edit: I know I can use client-side only routes (SPA) for this, but I want proper SEO.
@mcongy thanks for this question!
There are some complications here, and it appears that you've proposed a particularly novel solution re: the virtual FS. One of the issues with Lambda is that only /tmp is writable, so we _really_ like a solution that patches fs in a clean way.
Would you be able to validate that that approach works in lambda? If you can validate--we'd _really_ love a PR adding some options, e.g. --build-path, --cache-path, etc. and then cleanly implementing using this fs rewrite approach.
Now, specifically, let me answer some of your questions!
Does gatsby have any recommendations / documentation / best practices / blog posts about this kind of behavior?
Not so much! The only (?) wrinkle as far as AWS Lambda is the siloing of write permissions to only the /tmp directory. Otherwise, things should _mostly_ just work ™️ It's possible you may run into some dependency issues, e.g. sharp tends to be a little finicky, so let us know if you do!
Is gatsby well suited for this kind of job? Or is it better to use server-side rendering (next.js) for such use-cases?
It seems tangential to compare Next.js to Gatsby for this! Next.js is for server rendered (exclusively) applications, whereas AWS Lambda is being used here as a programmatic build tool.
Does Gatsby offer an official programmatic build API, or should I stick with CLI API?
I'd recommend using the CLI if you can. If we ever (for some reason) change the directory structure, you'd be isolated from that change. In other words, something like:
const execa = require('execa')
// note: may have to use ./node_modules/.bin/gatsby
execa('gatsby', ['build'])
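If you'd rather not add execa as a dependency, the same idea can be sketched with Node's built-in child_process module. This is a minimal sketch, not an official Gatsby API: `buildSite` and its `run` parameter are hypothetical names, and it assumes Gatsby is installed in the project's node_modules and that `projectDir` is writable (on Lambda, somewhere under /tmp).

```javascript
const { spawnSync } = require('child_process')
const path = require('path')

// Hypothetical helper: runs `gatsby build` through the CLI inside projectDir.
// `run` defaults to child_process.spawnSync and can be swapped out in tests.
function buildSite(projectDir, run = spawnSync) {
  // Use the locally installed binary, as mentioned in the note above.
  const gatsbyBin = path.join(projectDir, 'node_modules', '.bin', 'gatsby')
  const result = run(gatsbyBin, ['build'], { cwd: projectDir, stdio: 'inherit' })

  // spawnSync reports spawn failures via `error` and the exit code via `status`.
  if (result.error) throw result.error
  if (result.status !== 0) {
    throw new Error(`gatsby build exited with code ${result.status}`)
  }

  // Gatsby writes the built site to the `public` directory by default.
  return path.join(projectDir, 'public')
}
```

Calling `buildSite('/tmp/my-site')` would block until the build finishes and return the path of the generated output, or throw if the build fails.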
Are there any known limitations?
Seems like you've mostly resolved the _known_ limitations with the fs rewrite. If you run into anything else, we'd happily take a look.
Also - this seems fairly novel and interesting, whatever happens here--we'd love for you to circle back and let us know.
Going to close as answered! Thanks for the interesting question, and please stay in touch! 💜
@DSchau
Is gatsby well suited for this kind of job? Or is it better to use server-side rendering (next.js) for such use-cases?
It seems tangential to compare Next.js to Gatsby for this! Next.js is for server rendered (exclusively) applications, whereas AWS Lambda is being used here as a programmatic build tool.
What I was asking is whether Gatsby is suitable for use cases where dynamic, mutable content (e.g. products) needs to be indexed for SEO but has to stay up to date when data in the database changes. Server-side/isomorphic rendering is suitable for this job, but I don't like the added complexity of managing the server infrastructure. (That's why I compared Gatsby to Next.js.)
You can't achieve this with an SPA (you sort of can, but there are limitations for SEO), and the only way to achieve this using Gatsby (that I can think of) is to use the approach I was discussing above.
I'll try to play around with this approach. I'm not the author of the gist I posted above, but I will try something similar and post what I come up with.
What I was asking is whether Gatsby is suitable for use cases where dynamic, mutable content (e.g. products) needs to be indexed for SEO but has to stay up to date when data in the database changes. Server-side/isomorphic rendering is suitable for this job, but I don't like the added complexity of managing the server infrastructure. (That's why I compared Gatsby to Next.js.)
Ah - sorry!
Then yes - the approach you described is a perfectly reasonable way to maximize SEO while still keeping product inventory (or something similarly dynamic) up to date. We've oftentimes seen webhooks used for this (e.g. Shopify fires a webhook to Netlify, which triggers a rebuild), but using Lambda as the trigger to deploy to S3 is a perfectly reasonable approach!
Another approach would be to make _some_ pages dynamic: when a product doesn't exist locally (e.g. in the schema), make an API request for it at runtime. Then you could use a cron-like job to build out the content on some interval (e.g. every day at 2 AM).
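As a rough sketch of that fallback idea (not a Gatsby API): `resolveProduct`, `builtProducts`, and `fetchProduct` are hypothetical names. `builtProducts` is assumed to be a Map of the products that were known at build time, and `fetchProduct` is whatever function calls your own API.

```javascript
// Hypothetical helper for the "some pages are dynamic" approach.
// builtProducts: Map of slug -> product data that existed at build time
// fetchProduct:  async function (slug) that asks your API for a product,
//                resolving to null/undefined when it does not exist
async function resolveProduct(slug, builtProducts, fetchProduct) {
  // Products known at build time are served from the static data.
  if (builtProducts.has(slug)) {
    return { product: builtProducts.get(slug), source: 'build' }
  }

  // Anything newer is requested at runtime until the next scheduled build.
  const product = await fetchProduct(slug)
  if (product) {
    return { product, source: 'api' }
  }
  return { product: null, source: 'missing' }
}
```

Pages resolved with `source: 'api'` would only exist client-side until the next scheduled build picks them up, so they would not get the same SEO treatment in the meantime.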
@mcongy
Hey, does your approach work? I have the same needs and am imagining a similar approach. It would be great if you could share some experience about it. Thanks!
Interested in feedback also! thanks @mcongy
I'm interested too.
After this issue is fixed, it will be easier: https://github.com/gatsbyjs/gatsby/issues/1878
AWS Lambda now allows mounting an EFS Drive 🎉
https://aws.amazon.com/de/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/
Now it should be way easier to achieve this, without having to worry about running out of memory.
I have been working on a project that uses gatsby build inside a Lambda. Everything works great locally. The issue that I am running into is the size of node_modules. Out of the box, Gatsby is almost 500 MB. Lambda needs to be < 250 MB.
How are people working around this? I am about to try using pnpm to see if that helps, but it seems like wishful thinking to get it down by more than 50%.
I can't comment on specific AWS environment issues until I see how it builds, but I can't get serverless to deploy because the Gatsby app is so large.
Any suggestions?
(Edit - I should mention I am not using Netlify on this project, just AWS Lambda, S3, and CloudFront.)
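For anyone hitting the same limit, a rough way to check how far a directory is from it before deploying is sketched below. It uses only Node's built-in fs and path modules. `dirSizeBytes` and `fitsInLimit` are hypothetical helper names, and the 250 MB default is simply the figure mentioned above.

```javascript
const fs = require('fs')
const path = require('path')

// Recursively sums the size of all regular files under `dir`.
// Symlinks are skipped rather than followed, so linked packages are not
// counted twice (and the walk cannot loop).
function dirSizeBytes(dir) {
  let total = 0
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      total += dirSizeBytes(fullPath)
    } else if (entry.isFile()) {
      total += fs.statSync(fullPath).size
    }
  }
  return total
}

// 250 MB, the limit mentioned above.
const LIMIT_BYTES = 250 * 1024 * 1024

function fitsInLimit(dir, limitBytes = LIMIT_BYTES) {
  return dirSizeBytes(dir) <= limitBytes
}
```

Running `dirSizeBytes` on individual folders inside node_modules can also show which dependencies take up the most space.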
As @scriptify mentions, you should technically be able to install Gatsby in EFS, and use that package within your Lambda. We're looking into doing something similar and this seems feasible.
https://aws.amazon.com/de/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/


I've just put together an overview of how to go about doing this:
https://www.jameshill.dev/articles/running-gatsby-within-aws-lambda