gatsby-source-plugin, what are some limitations? How much data is too much?

Created on 4 Mar 2019 · 4 Comments · Source: gatsbyjs/gatsby

When creating a gatsby-source-plugin, what are some limitations?
What if I need to query a REST API and load 10k items? Does Gatsby keep all of them in memory to be able to expose them to the GraphQL API? Or is there some kind of "caching/pagination" done?

Can we have issues if we load too much data? How much is too much? Is this depending on the RAM/CPU of the computer running Gatsby?

question or discussion

All 4 comments

@danechitoaie great question(s)!

Let me answer each of these, specifically:

What if I need to query a REST API and load 10k items

This should be doable. It depends on how you're structuring the plugin and whether you can optimize with some tricks, caching, etc., but 10K nodes is well within reach.

Does Gatsby keep all of them in memory to be able to expose them to graph API

It does keep the nodes in a Redux state, which is stored in memory throughout the build process.

There are still some areas where Gatsby can be improved re: scaling, and loading fewer things into memory. We cache nodes and other data in a Redux store, which does live in memory for the build process. A bottleneck some have hit is that this Redux store is serialized (in memory) to and from JSON, and this has led to out-of-memory errors in the past.

Or is there some kind of "caching/pagination" done

We definitely cache these, yeah. You wouldn't request all 10,000 nodes' data _every_ run; rather, just on the first run and when a node becomes dirty. You'll want to implement caching yourself as part of this plugin so that on subsequent runs you only need to grab the minimal amount of data that's actually changed.

Can we have issues if we load too much data? How much is too much? Is this depending on the RAM/CPU of the computer running Gatsby?

Yes, to each of these, actually. There are some bottlenecks that we're working to resolve, but a typical benchmark is that a Gatsby app can struggle to complete a build around 50K or so pages. Nodes should be a little bit more resilient re: caching and the build process, but there can be some bottlenecks in the stringification and parsing, as I mentioned earlier.

It's hard to say exactly how much is too much, because it can vary with how much data is created and specific use cases re: plugin development.

I _hope_ this was helpful, and I'm going to close this out as answered--please feel free to reply back if any of us can help further and we'll continue the discussion and/or re-open.

Thanks for using Gatsby 💜

@DSchau Any idea if any kind of improvements are planned in this regard? I'm thinking of trying to build an eCommerce "starter pack/kit/reference application", and I've seen cases where there were around 80k - 100k products, so from the start I'm looking at needing to generate at least this number of product pages, and on top of this there would be categories, maybe content pages from some blog, etc.

Any idea if Gatsby could be improved at some point to generate pages in a "stream mode" (not sure how else to put this)? Basically, something that would allow you not to load ALL data into memory for the build, but process it in chunks somehow.

Of course, yeah. We want to make Gatsby just as feasible a choice for a site with 10 nodes as a site with 100,000 nodes.

Working on it!

That being said--I'd still recommend building the plugin! It feels like we're prematurely making a decision based on possible scaling issues--until they're real, there's a vast amount of value in people using the plugin before anyone hits an arbitrary scaling limit.

Hopefully that makes sense--we'd love to see the plugin, and we'll continue working on making Gatsby even more scalable in the interim!

@DSchau yes, totally understand and agree that a big part of the performance also depends on how the source plugin retrieves the data, caches it, etc.

I'm not familiar with the internals of Gatsby, but I'm thinking maybe at some point it could be implemented in a way that would allow streaming data vs loading it all into memory at once. And on top of this, maybe also partition it somehow: if you have a blog-posts source plugin and a products source plugin, lazy-load what you need when you need it.

You guys are the experts in Gatsby, I'm just trying to provide some ideas, which I'm sure you guys already thought of.
