UPDATE: See summary of options in https://github.com/iterative/dvc.org/issues/992#issuecomment-587552490
This is a ticket to discuss and compare possible solutions, based on the criteria listed below.
We plan to convert dvc.org to Gatsby and merge it with the blog. Right now, dvc.org is hosted on Heroku and the blog on Netlify. We need to choose a single hosting setup that meets the needs of both.
What we want:
API endpoints
dvc.org and the blog both use custom API endpoints to fetch and transform data from GitHub and Discourse. We also need to cache the results of these requests, because they are slow and the data shouldn't be refreshed more often than once every 15 minutes.
Right now, this is implemented as a Node.js server on Heroku with an in-memory cache, but we won't necessarily have a server after we migrate to Gatsby. Our current implementation also has a problem: the first user who opens a page before the cache is populated can see a significant load time (~10s). Ideally, we should update the cache ourselves with something like cron and always send cached results to all of our users.
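A minimal sketch of the "update it ourselves with something like cron" idea, assuming a long-running Node process (an in-memory value like this would not survive between serverless invocations). `createWarmCache` and the 15-minute default are illustrative names and values, not existing code:

```javascript
// Sketch only: a warm, timer-refreshed cache for slow upstream data
// (e.g. GitHub/Discourse stats). Users always get the in-memory copy;
// only the timer talks to the upstream API.
function createWarmCache(fetchData, intervalMs = 15 * 60 * 1000) {
  let value; // last successful result (undefined until the first refresh)
  let updatedAt = 0; // Date.now() of the last successful refresh

  async function refresh() {
    try {
      value = await fetchData();
      updatedAt = Date.now();
    } catch (err) {
      // Keep serving the previous value if the upstream call fails.
      console.error('cache refresh failed:', err.message);
    }
    return value;
  }

  // Cron-like timer; unref() so the timer alone doesn't keep Node running.
  const timer = setInterval(refresh, intervalMs);
  timer.unref();

  return {
    refresh, // call once at startup so the first user never waits
    get: () => ({ value, updatedAt }), // never blocks on the upstream
    stop: () => clearInterval(timer),
  };
}
```

The API handler would just return `get().value`; the slow GitHub/Discourse call only ever happens on the timer (plus one `refresh()` at boot).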
To solve this, we can use Netlify functions, Cloudflare workers, or something else.
Redirects
We have a large list of redirects that we need to support: https://github.com/iterative/dvc.org/blob/master/redirects-list.json
The new hosting should support them.
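Whatever hosting we pick, the matching logic itself is small. A sketch of a host-aware, regex-based matcher; the rule shape (`host`, `from`, `to`, `code`) and the example rules are hypothetical and are NOT the actual schema or contents of redirects-list.json:

```javascript
// Sketch only. Hypothetical rule shape: { host?, from: regex source, to: replacement, code? }
function compileRules(rules) {
  return rules.map((rule) => ({
    host: rule.host, // optional: only match requests for this hostname
    regex: new RegExp(rule.from),
    to: rule.to, // may use $1, $2, ... captured by the regex
    code: rule.code || 301,
  }));
}

// Returns { location, code } for the first matching rule, or null.
function findRedirect(compiled, host, pathname) {
  for (const rule of compiled) {
    if (rule.host && rule.host !== host) continue;
    if (rule.regex.test(pathname)) {
      return { location: pathname.replace(rule.regex, rule.to), code: rule.code };
    }
  }
  return null;
}

// Hypothetical example rules, for illustration only.
const redirects = compileRules([
  { from: '^/old/(.*)$', to: '/new/$1' },
  { host: 'man.example.org', from: '^/(.*)$', to: 'https://example.org/man/$1', code: 302 },
]);
```

Keeping this as a pure function means the redirect tests can stay the same regardless of whether the rules end up evaluated by our own server or translated into a host's config format.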
Build time
Our current build time for the blog on Netlify is long (7 minutes now) and will only get longer after we merge it with the main site. We can speed it up by preserving the .cache directory and the yarn module cache between builds. The new hosting/build option should allow us to preserve them.
Demo stands
Right now, both Netlify and Heroku let us automatically create preview stands from GitHub PRs. We want to keep this functionality.
--
Our new hosting solution may not be a single server but a combination of a few different services, e.g. CircleCI + Cloudflare Workers + Netlify/Now.sh, but together they should be able to do all of the things listed above.
@fabiosantoscode please, take a look. Let's discuss it here.
I agree with you that the holy grail is to have our API be just a bunch of static data updated by a cron. It's similar to the architecture I left behind at a paper, where events would wake up a Lambda function that rendered a React page and then pushed the result to S3. This was called the reactive manifesto back in the day.
However, in our situation, where DX is key, I don't really want to write and maintain a bunch of scripts to support per-PR environments that work differently from production. Ideally we don't have any code handling this. The reason is that both websites are so static that it's not really worth having even a load balancer, and a single server with a memcached instance could certainly cut the mustard when hidden behind Cloudflare or another caching CDN.
I talked with @shcheklein about this a lot yesterday, and one thing we came back to several times was running workers at the edge. However, I've given it more thought, and I don't think we need this capability if our API is made up of mostly static things. Instead, responses can just slowly spread across the CDN and be served from the edge. It's OK even if it takes a few minutes to update. If it's not OK, we can invalidate the cache.
Heroku is pretty flexible. It doesn't have functions at the edge (that I know of), but I don't think we need them. It also has a nice DX.
My proposal:
I think we can use just Heroku, with a CDN in front of it in production; I don't think it matters which one (Cloudflare or CloudFront).
We stay away from edge functions or pushing to the CDN and, for simplicity, let the CDN fetch from our poor server. If we find that this is not enough, we can start pushing to the edge. Cloudflare allows this with Workers + KV, and CloudFront allows it by using an S3 bucket as the origin and pushing to it.
We will need an in-memory cache store to preserve both the content and the ETags we get from the GitHub API. We store the ETags because if we make a request to GitHub with an If-None-Match: {etag} header and the content hasn't changed, GitHub answers 304 Not Modified and the request doesn't use up our rate limit.
There's no reason we can't serve our Gatsby files from an Express server. ZEIT's serve-handler can handle that, and it removes trailing slashes to boot.
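A sketch of that ETag round-trip, assuming a fetch-compatible function (global in Node 18+, injectable here for testing). `createEtagClient` is a hypothetical name; real GitHub requests also need a User-Agent header and an auth token, which are left out:

```javascript
// Sketch only: ETag-aware GET for the GitHub API with an in-memory store.
function createEtagClient(fetchImpl = globalThis.fetch) {
  const store = new Map(); // url -> { etag, body }

  return async function getJson(url) {
    const cached = store.get(url);
    const headers = {};
    if (cached && cached.etag) headers['If-None-Match'] = cached.etag;

    const res = await fetchImpl(url, { headers });
    if (res.status === 304 && cached) {
      return cached.body; // Not Modified: reuse the stored body
    }
    if (!res.ok) throw new Error(`GET ${url} failed with ${res.status}`);

    const body = await res.json();
    store.set(url, { etag: res.headers.get('etag'), body });
    return body;
  };
}
```

Note the 304 check comes before `res.ok`, since fetch reports `ok === false` for a 304.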
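const serve = require('serve-handler');
const serveOptions = { public: 'public' }; // serve Gatsby's build output
module.exports = (req, res) => {
  // do redirects if needed and return early
  return serve(req, res, serveOptions);
};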
This way, we can have a local development server (yarn dev runs nodemon server.js) and a production server (yarn start runs node server.js) that are the same server. We can also serve API requests from this server. PR environments will simply run this server without a CDN, and production will have a CDN in front of it.
We will still have the flexibility of setting nice cache headers, which are read by the browser as well as by Cloudflare or any run-of-the-mill cache server or CDN. For example, public/static is filled with content-addressable files. We can set Expires headers far into the future (or use Cache-Control: immutable) because they will never change. The same goes for static JS/CSS files, since webpack puts an MD4 hash into their filenames.
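A sketch of picking Cache-Control per path. It assumes that everything under /static/ is content-addressed (true for Gatsby's output, not for the Markdown currently in public/static/docs) and that hashed bundles look like `app-3f2a9c1d4e5f.js`; the regex is a hypothetical heuristic. The two header values follow the pattern Gatsby's caching docs suggest, as far as I know:

```javascript
// Sketch only: choose a Cache-Control value from the request path.
// Assumption: hashed bundles have 8+ hex chars before .js/.css.
const HASHED_ASSET = /[.-][0-9a-f]{8,}\.(js|css)$/;

function cacheControlFor(pathname) {
  if (pathname.startsWith('/static/') || HASHED_ASSET.test(pathname)) {
    // Content-addressed: the bytes behind this URL never change (1 year).
    return 'public, max-age=31536000, immutable';
  }
  // HTML and other unhashed files: always revalidate.
  return 'public, max-age=0, must-revalidate';
}
```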
I guess this is not in-scope, but just in case, for dynamic page content like comments, we have a few options:
@fabiosantoscode thanks, good summary and good points.
What about the CD part if we stick with Heroku? We need a fast way to build Gatsby and deliver it there.
What do you think about deploying to both - Heroku (APIs + cache) and Netlify (static stuff)?
Just a few comments from me for now:
On redirects, Netlify seems to have this built in 🙂 https://docs.netlify.com/routing/redirects/
this way, we can have a local development server (yarn dev runs nodemon server.js) and a production server
I think this is pretty important for the docs process (for local sanity checks). Demo stands also help, but it's faster, easier, and cheaper to run the site locally.
public/static is filled with content-addressable files. We can set expires headers far into the future (or use cache-control: immutable) because they will never change
MD files in public/static/docs/ are the ones that change the most. Very often. (But we don't serve them directly to users atm, the web app "proxies" them via react-markdown rendering).
edge-side includes, which are instructions to copy the string from another URL (or cloudflare KV) into the place they are found
I think ESI is meant more for personalization, e.g. if we had user logins. If that's likely on the future roadmap, then I think it's a good option to incorporate now.
In general I just have the impression we may be trying to use too many technologies? Could it all be plain Gatsby (whatever that means) or all microservices or all serverless functions + cdn?
@shcheklein
what about the CD part if we stick to Heroku - we need a fast way to build and deliver Gatsby to it.
What do you think about deploying to both - Heroku (APIs + cache) and Netlify (static stuff)?
I say we take advantage of their built-in offering. They also have a cache folder where we can cache the gatsby folder and the public folder, such that things like generating thumbnails is done incrementally. Remember that there will always be a build step. Locally (on an intel i7) a build on a hot cache takes 19 seconds.
The real bottleneck will be saving and restoring cache, so the more images we have, the slower the build will be. We can try to tackle this in the future when it becomes a problem. It will be complicated, yet unavoidable.
Deploying to both Netlify and Heroku would kind of kill the development experience of sending a PR and getting a nice link to a temporary environment: we would have to find a way to connect netlify environment to the heroku API. Additionally, netlify is running a node process anyways :)
@jorgeorpinel
In general I just have the impression we may be trying to use too many technologies? Could it all be plain Gatsby (whatever that means) or all microservices or all serverless functions + cdn?
You make a good point here. I had another look and found this middleware from gatsby. We can use it to embed gatsby's logic into our express server. This is pretty close to plain gatsby. It could get closer if there was a way to use middleware in production like it's possible in dev. They might not allow middleware in production because they mean for gatsby sites to be statically hosted.
I would like to argue though, that having a single node process with express do everything, and put a CDN on top of it for DDOS protection, speed and edge caching, shouldn't be too many technologies. Even if it's under the hood, this is mostly what we have now (gatsby serve is express, and netlify puts a cache in front of it, it's not a static server). We do need to run a gatsby build before we serve, but when the build is finished that's it.
After having written all of this, I think we have a real option to kick the can down the road by continuing to use netlify. Depends on the price of their cdn though.
We would still write an express server to serve gatsby and our APIs, but it wouldn't have any memory cache. It wouldn't be ideal, but I have no reason to believe it wouldn't be webscale.
MD files in public/static/docs/ are the ones that change the most. Very often. (But we don't serve them directly to users atm, the web app "proxies" them via react-markdown rendering).
@jorgeorpinel it's not the case with Gatsby - we serve pre-built static HTMLs that include processed MD in them
On redirects, Netlify seems to have this built in 🙂 https://docs.netlify.com/routing/redirects/
@jorgeorpinel I don't think that is flexible enough. I would prefer to keep the redirects logic that we have - including tests, etc.
We would still write an express server to serve gatsby and our APIs, but it wouldn't have any memory cache.
@fabiosantoscode for some reason I had an idea that it's not possible with Netlify - running your own server with in memory cache that serves APIs externally.
Heroku alone sounds like a good option (plus a caching CDN like Cloudflare), obviously with some CD (if Heroku can do it, fine; if not, maybe Gatsby as a business has something?). And we have all the flexibility we need, up to having databases if needed.
This solution should be very simple to deploy, runs locally, has previews, and edge caching is done by Cloudflare... any real downsides to this? cc @iAdramelk @jorgeorpinel @fabiosantoscode
Sorry for the long answer guys. I think we are overengineering it a little. In the perfect world I prefer not to have our own Express server at all:
public/static folder. We need it now because we include them by filename, but that is not a good way to do it anyway. With Gatsby I plan to switch to including them with a webpack loader for images. This way we can store images next to their corresponding components, get automatic cache busting with unique filenames, and automatically optimize images at build time. The same goes for images linked inside the MD files. This way we can just set an infinite cache time for such images and forget about them.
heroku-buildpack-static allows us to set redirects using a JSON config file in the root directory, without the need for a custom server.
The only problem we have with the static approach is hosting and updating the API functions and caching their results. For example, Netlify lets us solve this as well using Netlify Functions. Here is an example of using Netlify Functions to fetch a remote API. It's not that different from our current API implementation and can be deployed and updated as part of our normal deploy process to Netlify.
I'm not sure going with Netlify is the best option, because I'm not sure we can optimize our build time on Netlify, and I don't know off-hand how to cache the results of such serverless functions between calls, so I'd like to check other options too, like Heroku, now.sh, etc.
But ideally what I would like to have in the result:
public folder without the need to manually set up the server.
For local development we can either mock these functions or use the already deployed ones. I doubt that we will be updating them that often. We can even place them in another repository and deploy them separately.
@iAdramelk
Static hosting that just exposes our public folder without the need to manually set up the server.
What do you mean by "manually"? What benefits do you see in not putting Express or something else in front of Gatsby?
We don't need to care about correct headers for the files in the public/static folder.
most likely Cloudflare/Netlify do proper headers already?
Redirects that are set up as a json files in the root without the need for a server.
I doubt the Netlify redirect's config is flexible enough to handle what we need. Probably, Heroku's is the same, but I haven't checked (but if we go all static, Heroku does not make much sense anyway)
API as a bunch of serverless functions written in the node.js and deployed as a part of our standard deploy process.
I like it, but it feels like it might complicate the workflow, deployment, and local experience... would love to try it before we jump into this.
For the local development we can either mock these functions or use already deployed ones. I doubt that we will be updating them this often. We can even place them in the other repository and deploy them separately.
Sounds like a complicated setup to me. Would love to see something like yarn develop locally that can handle everything.
I think we are overengineering it a little
Agreed, that was my point too 🙂 In general I also lean toward as static as possible, and built-in redirects so we don't need our custom module for that. Built-in redirects probably have much better load capacity, for example. P.S. from what I read, Netlify's _redirects is flexible enough, Ivan.
files in the public/static... With gatsby I plan to switch to including them with webpack loader for images
What about Markdown files? I'm still confused about this part, but probably when the Gatsby migration is ready and I get to see it, it'll be clearer, so no need to answer this. Let's just keep in mind that we change MD files very often.
The only problem that we have with static approach is hosting and updating API functions and caching their results
- API as a bunch of serverless functions written in the node.js and deployed as a part of our standard deploy process.
We could just have the API as a separate node app. I checked pages/api/** and it seems totally stand-alone anyway. (This way also in the future it's possible to pass the API through an authentication/ rate limiting gateway e.g. KongHQ if ever needed.)
The serverless approach also works, but maybe it's easier to maintain the API as a regular app, with the same deploy process and less system complexity? Agree with Ivan here.
p.s. this issue is kind of long, would be great to summarize options. I'd do it but I'm not sure I understand every comment completely.
@shcheklein
What do you mean by manually, what are the benefits you see in not using an express or something else before Gatsby?
My main concern is local development. Putting a server in front of the static folder in production is not a problem at all. But if we use Express locally, we will need to run it alongside the Gatsby dev server on a separate port and proxy calls from one port to the other. There is also the problem that the port is hard-coded in the resulting HTML, so we would need to somehow update the ports in the code that the Gatsby server generates while the Gatsby server still runs on the original port. I didn't research this topic in depth, and it is possible that there is an existing plugin for this or that it is easy to configure. But if not, we will need to write and maintain a lot of our own code for it, instead of just starting the default Gatsby dev server with the standard command.
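For the port-to-port problem specifically, Gatsby documents a development proxy. A config sketch, assuming Gatsby v2's `proxy` option in gatsby-config.js; port 3000 for the API server is a hypothetical choice. It only affects `gatsby develop`, so production would still need the API on the same origin (or routed by the CDN):

```javascript
// gatsby-config.js (fragment), sketch only.
// During `gatsby develop`, requests to /api/* are forwarded to a separately
// running API server, so the browser only ever talks to Gatsby's dev port
// and no port needs to be hard-coded in the generated pages.
module.exports = {
  proxy: {
    prefix: '/api',
    url: 'http://localhost:3000', // hypothetical local API server
  },
};
```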
most likely Cloudflare/Netlify do proper headers already?
That's my point: we don't need a server for that. We just need to create unique names and static hosting will do the rest. But with our own server on Heroku, we will need to do it ourselves, if I understand correctly.
I doubt the Netlify redirect's config is flexible enough to handle what we need. Probably, heroku's one is the same, but I haven't checked (but if we go all static Heroku does not make much sense anyway)
Do you have examples of the redirects that you think we would not be able to implement? I had a fast look at the docs and I think that everything that we have in the redirects-list.json can be done with both Netlify and heroku-static.
I like, but it feels like it might complicate the workflow, deployment, local experience .. would love to try before we jump into this.
Well, if we update them often, then yes, but I think we'll probably just push them once and then forget about them for a year or so. This way we can just use the global URLs for local development.
Sounds like a complicated setup to me. Would love to see something like yarn develop locally that can handle everything.
It's a little more complicated than I would like, yes. But I think the choice is between this and the problems with a local server described above. Not sure which is better to implement.
@jorgeorpinel
What about markdown files? I'm still confused about this part but probably when the Gatsby migration is ready and I get to see it I'll be clearer, so no need to ask this Q. Let's just keep in mind that we change MD files very often.
It's not a problem. We can automatically optimize them and update their paths with gatsby, we already are doing it in the blog.
We could just have the API as a separate node app. I checked pages/api/** and it seems totally stand-alone anyway. (This way also in the future it's possible to pass the API through an authentication/ rate limiting gateway e.g. KongHQ if ever needed.)
My main concern here is running it alongside with the gatsby dev server (see my answer to Ivan above).
@iAdramelk
I'm not sure that we can optimize our build time on Netlify and I'm not sure from the get go how to cache results of such serverless functions between calls
Looks like we can't optimize the build time. I've given it a try here: https://github.com/iterative/blog/pull/115. We bust through the Netlify cache limits, even without caching image processing (which I think is our biggest bottleneck). As for caching the results of the serverless functions, if we set cache-control and expires headers, the CDN/cache will take care of it, as will the browser.
I'd like to check other options too, like Heroku, now.sh, etc
now.sh is a real contender, I feel. You can use the now dev command for local development, which gives you serverless functions under /api. I think it takes care of integrating the serverless functions with the underlying server (gatsby develop in our case) through some reverse proxy of their own. In production, gatsby can be served as a static directory and nothing else needs changing. It also allows for redirects expressed through JSON.
The serverless functions in now.sh, as expected, are cached on their end if you use the cache-control header (scroll down to "serverless functions").
I think we shouldn't get too hung up about server-side caching in any of these solutions. Basically all of them respect the cache-control header. The header is not only meant for browsers, but for any kind of proxy as well. That's what the private|public segment is for, it's for the proxies to know whether to serve the same thing to other people.
I've looked into heroku, and they limit your build cache to 500mb.
I did a small test with now.sh for the blog (changed 2 lines in the package.json), here it is:
https://blog-fihp2x2rk.now.sh/
Here's a function with a 60 second cache using a cache-control header:
https://blog-fihp2x2rk.now.sh/api/example-function
After the first build, this took in total 3 minutes to deploy, including the build time.
Locally, now dev serves me the blog (gatsby develop) with hot reload, plus my function under /api/example-function. In production, it's a static website with serverless functions, with a CDN tacked onto it. If we do need to run a node process instead of a static website in the future, that's possible too. However, I don't think we will need that since we have serverless functions.
Integration with github is also possible, providing us with per-PR environments.
I've read through your comments @jorgeorpinel and @iAdramelk, and this seems to tick all the boxes for you.
Unless anyone has any issues with this solution, when I run out of things to do I'll be sending a PR.
I was really disappointed to find out that zeit now no longer supports custom servers. They do support adding routes in JSON, which is working fine for /doc/* (including status codes).
@fabiosantoscode it looks like exactly that we need!
@fabiosantoscode a few more questions - how much will it cost us to build with them if we support previews? would love to explore a more conservative option with Heroku as well - in terms of price, build time (if cache is enabled), and local experience (you mentioned some middleware?)
I'm still concerned about these fancy options like ZEIT and Netlify, to be honest. I really don't like their aggressive pricing models, and I don't like waiting minutes to deploy a preview (to some extent that's Gatsby's problem, not the hosting's?).
Bottom line - can we do better?
To be precise - we pay for Heroku up to $50 / month since we do a lot of preview deployments. It's up to 30 hours with pro plan. Will it be enough? Most likely, yes.
The thing I still don't like is waiting minutes to deploy. If blog takes 3 minutes, it'll be > 10 minutes to deploy blog+dvc.org. Is there a way around it?
If blog takes 3 minutes, it'll be > 10 minutes to deploy blog+dvc.org. Is there a way around it?
dvc.org took 30s to deploy for me (15s of which is building JS). Since we will be merging the two sites, they will share a lot of code, a framework, and the webpack cache, so I would be very surprised if building dvc.org added more than 20 seconds to the total build time.
Our major bottleneck is the generated images in the blog. public/static is 409mb large. It takes around 2 minutes to download them from the cache. Doesn't really matter which platform we're on.
If we can store these images on S3 using DVC, and if with DVC we can somehow generate thumbnails only for images which changed, without downloading everything (can it?), we might be able to host them from S3 directly (using the DVC remote cache URLs). This might be accomplished through a source plugin which stores the checksums of the images to see which ones changed, or with dvc run.
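A sketch of the "store the checksums and only process what changed" part. It doesn't use DVC at all (whether DVC can do this is the open question above); it just hashes files with MD5, the same kind of checksum DVC uses, and diffs against a manifest saved by the previous build. `hashTree` and `changedFiles` are hypothetical helpers:

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hash every file under `root`; keys are paths relative to `root`.
function hashTree(root, dir = root, out = {}) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      hashTree(root, full, out);
    } else if (entry.isFile()) {
      const digest = crypto.createHash('md5').update(fs.readFileSync(full)).digest('hex');
      out[path.relative(root, full)] = digest;
    }
  }
  return out;
}

// Files that are new or whose checksum differs from the previous manifest.
function changedFiles(current, previous) {
  return Object.keys(current).filter((file) => previous[file] !== current[file]);
}
```

The manifest itself is tiny (`fs.writeFileSync('manifest.json', JSON.stringify(hashes))`), so it could be cached between builds instead of the 400+ MB of generated images; only `changedFiles(...)` would need new thumbnails.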
If we can do this, then heroku can be very speedy (we just need to cache node_modules, .gatsby and public, which becomes small enough to be cached). However I think heroku is a bit overkill for us, and it doesn't include a CDN to cache things at the edge like netlify and zeit now.
how much will it cost us to build with them if we support previews
I had a look at their pricing page, and overall I think we can go with the $20 plan. It gives us unlimited deploys, and 10 hours of build time every month. This gives us around 120 builds every month (if they took on average 5 minutes). If we go over the 10 build hours, we pay $10 more, instead of being forced into the $200 plan. We're limited to 3 team members, which I suppose is users with admin access, not users deploying. Couldn't find any specifics on this, so I'll go and ask directly.
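The build-budget arithmetic behind "around 120 builds", assuming the 5-minute average stated above:

```math
\frac{10\ \text{h/month} \times 60\ \text{min/h}}{5\ \text{min/build}} = 120\ \text{builds/month}
```

At the ~3 minutes measured in the earlier test deploy, the same 600 minutes would cover about 200 builds.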
@iAdramelk keeping the conversation on this ticket
Where do you store example-function btw? I don't see it in changed files
It's not in this PR, since the function is in the blog. I didn't share it either. Here it is:
// api/example-function.js
export default (req, res) => {
res
.setHeader('cache-control', 'max-age=60, public')
.json({
timeAtWhichThisWasRun: new Date()
.toISOString()
.split('T')[1]
.split('.')[0],
version2: true
})
};
(When we have gatsby, our functions will be in api, not pages/api. zeit now piggy-backs on the existing concept of nextjs functions so they use the same folder)
edit: the version2 property was to tell deploys apart and make sure they were deploying the new function correctly.
@fabiosantoscode cool, thanks!
Would just like to add that we can set a different cache-control for the server and the client by using the s-maxage directive. This gets interpreted by the CDN and stripped away so it doesn't confuse the browser.
Additionally we have access to stale-while-revalidate, where we can respond to the client immediately while we're getting new comments, issue counts etc.
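A sketch of such a header for our API responses. `apiCacheControl` is a hypothetical helper; the 15-minute CDN lifetime comes from the original requirements, while the one-hour stale window is an arbitrary choice. How strictly stale-while-revalidate is honored varies by CDN:

```javascript
// Sketch only: separate lifetimes for browsers and shared caches (CDN).
function apiCacheControl({ browserSeconds = 0, cdnSeconds = 15 * 60, staleSeconds = 60 * 60 } = {}) {
  return [
    'public', // shared caches such as the CDN may store the response
    `max-age=${browserSeconds}`, // browsers revalidate after this many seconds
    `s-maxage=${cdnSeconds}`, // shared caches use this instead of max-age
    `stale-while-revalidate=${staleSeconds}`, // serve stale while refetching
  ].join(', ');
}
```

In a function like api/example-function.js this would replace the literal string, e.g. `res.setHeader('cache-control', apiCacheControl())`.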
@fabiosantoscode We discussed a lot of stuff there, but like @jorgeorpinel correctly pointed it became hard to find information between all comments.
If it's not too hard for you, can you please gather everything we discussed into something like a table, so we can see the options and compare them side by side? I'd do it myself, but right now you have more information than any of us on this topic.
I'd suggest a set of fields like these:
Maybe I missed some other important options. CC @shcheklein
The hosting options we've discussed so far are: Heroku, Heroku with the static buildpack, Netlify, and Now.sh. But we are open to other options and mixed approaches too. For example, to speed up builds we could use something like https://www.gatsbyjs.com/cloud/
dvc.org took 30s to deploy for me (15s of which is building JS).
It's not Gatsby yet. It'll be soon. That's my concern.
Our major bottleneck is the generated images in the blog. public/static is 409mb large. It takes around 2 minutes to download them from the cache. Doesn't really matter which platform we're on.
Are there other ways to deal with images? (besides DVC for now) cc @iAdramelk ? Feels like the big limitation of the whole approach.
caching and CDN
did we try to access from different regions (different countries)? What are actual guarantees with using headers - is there a link we can rely on?
did we try to access from different regions (different countries)? What are actual guarantees with using headers - is there a link we can rely on?
Didn't think of that!
A quick trip around the world with my VPN shows that different regions result in different requests using now.sh. I'll reflect this in my write-up.
I can't be super specific as I haven't created a POC for heroku, but here goes:
| Criterion | Heroku w/ CDN, custom build cache | Now.sh | Netlify |
|---|---|---|---|
| Build time | Around 5 minutes, incl. downloading and uploading the S3 cache | 3-4 min | 7-8 min |
| Can we improve build time? | No | No | Yes (custom build cache) |
| Redirects | Yes (issued by a custom Express server and cached by the CDN as long as we want) | Yes for pathname redirects; needs a dirty hack for cross-domain redirects (error.dvc.org, man.dvc.org, etc.) | Yes, including by subdomain |
| PR environments | Yes (automatic for Heroku branches) | Yes (automatic) | Yes (automatic) |
| API endpoints | Yes | Yes (serverless) | Yes (Express or serverless) |
| Server-side cache | Yes: at the CDN, in-memory, or external memory cache | Yes: at the CDN or external memory cache | No docs that I can find, but I assume yes; external and in-memory cache possible |
| Pre-empt cache | Yes (periodically push "API endpoints" to the CDN) | No | Only if this API call does what I think it does |
| Price with our usage | As said before, ~$50/month plus CDN costs (< $10 for CloudFront + S3, rough calculation) and S3 costs (~$8 for a 1 GB cache plus traffic between the Heroku build and S3) | $40-$60 depending on build count | $45 for the Pro plan incl. 1000 build minutes, plus $7 per extra 500 |
| Local dev | Custom Express server with gatsby-plugin-express middleware at the end of the chain | now dev | Custom Express server with gatsby-plugin-express middleware at the end of the chain |
Regarding what @shcheklein just mentioned about cross-region caching, we have 3 options:
I haven't included fully custom solutions like putting everything on S3 with Lambdas, or using Cloudflare Workers throughout. The main reason is that they don't fulfill the local dev and PR environment requirements without adding our own custom code for CircleCI.
Thanks for the summary Alex and Fabio! Not sure about costs, but from the ability to have custom redirects without custom server I think Netlify is edging out the others.
Local dev/ Heroku
They have heroku local too, https://devcenter.heroku.com/articles/heroku-local
- Can we implement our API endpoints here? Yes/No.
I'm still not sure I agree we want serverless functions vs. separating the web app and the API app, which would simplify the system complexity, decision process, and maintainability. (The question about running them both locally doesn't seem like a huge deal to me: you can use env variables, and/or just run one locally at a time, e.g. run the dev web app locally and let it connect to the prod API.)
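The env-variable approach can be as small as one helper. A sketch, where `GATSBY_API_URL` and the localhost default are hypothetical; Gatsby only exposes GATSBY_-prefixed variables to browser code, and it inlines the literal `process.env.GATSBY_API_URL` expression at build time, which is why it's referenced directly rather than through a passed-in object:

```javascript
// Sketch only: resolve API URLs against a configurable base, so the web app
// can target a locally running API app or the deployed one.
function apiUrl(pathname, base = process.env.GATSBY_API_URL || 'http://localhost:3000') {
  return new URL(pathname, base).toString();
}
```

Running the web app locally with GATSBY_API_URL pointing at the production API would then be the "run one locally at a time" setup described above.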
Also if we're not finding anything perfect and we don't want to compromise on any feature we can go full AWS (having options as simple as ElasticBeanstalk or as complex as CloudFormation). But again, this increases our DevOps/SRE complexity and risks.
@fabiosantoscode
please make a few corrections:
if we can get local Heroku dev to a single command and utilize build cache (and don't hit some limits soon if it's already 400-500Mb) then I would vote for Heroku.
And it feels to me that no matter which option we pick, we should do something about the ~500Mb of images. We have only a dozen blog posts now and it's already a problem.
@iAdramelk any comments regarding this table from your end?
@fabiosantoscode any idea what takes most of the cache (400-500) - images or JS modules?
Heroku local dev - would love to see how would it look like.
It seems it just runs npm start in our case (since there's no Procfile in the repo). (Wraps node-foreman.) BTW it doesn't work locally (same as yarn start): it tries to 301 redirect http->https for localhost...
@jorgeorpinel
By "Heroku local dev - would love to see how would it look like" I mean Gatsby static + Express or something to handle redirects + middleware to make them work together, as @fabiosantoscode mentioned. It's not so much about the Next.js app we have deployed on Heroku right now. There are no problems with running it (including APIs, redirects, etc.).
Heroku local dev - would love to see how would it look like.
Excluding redirects (which can be done just like we do in dvc.org), this is how a Gatsby project using an API with middleware can look:
https://github.com/iterative/blog/pull/122
I was wrong about what gatsby-plugin-express can do. It's only a static server. However, gatsby does provide a way to tack our middleware on top, making the local dev and production story very solid with just an API middleware (we can add a redirect middleware just as easily).
if we can get local Heroku dev to a single command and utilize build cache (and don't hit some limits soon if it's already 400-500Mb) then I would vote for Heroku.
Heroku dev with a single command, see above. Cache limit is going to be hit inevitably.
any idea what takes most of the cache (400-500) - images or JS modules?
In the blog it's mostly modules, but images are a huge chunk. If we add dvc.org to it the modules won't grow by much, I think. And we don't have too many images.
(venv) fabio@fabio-thinkpad ♥ du -sh node_modules/
565M node_modules/
(venv) fabio@fabio-thinkpad ♥ du -sh public/
207M public/
(venv) fabio@fabio-thinkpad ♥ du -sh public/static/
165M public/static/
Largest modules are typescript, babel, core-js and rxjs.
I looked through gatsby-plugin-sharp a lot. I'm really looking for a way for us to store the images elsewhere during the build (like S3), to get around the cache issue while not always regenerating images. During production we could proxy image requests from the app to S3 (or if we have a CDN we can do the proxying from there).
I really think this is the way to go. gatsby-plugin-sharp clearly has a way to avoid re-compressing images from the filesystem if they haven't changed. If the filesystem is just another source, why not S3?
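To make the idea concrete, a sketch of content-addressed caching with a pluggable store: processed images are looked up by a hash of the source bytes plus the processing options, so an unchanged image is never re-compressed, whether the store is the local filesystem or S3. `BlobStore`, `MemoryStore` and `getOrTransform` are hypothetical names; this is not how gatsby-plugin-sharp works internally, and FNV-1a is used only to keep the example dependency-free (a real cache should use something like SHA-256).

```typescript
// Hypothetical storage interface: a local directory or an S3 bucket could implement it.
interface BlobStore {
  get(key: string): Promise<Uint8Array | undefined>
  put(key: string, data: Uint8Array): Promise<void>
}

// In-memory implementation, handy for tests.
class MemoryStore implements BlobStore {
  private blobs = new Map<string, Uint8Array>()
  async get(key: string): Promise<Uint8Array | undefined> {
    return this.blobs.get(key)
  }
  async put(key: string, data: Uint8Array): Promise<void> {
    this.blobs.set(key, data)
  }
}

// 32-bit FNV-1a over the raw bytes; fine for a sketch, too collision-prone for production.
function fnv1a(bytes: Uint8Array): string {
  let hash = 0x811c9dc5
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i]
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash.toString(16)
}

// Returns the processed image, running `transform` only when this exact
// source + options combination has not been processed before.
async function getOrTransform(
  source: Uint8Array,
  options: string, // e.g. "w800-webp"; different options get different cache entries
  store: BlobStore,
  transform: (src: Uint8Array, options: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const key = `${fnv1a(source)}-${options}`
  const cached = await store.get(key)
  if (cached !== undefined) return cached
  const result = await transform(source, options)
  await store.put(key, result)
  return result
}
```

Because the key depends only on content and options, the cache survives renames and fresh checkouts, which is exactly what a wiped build directory loses.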
@shcheklein I've made the edits, except for this one:
Heroku ~$50 only, no CDN or any other hidden costs
The extra costs I added account for our own expansion of the build cache, and for the fact that Heroku doesn't offer HTTP caching at all, much less a CDN.
Heroku doesn’t provide HTTP caching by default. In order to take advantage of HTTP caching, you’ll need to configure your application to set the appropriate HTTP cache control headers and use a content delivery network (CDN) or other external caching service.
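If the site is served from our own Heroku app, the app has to set these headers itself so that whatever CDN sits in front can cache correctly. A sketch of a path-to-`Cache-Control` policy, following our reading of Gatsby's caching recommendations (hashed JS/CSS and /static/ files cached for a year; HTML, page-data and the service worker always revalidated). The function name is hypothetical, and the rules should be checked against the Gatsby docs for the version in use.

```typescript
// Content-hashed files never change under the same URL, so they can be cached "forever".
const IMMUTABLE = "public, max-age=31536000, immutable"
// Everything else must be revalidated so new deploys show up immediately.
const REVALIDATE = "public, max-age=0, must-revalidate"

function cacheControlFor(path: string): string {
  if (path.startsWith("/static/")) return IMMUTABLE // files emitted with hashed paths
  if (path === "/sw.js") return REVALIDATE // the service worker must never be stale
  if (/\.(js|css)$/.test(path)) return IMMUTABLE // hashed bundles like /app-<hash>.js
  return REVALIDATE // HTML pages, /page-data/**, app-data.json, ...
}
```

In an Express server this would be applied as `res.setHeader("Cache-Control", cacheControlFor(req.path))` before serving the file.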
We will probably be using S3 even more, since the maximum slug size on Heroku is 500 MB and it probably includes static files.
@fabiosantoscode
The extra costs I added account for our own expansion of the build cache, and for the fact that Heroku doesn't offer HTTP caching at all, much less a CDN.
CloudFlare handles this for free, right? And we already run everything through it.
And with Heroku we can easily use an in-memory + CDN cache for the API. Again, for free and with no changes required.
We will probably be using S3 even more, since the maximum slug size on Heroku is 500 MB.
So, it's not related to images. How do people deploy JS apps to it anyway, then?
CloudFlare handles this for free, right? And we already run everything through it.
I forgot about Cloudflare :) CloudFront / S3 also have a free tier; it just depends on how much of it we can use.
But hey, if we're using Cloudflare, we could place the API in Cloudflare Workers as you mentioned before, and use Cloudflare KV as an efficient store for GitHub ETags and responses. And deploy the rest of the site using Gatsby's own thing, which will give us the fast builds we want.
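The ETag part of this idea can be sketched independently of where it runs. Below, `KeyValueStore` is a hypothetical interface whose `get`/`put` shape resembles a Cloudflare KV namespace binding (an assumption, not the exact Workers API), and `Upstream` abstracts the HTTP call to the GitHub API so the logic can be exercised without the network. When we already hold a response, we send its ETag as `If-None-Match`; a 304 means we reuse the stored body. Per GitHub's REST API documentation, as we understand it, such 304 responses don't count against the primary rate limit.

```typescript
// What we remember per URL: the ETag GitHub sent and the body it describes.
interface CachedResponse {
  etag: string
  body: string
}

// Hypothetical store; its get/put mirror the shape of a Cloudflare KV binding.
interface KeyValueStore {
  get(key: string): Promise<string | null>
  put(key: string, value: string): Promise<void>
}

interface UpstreamResponse {
  status: number
  etag: string | null
  body: string
}
// Abstracts the real HTTP request (e.g. fetch) so the logic is testable offline.
type Upstream = (url: string, headers: Record<string, string>) => Promise<UpstreamResponse>

async function fetchWithEtag(url: string, kv: KeyValueStore, upstream: Upstream): Promise<string> {
  const raw = await kv.get(url)
  const cached: CachedResponse | null = raw === null ? null : JSON.parse(raw)
  const headers: Record<string, string> = {}
  if (cached) headers["If-None-Match"] = cached.etag

  const res = await upstream(url, headers)
  // 304: nothing changed on GitHub's side, so reuse what we stored.
  if (res.status === 304 && cached) return cached.body
  if (res.status !== 200) throw new Error(`upstream returned ${res.status}`)

  if (res.etag !== null) {
    const entry: CachedResponse = { etag: res.etag, body: res.body }
    await kv.put(url, JSON.stringify(entry))
  }
  return res.body
}
```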
So, it's not related to images. How do people deploy JS apps to it anyway, then?
It's pretty related to images. Your typical JS app is smaller than 500 MB, especially when installing only production dependencies.
Here's the source: https://devcenter.heroku.com/articles/slug-compiler#slug-size
I can also recommend taking a look at AWS Amplify, which is based on S3 + CloudFront + other AWS services.
we could place the API in Cloudflare Workers as you mentioned before, and use Cloudflare KV as an efficient store for GitHub ETags and responses. And deploy the rest of the site using Gatsby's own thing, which will give us the fast builds we want.
It's not clear whether it's easy to run them locally in this case. Because of this, I would avoid this fancy stuff, unless there is a simple solution.
@JIoJIaJIu I've used Amplify before, it's great. However, I saw that their automated PR environment feature only works on private GitHub repositories, to avoid unsolicited PRs increasing costs.
We can always roll our own PR environments.
They do seem to have local dev facilities though, and the flexibility is through the roof since we're free to use any AWS service with it without going through the open internet.
Sorry for the long silence, guys. I did some testing myself. Some results:
Current image count:
To trigger a rebuild, I edited the title field in the gatsby-config.js file; everything else was the same. Clean build and rebuild results were as follows:
| Build | Total | GQL queries | Image processing |
|:--|:--|:--|:--|
| No cache | 2m 43s | 26s | 2m 19s |
| Cache | 22s | 11s | 0s |

gatsby-image

| Build | Total | GQL queries | Image processing |
|:--|:--|:--|:--|
| No cache | 3m 55s | 35s | 3m 46s |
| Cache | 24s | 13s | 0s |

Definitely still broken, so immediately reverted.

| Build | Total | GQL queries | Image processing |
|:--|:--|:--|:--|
| No cache | 1m 42s | 27s | 1m 22s |
| Cache | 22s | 11s | 0s |

| Build | Total | GQL queries | Image processing |
|:--|:--|:--|:--|
| No cache | 12m 45s | 1m 29s | 10m 15s |
| Cache | 57s | 36s | 0s |

| Build | Total | GQL queries | Image processing |
|:--|:--|:--|:--|
| No cache | 3m 42s | 38s | 3m 14s |
| Cache | 30s | 21s | 0s |

| Build | Total | GQL queries | Image processing |
|:--|:--|:--|:--|
| No cache | 8m 8s | 54s | 5m 54s |
| Cache | 7m 4s | 46s | 5m 39s |
The longest part of the build process, by far, is thumbnail generation on the first build. We are now generating more than 1K images.
Disabling WebP reduces build time by approximately 1/3, but makes the end user's experience worse.
Rebuild times with an existing cache (node_modules, .cache and public folders) are quite fast.
If the hosting provider caches them, as now.sh or Gatsby Cloud do, rebuild time can be less than 1m.
Netlify is by far the slowest of the options I tried, and it definitely didn't cache the public folder. Even if we enabled its caching, overall build times would still be the longest. So I'd say we can safely remove it from the candidates list.
P.S. One more thing: after I updated Gatsby's dependencies in package.json, the cache was invalidated and a full build ran again. So for commits like that, I think a long rebuild time is inevitable.
Thanks @iAdramelk!
I think this shows how much a good cache can influence build times.
Our best options are clearly now.sh and gatsby cloud. However, gatsby cloud doesn't come with API endpoints and is a bit pricey.
Using Cloudflare for local development is not very practical, so I propose we drop it for local development, and in production use a worker that takes every request to /api/* and uses ETags and If-None-Match requests, so our functions don't use up any GitHub API limits (remember: a conditional request answered with a 304 doesn't count against the GitHub rate limit). This should be rather simple to implement, and maybe some other CDN already does this for us out of the box. I'll look into this.
In PR environments, our APIs are the production APIs. Locally, our browsers are going to send the ETags back to the API in If-None-Match headers, so we won't use up our limits. If this doesn't work, we can always wrap our functions in a caching wrapper.
Fastly is capable of making If-None-Match requests to our servers if we respond with ETag headers. Therefore, if we use it, we can make an If-None-Match request to GitHub with the same ETag and respond with a 304 (or a 200 if anything changed on GitHub's side). Fastly will then remember the old response and serve it.
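A sketch of that relay at the origin: forward the CDN's `If-None-Match` to GitHub, turn a 304 from GitHub into a 304 for the CDN, and otherwise transform and return the fresh data together with GitHub's ETag. Reusing GitHub's ETag for our transformed body is only valid if `transform` is deterministic. All names here are hypothetical, and `Upstream` stands in for the real HTTP call.

```typescript
interface UpstreamResponse {
  status: number
  etag: string | null
  body: string
}
// Stands in for the real request to the GitHub API.
type Upstream = (url: string, headers: Record<string, string>) => Promise<UpstreamResponse>

interface OriginResponse {
  status: number
  headers: Record<string, string>
  body: string
}

async function relayConditional(
  url: string,
  ifNoneMatch: string | undefined, // taken from the CDN's revalidation request
  upstream: Upstream,
  transform: (raw: string) => string, // must be deterministic for the ETag to stay valid
): Promise<OriginResponse> {
  const headers: Record<string, string> = {}
  if (ifNoneMatch) headers["If-None-Match"] = ifNoneMatch

  const res = await upstream(url, headers)
  const out: Record<string, string> = {}
  if (res.etag !== null) out["ETag"] = res.etag

  // Unchanged on GitHub: tell the CDN to keep serving its stored copy.
  if (res.status === 304) return { status: 304, headers: out, body: "" }
  // Any other non-200 from GitHub is surfaced as a bad gateway.
  if (res.status !== 200) return { status: 502, headers: {}, body: "" }
  return { status: 200, headers: out, body: transform(res.body) }
}
```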
I'm going to check whether Fastly uses a global cache, or whether a request from China can't use a response cached in Europe.
So my summary would be this:
So the best way to optimize build time is not to create images on each build. There are 2 possible solutions to that:
With the second approach we can do the following:
- Replace gatsby-image with our own image component that will generate all the needed src paths for srcset in <picture>.
- Run imgproxy in a separate container on Heroku.
- On the first request to imgproxy, it will resize the image and cache it on the CDN. On the second request and after, we will use the CDN version instead of regenerating it.
This way we can have a fast and consistent build time for local development too. But our infrastructure will be more complex.
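A sketch of the URL side of such an image component. The URL layout used here (`/<signature>/<processing options>/plain/<source URL>@<extension>`, with `insecure` in place of a signature when URL signing is disabled, and `rs:fit:<width>:0` for a width-only resize) is our reading of the imgproxy docs and should be checked against the deployed version; `IMGPROXY_HOST` and all function names are hypothetical.

```typescript
// Hypothetical host of the imgproxy container (behind the CDN).
const IMGPROXY_HOST = "https://images.example.com"

type Format = "webp" | "jpg"

// Width-only resize: height 0 lets imgproxy keep the aspect ratio.
function imgproxyUrl(source: string, width: number, format: Format): string {
  return `${IMGPROXY_HOST}/insecure/rs:fit:${width}:0/plain/${encodeURIComponent(source)}@${format}`
}

// One candidate per width, in the "<url> <width>w" form that srcset expects.
function srcSet(source: string, widths: number[], format: Format): string {
  return widths.map((w) => `${imgproxyUrl(source, w, format)} ${w}w`).join(", ")
}

function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;")
}

// <picture> with a WebP source and a JPEG fallback; the CDN caches each generated size.
function pictureHtml(source: string, widths: number[], alt: string, sizes = "100vw"): string {
  const largest = Math.max(...widths)
  return [
    "<picture>",
    `  <source type="image/webp" srcset="${srcSet(source, widths, "webp")}" sizes="${sizes}">`,
    `  <img src="${imgproxyUrl(source, largest, "jpg")}" srcset="${srcSet(source, widths, "jpg")}" sizes="${sizes}" alt="${escapeAttr(alt)}">`,
    "</picture>",
  ].join("\n")
}
```

Since nothing is resized at build time, the build only has to emit these strings; the first visitor (or a warm-up script) pays the resize cost once per size and the CDN serves it afterwards.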
With your second approach we can also have the API there, and we get the flexibility of Heroku. I like that!
There are more options to resize images, including Cloudflare and Fastly. This can make our infra a tad simpler.
For caching the API, there should be a bunch of ways to do it, like a varnish container or just in-process memory. Our options are pretty limitless here.
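For the in-process memory option, a sketch of a cache that is refreshed on a schedule rather than on demand, so no visitor ever waits for GitHub or Discourse (the ~10s first-load problem), and a failed refresh keeps serving the previous value. `RefreshingCache` is a hypothetical name; the scheduler is deliberately left outside the class, so a `setInterval` in the web process or an external cron job could call `refresh()` every 15 minutes.

```typescript
class RefreshingCache<T> {
  private value: T | undefined
  private loadedAt = Number.NEGATIVE_INFINITY
  private readonly load: () => Promise<T>
  private readonly now: () => number

  constructor(load: () => Promise<T>, now: () => number = Date.now) {
    this.load = load
    this.now = now
  }

  // Called by a scheduler (e.g. every 15 minutes), never on the request path.
  async refresh(): Promise<void> {
    try {
      this.value = await this.load()
      this.loadedAt = this.now()
    } catch {
      // Keep serving the previous value; a real setup would log the error.
    }
  }

  // Request handlers only ever read the current value.
  get(): T | undefined {
    return this.value
  }

  // Milliseconds since the last successful refresh (Infinity if none yet).
  ageMs(): number {
    return this.now() - this.loadedAt
  }
}
```

The `ageMs()` accessor makes it easy to expose staleness in a health check, e.g. alerting if the data is much older than the 15-minute refresh period.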
Also: Fastly doesn't replicate cached results globally. So that's a no-go.
Ok, so it's Now vs Heroku custom server.
With Now we need to clarify the following:
With Heroku:
Any other thoughts? Am I missing anything else in this summary so far?
the best way to optimize build time is not to create on each build...
Don't create images at build time at all and instead use image resizing server like https://imgproxy.net/ or something else like that.
I like this approach of extracting the problem which is very specific and well defined to a specialized service.
Heroku custom server...
we potentially hit the cache size limit, but on the other hand it feels like it might be a good idea to invest time into proper images infra
Yep, since there is still some unfamiliarity with other platforms and possibly no super strong reasons to move off Heroku, I would lean toward staying there (it also seems to have more predictable pricing).
But I haven't been as involved as the others in this research, so I don't think my vote should be weighted equally. This way there are also only 3 (real) votes in this issue, so no possibility of ties 😬
@shcheklein
You're right about now.sh redirects and cross-region API response caching.
As for the trailing slash, I can confirm: you can curl it yourself and see the trailing slash disappear.
I think heroku and an image resizer might be our best option here. In terms of flexibility, we're able to deploy pretty much anything we want and cache things properly.
If the Heroku slug size limit hits us, we can choose not to deploy the images there and instead use S3, or simply raw.githubusercontent.com/owner/repo/branch/path/to/image, as the source for our image resize service.
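For branch previews, the resizer's source URL could be derived from the branch name. A sketch, assuming the usual raw.githubusercontent.com layout of owner / repo / branch / path; the default owner and repo and the function names are illustrative, and how a preview app learns its own branch name is left open.

```typescript
// Encode each path segment but keep the slashes (branch names may contain "/").
function encodeSegments(path: string): string {
  return path.split("/").map((segment) => encodeURIComponent(segment)).join("/")
}

function rawGithubUrl(branch: string, path: string, owner = "iterative", repo = "blog"): string {
  const cleanPath = path.replace(/^\/+/, "") // accept "/static/..." as well as "static/..."
  return `https://raw.githubusercontent.com/${owner}/${repo}/${encodeSegments(branch)}/${encodeSegments(cleanPath)}`
}
```

The resulting URL would then be passed as the source to the image resizer, so a preview of any branch gets its own images without deploying them in the slug.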
It doesn't use the right content-type, but otherwise it works, and it will work for branch previews as well.
@iAdramelk Heroku it is?
@shcheklein @fabiosantoscode looks like Heroku. I don't really like the idea of running our own server, but it looks like we don't have a choice for the API and redirects.
@fabiosantoscode let's proceed with Heroku, we can start by moving blog as an example.
I got my fork deployed on heroku: https://dvc-blog-production.herokuapp.com/
Trailing slashes get removed, no redirects yet (but will be simple to do, will get to it later).
Build time
There's no timer on Heroku builds, but using a stopwatch I got 3m05s. 20s of that is the time Heroku takes to pick up the build (might be longer today because they seem to have an ongoing incident), and another 60s is "pruning devDependencies", which basically ensures no devDependencies end up in production. I find it weird that this takes so much time.
The public folder is cached, as well as the gatsby folder. And for some reason the slug size is small even though there are so many images.
Tests & Types & Lints
We have 2 choices here:
I'm going for 2, unless anyone has any objections. I won't start right now (it's late here) but tomorrow if nobody's said anything I'll configure heroku to run the tests.
and there's also 60s of "pruning devDependencies"
Can we get rid of this?
Gatsby build time is ~7s, which is great.
2 sounds good.
can we get rid of this
The reason for this is explained in this issue. TL;DR yarn rebuilds the prod dependencies from source when removing devDependencies.
I tried switching to npm after seeing the issue about pruning dev dependencies on Heroku when there are statically built dependencies. However, it only shaved off around 10 seconds.
So I tried upgrading to Yarn 2, which removes the install step and node_modules. I was super excited about it until I found out Gatsby doesn't support it yet.
All we can do is try to trim some of the devDependencies (kill typescript, anyone? The compilation step and plugins can be replaced with JSDoc comments, plus a typechecker like tern.js or typescript itself)
So I'm going to move on.
Apparently they were already working for PRs and master, so I moved on.
They work, but the first build of each PR takes a very long time, because it refuses to use the cache. However, when running tests, the cache is used properly.
This means that opening a new PR means you need to wait for more than 10 minutes to see your preview. The cache issue also happens in the dvc.org repo, however since it doesn't compress a ton of images that's not a problem.
I think the obvious solution here is to compress images on demand, as mentioned above, which has the nice side effect of speeding up the build process further, since there will be no need to cache those images. But I feel like it's not part of this initial PR. So I will clean things up and issue a PR.
However when running tests, the cache is used properly.
What do you mean by running tests, precisely?
Am I correct that after I create a PR, subsequent changes (commits) will be quick?
The tests are in a separate pipeline from deploys, where the cache is respected.
Yes, more commits on top of the same PR are built quickly (3min).
PR here: https://github.com/iterative/blog/pull/135 :eyes:
Closing this as we moved to Heroku.