FastAPI: Production server for best performance

Created on 10 Dec 2020 · 9 Comments · Source: tiangolo/fastapi

Which server setup gives the best performance in production?
We deploy to production with Docker.
Which is better to use: gunicorn -w or uvicorn --workers?

The Uvicorn docs recommend using Gunicorn (https://www.uvicorn.org/#running-with-gunicorn):

This allows you to increase or decrease the number of worker processes on the fly, restart worker processes gracefully, or perform server upgrades without downtime.

But I don't think this is relevant when using Docker.
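For reference, the two invocations being compared can be written out as argument lists. This is a minimal sketch: `main:app`, the port, and the helper names are assumptions for illustration and do not come from the thread. The worker class path is the one named in the Uvicorn docs linked above; check it against your installed version.

```python
from __future__ import annotations

import shlex

APP = "main:app"  # hypothetical module:attribute path to the FastAPI app


def gunicorn_command(workers: int, app: str = APP) -> list[str]:
    """Gunicorn as the process manager, with Uvicorn's worker class serving ASGI."""
    return [
        "gunicorn", app,
        "-w", str(workers),                     # number of worker processes
        "-k", "uvicorn.workers.UvicornWorker",  # worker class provided by Uvicorn
        "--bind", "0.0.0.0:8000",               # illustrative address and port
    ]


def uvicorn_command(workers: int, app: str = APP) -> list[str]:
    """Uvicorn's own multi-process mode, without Gunicorn."""
    return [
        "uvicorn", app,
        "--workers", str(workers),
        "--host", "0.0.0.0",
        "--port", "8000",
    ]


if __name__ == "__main__":
    # Print both command lines so they can be pasted into a Dockerfile CMD.
    print(shlex.join(gunicorn_command(4)))
    print(shlex.join(uvicorn_command(4)))
```

Both commands start several processes inside one container; they differ in which program supervises those processes.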

datasatanic

All 9 comments

tiangolo/uvicorn-gunicorn-fastapi-docker

includeamin on 10 Dec 2020

We use a custom Dockerfile and balance the workers manually (on Uvicorn). Is the upgrade to Gunicorn worth it?

datasatanic on 11 Dec 2020

It is recommended to use Gunicorn in production (with Uvicorn workers).
I've used Gunicorn in all of my production environments.

You can create a custom Dockerfile. Just check the tiangolo/uvicorn-gunicorn-fastapi-docker repo to see how to configure Gunicorn for your requirements.
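As a rough illustration of what such a configuration might contain, here is a minimal Gunicorn config file sketch. It is not copied from the tiangolo image. The port, the timeout values, and the use of `WEB_CONCURRENCY` as an override are assumptions to adapt to your own setup.

```python
# gunicorn.conf.py: load with `gunicorn -c gunicorn.conf.py main:app`
# (`main:app` is a hypothetical module path for the FastAPI app)
import multiprocessing
import os

# Address to listen on inside the container; the port is an illustrative choice.
bind = "0.0.0.0:8000"

# Serve the ASGI app through Uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Worker count: use WEB_CONCURRENCY if it is set, otherwise fall back to the
# (2 x cores) + 1 starting point suggested in the Gunicorn docs.
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))

# Seconds a worker gets to finish in-flight requests on restart or shutdown.
graceful_timeout = 30

# Seconds of silence after which the master kills and restarts a worker.
timeout = 60
```

In a container with a CPU limit, `cpu_count()` may report the host's cores rather than the limit, so setting the worker count explicitly is often the safer choice.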

includeamin on 11 Dec 2020

It's recommended by uvicorn itself: https://www.uvicorn.org/deployment/#using-a-process-manager

But talking about numbers (performance), I didn't find any comparison between those...
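In the absence of published numbers, one option is to measure both setups yourself under the same load. The following is only a sketch using the standard library: the function names and defaults are made up for illustration, and a dedicated load-testing tool will give more reliable results.

```python
from __future__ import annotations

import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_latency(url: str) -> float:
    """Return the wall-clock seconds taken by one GET request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()
    return time.perf_counter() - start


def summarize(latencies: list[float], elapsed: float) -> dict:
    """Reduce raw per-request timings to a few comparable figures."""
    ordered = sorted(latencies)
    p95_index = min(len(ordered) - 1, int(len(ordered) * 0.95))
    return {
        "requests": len(ordered),
        "requests_per_second": len(ordered) / elapsed,
        "median_seconds": statistics.median(ordered),
        "p95_seconds": ordered[p95_index],
    }


def benchmark(url: str, total: int = 200, concurrency: int = 20) -> dict:
    """Send `total` GET requests using `concurrency` threads and summarize them."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(fetch_latency, [url] * total))
    return summarize(latencies, time.perf_counter() - started)
```

Run `benchmark("http://localhost:8000/")` once against each server setup, with the same worker count and the same endpoint, and compare the resulting dictionaries.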

Kludex on 11 Dec 2020

👍2

For production deployments we recommend using gunicorn with the uvicorn worker class.

includeamin on 11 Dec 2020

👍2

I will add something a bit different from the above comments, to give you something else to think about. Gunicorn appears to have lots of good stuff and is probably sufficient for a lot of people, but for a project that I have worked on, Gunicorn did not appear to be suitable or necessary.

After some trial and error and a lot of research, I found that the reasons had to do with our architecture. We run our application on Kubernetes, which had a few implications for this decision:

1. Kubernetes can scale our app, and should be in charge of that imo. It is also in charge of the resources that a pod has access to. I think it would be confusing to have Gunicorn also attempting to scale inside a pod instead of adding a replica. I don't know how much overhead Gunicorn has over Uvicorn (I'm sure it's not much), but there was no reason to find out.
2. We let our pods die if there is a server error; Kubernetes restarts them, we get our alerts, and we can see that a pod restarted N times, etc. I struggled to kill a pod with Gunicorn, I think because of the way it is modeled. In Gunicorn, a main process spawns worker processes. But if a server error occurred, it would not crash the pod. I can't remember all of the details why, but I recall trying to figure it out for some time (trying different signals and so on) with no luck. I think possibly the worker process would die but not the main one, because of the way it runs.
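The behaviour in the second point matches the general master/worker pattern, where a supervising process replaces crashed workers and stays alive itself. The toy model below shows that pattern with the standard library only. It is not Gunicorn's actual code, and `supervise` and its parameters are hypothetical names.

```python
from __future__ import annotations

import subprocess
import sys


def supervise(worker_cmd: list[str], max_restarts: int) -> int:
    """Run a worker command and restart it each time it exits non-zero.

    Returns how many times the worker was restarted. The supervising
    process does not exit when a worker crashes, so anything that only
    watches the supervisor (for example a container runtime watching
    the container's main process) sees nothing wrong.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(worker_cmd).returncode
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1  # worker crashed: start a replacement and stay alive


if __name__ == "__main__":
    # A stand-in worker that always fails immediately.
    crashing_worker = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing_worker, max_restarts=3))
```

With a single Uvicorn process per container there is no supervisor in between, so a fatal error ends the container's main process and the orchestrator's restart policy takes over.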

If by using Docker, you mean using Docker Swarm, then you may be in a similar situation to what I was in, and really you could do things either way. If you scale with Swarm, you will be consistent in your scaling with Swarm at least, and if you do something like we do by killing/restarting containers, you will achieve that with Uvicorn easily. But this is where you need to decide what is best for you.

Disclaimer: I have never run an app in Gunicorn so I have no production experience with it over a significant period of time, only with trying to use it. _Maybe_ the 2nd point is solvable, but the first point still stands, for me anyway.

Kennpow on 14 Dec 2020

Thank you, @Kennpow, this is useful information. I think we will eventually move to Kubernetes.

datasatanic on 14 Dec 2020

@Kennpow have you tried Knative? If not, definitely check it out. It can do a lot more than Gunicorn is able to do.

Some pros:

Per request load balancing.
Traffic splitting using revisions of a service.
Request based auto scaling.
It's easier to make new deployments; you don't need to write hundreds of lines of YAML.
[optional] automatic monitoring support for HTTP metrics (latency, requests, etc.)
[optional] automatic TLS certs and termination for external endpoints