FastAPI: FastAPI and Uvicorn are running synchronously and very slowly

Created on 5 May 2020 · 20 comments · Source: tiangolo/fastapi

I'm new to FastAPI and I'm testing file uploads and asynchronous requests. However, when I perform several requests with clients in parallel or in serial, FastAPI processes each upload in a queue (synchronously) and very slowly. I'm running the API with uvicorn and with gunicorn, each with 1 worker. In both cases the time spent was the same.

My client sends 4 files of approximately 20 MB each, in parallel (or in serial), to a FastAPI endpoint; however, the server stores the files one at a time and very slowly.

I made the same upload against an aiohttp endpoint and the files were stored in approximately 0.6 seconds with the client making requests in parallel (multiprocessing) and 0.8 seconds with the client making requests in serial (on average). When I made these uploads against FastAPI, the files were stored in approximately 13 seconds with parallel client requests and 15 seconds with serial client requests (on average).

I'd like to know if I'm doing anything wrong.

Server Code


# app.py

from fastapi import FastAPI, File, UploadFile
import random
import aiofiles
import os

app = FastAPI()

STORAGE_PATH = 'storage'

@app.post("/")
async def read_root(file: UploadFile=File('teste')):
    fpath = os.path.join(
        STORAGE_PATH, f'{random.randint(0, 5000)}_{file.filename}'
    )
    async with aiofiles.open(fpath, 'wb') as f:
        content = await file.read()
        await f.write(content)

    return {"Test": "Test"}


# uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1
# gunicorn -w=1 -k=uvicorn.workers.UvicornWorker --bind=0.0.0.0:8000 app:app

Client Code

import requests
from datetime import datetime
from multiprocessing import Pool

FILES = ['f1.txt', 'f2.txt', 'f3.txt', 'f4.txt']

def request(fname):
    with open(fname, 'rb') as fp:
        requests.post("http://localhost:8000/", files={'file': fp})


def req_mp():
    start = datetime.now() 
    pool = Pool(4) 
    pool.map(request, FILES) 
    print(datetime.now() - start)


def req_serial():
    start = datetime.now()  
    for fn in FILES:    
        request(fn)
    print(datetime.now() - start)

All 20 comments

aiofiles itself is very slow; I would try re-running this using open and write directly, which will block but will probably be faster.

file.read() is also going to read the entire file and put it into memory, which I believe may be a blocking operation under the covers despite being declared async.

I would try the following (which may not work); alternatively you can use async read(bytes) to control the rate at which you read, and then write each chunk.

    async with aiofiles.open(fpath, 'wb') as f:
        while chunk := await file.read(4096):
            await f.write(chunk)

If you really want non-blocking uploads you'll need to send a binary stream and read it directly from the request body; that's how I'm handling multi-GB uploads in my own system.
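
The chunked "async read(bytes)" approach suggested above can be sketched as a generic copy loop. This is only a sketch: `FakeUpload` is a hypothetical stand-in for any object exposing an async `read(n)` method (such as FastAPI's `UploadFile`), used here so the snippet runs on its own:

```python
import asyncio

CHUNK_SIZE = 4096  # read the upload in 4 KiB pieces instead of all at once

async def copy_chunked(src, dest_path: str) -> int:
    """Copy an async-readable source to a local file chunk by chunk."""
    written = 0
    with open(dest_path, "wb") as f:
        while True:
            chunk = await src.read(CHUNK_SIZE)
            if not chunk:  # empty bytes signals end of stream
                break
            f.write(chunk)
            written += len(chunk)
    return written

class FakeUpload:
    """Hypothetical stub emulating UploadFile.read for demonstration only."""
    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    async def read(self, n: int) -> bytes:
        chunk = self._data[self._pos:self._pos + n]
        self._pos += n
        return chunk

if __name__ == "__main__":
    import os, tempfile
    path = os.path.join(tempfile.gettempdir(), "copy_chunked_demo.bin")
    print(asyncio.run(copy_chunked(FakeUpload(b"x" * 10000), path)))  # 10000
```

Note the `open`/`write` calls here are still blocking; the loop only controls the read rate, which is the commenter's point.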

Thanks for the answer @chris-allnutt. I don't know exactly how to do "async read(bytes)". I made other tests with other algorithms and the result was also slow. I changed the algorithm to write the file asynchronously (chunk by chunk) and the result was the same.

New test 1

with open(fpath, 'wb') as f:
    while chunk := await file.read(4096):
        f.write(chunk)

# uvicorn - time spent: 8 seconds - client serial (req_serial)
# uvicorn - time spent: 6 seconds - client parallel (req_mp)
# gunicorn - time spent: 6 seconds - client - parallel (req_mp)
# gunicorn - time spent: 6 seconds - client serial (req_serial)

New test 2

async with aiofiles.open(fpath, 'wb') as f:
    while True:
        chunk = await file.read(4096)
        if not chunk:
            break
        await f.write(chunk)

# uvicorn - time spent: 15 seconds - client serial (req_serial)
# uvicorn - time spent: 12 seconds - client parallel (req_mp)
# gunicorn - time spent: 13 seconds - client serial (req_serial)
# gunicorn - time spent: 12 seconds - client parallel (req_mp)

This same algorithm was used in my tests with aiohttp, where the reads and writes really were asynchronous, taking between 0.4 and 0.8 seconds (approximately) to save the files. I made these tests with Django, with all operations synchronous, and all the files were written in between 1.2 and 1.8 seconds.

Another thing I noticed: when I send the requests, there is a delay before the request is processed by the Python code.

I've seen many benchmarks where FastAPI and Uvicorn get great results, but I can't reproduce the performance optimization the tools offer in the tests I'm doing.

I'd like to use FastAPI in my next projects, but unfortunately these basic tests aren't showing very good results.

Well, nobody here wants to sell you anything, but could you post your OS and Python version?

The test was performed on Ubuntu 18.04 (Desktop), an Intel i5 CPU with 4 cores, 8 GB of RAM, and Python 3.8.0.

IIRC, all aiofiles does is run the blocking file operations in a threadpool to avoid blocking, so there wouldn't be much difference between using aiofiles to manipulate a file from within an async def read_root() route and using the standard python file I/O functions from within a def read_root() (which FastAPI would automatically run inside a threadpool). From what I'm reading, aiofile (no 's' at the end) uses POSIX asynchronous file I/O calls directly, so it might be worth trying that.

I remember reading in some places that POSIX asyncio operations have their share of issues, and it looks like this might be the reason why asyncio doesn't support them directly (this section looks to have been written in 2015 or before, so it might not reflect the latest state of things). I'm not seeing any mention of IOCP for async file I/O on Windows or io_uring for Linux, so that might be ripe for improvements in the near future. Right now, though, aiofile seems to be your best option.

TL;DR: Async file I/O is hard, and it's one use case where support tends to be disappointing in a lot of places, unfortunately.
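
Since, as noted above, aiofiles just runs the blocking file operations in a threadpool, the equivalent can be sketched directly with the event loop's `run_in_executor`; this is a minimal illustration of that mechanism, not the library's actual implementation:

```python
import asyncio

def _write_file(path: str, data: bytes) -> int:
    # Plain blocking write; this runs in the default ThreadPoolExecutor.
    with open(path, "wb") as f:
        return f.write(data)

async def write_file_async(path: str, data: bytes) -> int:
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the event loop keeps serving other tasks.
    return await loop.run_in_executor(None, _write_file, path, data)

if __name__ == "__main__":
    import os, tempfile
    path = os.path.join(tempfile.gettempdir(), "executor_demo.bin")
    print(asyncio.run(write_file_async(path, b"hello")))  # 5
```

This is why aiofiles offers little speedup over a plain `def` route: either way the write happens on a worker thread.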

Thanks for the answer @sm-Fifteen. I ran the tests with aiofile and the result was the same. I believe the problem isn't in the algorithm for reading or writing the file, because these algorithms are the same ones I used in the tests with aiohttp and Django.

I used aiofiles, aiofile, and a synchronous open in aiohttp, and I also tested Django reading synchronously; the results didn't reach 2 seconds even in the synchronous cases (cited above).

It could be that I'm doing something wrong, I don't know, but the only difference between the tests is the framework. I don't know whether the cause could be the serialization of the sent file or some FastAPI specificity.

I tested these algorithms with Sanic and got great results with Sanic running standalone (between 0.6 and 0.8 seconds), but when I ran Sanic under uvicorn or gunicorn the results got worse and the time spent grew. Even so, the time spent was much smaller than FastAPI running with uvicorn or gunicorn (cited above).

import random
import aiofiles
from sanic import Sanic
from sanic.response import text

app = Sanic(__name__)

@app.route("/", methods=['POST'])
async def home(request):
    fpath = f'storage/{random.randint(0, 5000)}.txt'
    async with aiofiles.open(fpath, 'wb') as f:
        await f.write(request.files.get('file').body)
    return text('teste')

# standalone - time spent: 0.7 seconds (mean) - client parallel
# standalone - time spent: 0.85 seconds (mean) - client serial
# uvicorn - time spent: 2.5 seconds (mean) - client parallel
# uvicorn - time spent: 4.5 seconds (mean) - client serial
# gunicorn - time spent: 2 seconds (mean) - client serial
# gunicorn - time spent: 1.8 seconds (mean) - client parallel

As mentioned above, I know that aiofiles isn't truly asynchronous, but when I ran Sanic under uvicorn and gunicorn the request processing time grew a lot.

Am I using these frameworks correctly? Or are uvicorn and ASGI really slower than a framework running in standalone mode?

@igor-rodrigues1 sanic always reads files entirely into memory before passing control to your handler whereas fastapi via starlette uses a tempfile.SpooledTemporaryFile and thus rolls data over onto disk once it grows beyond 1 MB. if you overwrite the default spooling threshold for UploadFile like

from starlette.datastructures import UploadFile as StarletteUploadFile

# keep the SpooledTemporaryFile in-memory
StarletteUploadFile.spool_max_size = 0

... I think you may see performance somewhat similar to that of sanic & uvicorn as it should eliminate the file I/O trip
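
The spooling behaviour described here comes from the standard library's `tempfile.SpooledTemporaryFile`, which keeps data in memory until `max_size` is exceeded and then rolls over to a real file on disk. A small demonstration (`_rolled` is a CPython internal attribute, inspected here only to make the rollover visible):

```python
import tempfile

# Spool up to 1 MB in memory, mirroring starlette's default spool_max_size.
spool = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)

spool.write(b"a" * 1024)               # well under the threshold
print(spool._rolled)                   # False: still an in-memory buffer

spool.write(b"b" * (2 * 1024 * 1024))  # exceed the threshold
print(spool._rolled)                   # True: data now lives in a temp file
spool.close()
```

Setting `spool_max_size = 0` disables the rollover check entirely (a falsy `max_size` is never compared against), which is why the snippet above keeps everything in memory.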

Thanks for the answer @obataku. Used that way, an error occurs. I believe that is because the UploadFile class from starlette.datastructures doesn't have the validator method used by FastAPI.

from starlette.datastructures import UploadFile as StarletteUploadFile
StarletteUploadFile.spool_max_size = 0

@app.post("/")
async def read_root(file: StarletteUploadFile=File('teste')):
    fpath = os.path.join(
        STORAGE_PATH, f'{random.randint(0, 5000)}_{file.filename}'
    )
    with open(fpath, 'wb') as f:
        content = await file.read()
        f.write(content)

# error occurred
# fastapi.exceptions.FastAPIError: Invalid args for response field! Hint: check that <class 'starlette.datastructures.UploadFile'> is a valid pydantic field type

And used this way, the time spent was similar to the previous implementations, with the processing time above 7 seconds.

from fastapi import UploadFile
UploadFile.spool_max_size = 0

@app.post("/")
async def read_root(file: UploadFile=File('teste')):
    fpath = os.path.join(
        STORAGE_PATH, f'{random.randint(0, 5000)}_{file.filename}'
    )
    with open(fpath, 'wb') as f:
        content = await file.read()
        f.write(content)

The biggest problem isn't file I/O. My question is why FastAPI and Uvicorn show such high processing times compared with other Python frameworks.

Is the above code correct?

@igor-rodrigues1 no, I was suggesting something like

from fastapi import UploadFile
import starlette

starlette.datastructures.UploadFile.spool_max_size = 0


@app.post("/")
async def read_root(file: UploadFile = File('teste')):
    fpath = os.path.join(
        STORAGE_PATH, f'{random.randint(0, 5000)}_{file.filename}'
    )
    with open(fpath, 'wb') as f:
        content = await file.read()
        f.write(content)

hopefully that's clearer now

the first snippet is incorrect for the reason you noted (starlette's UploadFile can't be used as a validation typehint) whereas the second works exactly the same as before--fastapi's UploadFile is used solely for typehints while starlette's UploadFile is still the one actually used in starlette.formparsers.MultiPartParser.parse:

                    if b"filename" in options:
                        filename = _user_safe_decode(options[b"filename"], charset)
                        file = UploadFile(
                            filename=filename,
                            content_type=content_type.decode("latin-1"),
                        )

... so we need to adjust spool_max_size in starlette.datastructures.UploadFile, not fastapi.datastructures.UploadFile

Thanks for the help @obataku. I tested exactly your example above, and in fact the file write was as fast as in the other Python frameworks (0.05 seconds per file on average). However, the time spent by the client sending the requests stayed between 7 and 9 seconds for both servers running FastAPI (uvicorn and gunicorn). Based on these tests, I believe what is making the application slow isn't FastAPI but rather uvicorn and gunicorn.

What I don't understand is why frameworks that were optimized to run Python code asynchronously are so slow at these simple file-upload tests.

@igor-rodrigues1 no problem--and what about running with uvicorn as a standalone ASGI server? have you tried eliminating gunicorn?

@obataku all the tests I made used gunicorn and uvicorn standalone. I used the following commands:

```shell
gunicorn -w=1 -k=uvicorn.workers.UvicornWorker --bind=0.0.0.0:8000 app:app
uvicorn app:app --reload --workers 1
```

And in all the tests I'm making, including the tests with the other Python frameworks, I'm using only one worker.

@igor-rodrigues1 sorry for not replying sooner. I was thinking the flow control may be the bottleneck here--I would try to adjust uvicorn.protocols.http.h11_impl.HIGH_WATER_LIMIT:
https://github.com/encode/uvicorn/blob/master/uvicorn/protocols/http/h11_impl.py#L28

Thanks for the answer @obataku. In this case, what do I have to do? Do I have to download the FastAPI source code and increase the value of HIGH_WATER_LIMIT?

@igor-rodrigues1 shouldn't be necessary to modify any fastapi code; if you are invoking uvicorn programmatically, like

uvicorn.run(app, host="0.0.0.0", port=8000)

... then you can just do something like

import uvicorn.protocols.http.h11_impl
uvicorn.protocols.http.h11_impl.HIGH_WATER_LIMIT = 2**20

to raise it to 1 MiB, for example, although it's not obvious to me yet whether this is even related here

I'm also curious how uvicorn's performance with httptools works out for you (in which case httptools_impl.HIGH_WATER_LIMIT can also be adjusted). I'd suggest testing with a few different values and both implementations to see

Had similar performance issues when doing file upload using fastapi.
It looks to me as if the multipart parsing causes the performance impact.

import os

from fastapi import FastAPI
from starlette.requests import Request
from starlette.datastructures import UploadFile as StarletteUploadFile
from starlette.formparsers import MultiPartParser

app = FastAPI()

target = "./test"
# causes the tempfile.SpooledTemporaryFile backing StarletteUploadFile
# to be written to the same filesystem as target
os.environ["TMPDIR"] = os.getcwd()

# avg speed 275 MB/s
@app.post("/uploadfile")
async def uploadfile(request: Request):
    file = StarletteUploadFile(filename=target)
    async for chunk in request.stream():
        await file.write(chunk)
    await file.close()

# avg speed 350 MB/s
@app.post("/plain")
async def plain(request: Request):
    with open(target, "wb") as file:
        async for chunk in request.stream():
            file.write(chunk)

# avg speed 68 MB/s
@app.post("/multipart")
async def multipart(request: Request):
    m = MultiPartParser(request.headers, request.stream())
    await m.parse()

StarletteUploadFile is still a bit slower than writing to a plain file, but that difference might be caused by other things running on my computer.
I know that my full request body is written to the file, not just the file I'm uploading; it was only to check whether multipart processing causes the performance impact.
I also tested just feeding the stream to the parser without writing to a StarletteUploadFile afterwards, which shows the same.

TL;DR: I think file I/O is not the main issue here.

Found streaming_form_data, which does multipart/form-data parsing much faster, as it uses Cython.
In his blog the author wrote that byte-wise parsing in pure Python is slow, and that's why multipart is slow in the current implementation of starlette.

Created a gist with an example.
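
The author's point, that scanning one byte at a time in pure Python is slow while delegating the scan to C code (as Cython, or here `bytes.find`, effectively does) is fast, can be illustrated with a toy boundary search; the boundary string and buffer sizes below are made up for the demo:

```python
import time

def find_bytewise(data: bytes, boundary: bytes) -> int:
    # Scan position by position, the way a pure-Python state machine would.
    blen = len(boundary)
    for i in range(len(data) - blen + 1):
        if data[i:i + blen] == boundary:
            return i
    return -1

def find_builtin(data: bytes, boundary: bytes) -> int:
    # bytes.find scans the same data in C, typically orders of magnitude faster.
    return data.find(boundary)

if __name__ == "__main__":
    BOUNDARY = b"--boundary123"
    DATA = b"x" * 1_000_000 + BOUNDARY + b"y" * 100
    t0 = time.perf_counter()
    a = find_bytewise(DATA, BOUNDARY)
    t1 = time.perf_counter()
    b = find_builtin(DATA, BOUNDARY)
    t2 = time.perf_counter()
    assert a == b == 1_000_000
    print(f"bytewise: {t1 - t0:.4f}s, C find: {t2 - t1:.4f}s")
```

Absolute timings vary by machine, but the gap between the two is the same effect the Cython-based parser exploits.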

Hey, can anyone tell me which is faster, FastAPI or Django Rest Framework?

@dspashish this depends on many variables and on context, but I did a benchmark between Django, Flask, aiohttp, and FastAPI and got some important results. In this benchmark I uploaded several files (in parallel) of 30 MB to 100 MB, processed them, and stored the data in the database. The best results were aiohttp and Flask, both in file I/O and processing and in database I/O. In particular, the best result was aiohttp because of its asynchronous I/O.

In this benchmark FastAPI did not get a good result because of this slowness in file upload, but in theory FastAPI should be faster than Django because of its simpler structure (it doesn't have a stack of middleware like Django), it is based on Starlette, a microframework optimized to process more requests, it serializes results quickly, and it is an asynchronous framework that permits non-blocking I/O. That's the theory; in practice many things can change.

