Fastapi: [QUESTION] Is this the correct way to save an uploaded file ?

Created on 11 Aug 2019  Â·  18Comments  Â·  Source: tiangolo/fastapi

Hello,
i am trying to save an uploaded file to disk, the following code works correctly but i wonder if it is the correct way to do that.

def parse(file: UploadFile = File(...)):

    extension = os.path.splitext(file.filename)[1]
    _, path = tempfile.mkstemp(prefix='parser_', suffix=extension)

    with open(path, 'wb') as f:
        f.write(file.file.read())

Basically i need to get the path of the temp file.
I also have tried with .rollover() but file.file.name does not return the path (only the file descriptor)

question

Most helpful comment

@classywhetten FastAPI has almost no custom logic related to UploadFile -- most of it is coming from starlette. So if you want a "save" wrapper or a better example of usage, it probably makes sense to ask in the starlette repo.

@damianoporta This line:

_, path = tempfile.mkstemp(prefix='parser_', suffix=extension)

causes the file with location path to actually be opened by python, which is what @vitalik was referring to when he said you need to close the tempfile handle (which is the thing named _ in your code).


Here are some utility functions that the people in this thread might find useful (at least as a starting point):

import shutil
from pathlib import Path
from tempfile import NamedTemporaryFile
from typing import Callable

from fastapi import UploadFile


def save_upload_file(upload_file: UploadFile, destination: Path) -> None:
    try:
        with destination.open("wb") as buffer:
            shutil.copyfileobj(upload_file.file, buffer)
    finally:
        upload_file.file.close()


def save_upload_file_tmp(upload_file: UploadFile) -> Path:
    try:
        suffix = Path(upload_file.filename).suffix
        with NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            shutil.copyfileobj(upload_file.file, tmp)
            tmp_path = Path(tmp.name)
    finally:
        upload_file.file.close()
    return tmp_path


def handle_upload_file(
    upload_file: UploadFile, handler: Callable[[Path], None]
) -> None:
    tmp_path = save_upload_file_tmp(upload_file)
    try:
        handler(tmp_path)  # Do something with the saved temp file
    finally:
        tmp_path.unlink()  # Delete the temp file

(I haven't tested the above functions, just adapted them from slightly more complex functions in my own code.)

Note: you'd want to use the above functions inside of def endpoints, not async def, since they make use of blocking APIs.

I don't know whether it handles out-of-memory issues reasonably well or not, but that is not an issue in my application -- maximum request size is capped elsewhere, and the volume of concurrent requests I'm currently handling is such that OOM isn't really an issue.

All 18 comments

1)

It's not clear why you making a tempfile - UploadFile already does that under the hood if file size is larger then some configured amount.

2)
file.file.read() - this line can "eat" your entire memory if uploaded file size is larger than free memory space

you would need to read(chunk_size) in loop and write chunks to the copy

@vitalik i must save the file on the disk. I know UploadFile stores the file if it is larger than free mem space but i have to save all the files i receive.

well my point is - why you use tempfile ?...

from you code looks like you do not close the tempfile handle correctly (your "_" variable)

read more - https://www.logilab.org/blogentry/17873

@vitalik any suggestions of a good chunk size?

@vitalik any suggestions of a good chunk size?

to solve the out of memory issue the maximum chunk size should be:

chunk_size = free_ram / number_of_max_possible_concurent_requests

but these days 10kb should be enough for any case I guess

@vitalik because i have another library that waits a path. I use textract to read the content of the files (.pdf, .doc etc) Unfortunately i cannot pass an already opened file, so i thought to save it and then pass the path to that library.

@vitalik because i have another library that waits a path. I use textract to read the content of the files (.pdf, .doc etc) Unfortunately i cannot pass an already opened file, so i thought to save it and then pass the path to that library.

then my points stays the same:

1) you need correctly close tempfile handle

2) you need to prevent out of memory if it is relevant in your case

@vitalik because i have another library that waits a path. I use textract to read the content of the files (.pdf, .doc etc) Unfortunately i cannot pass an already opened file, so i thought to save it and then pass the path to that library.

then my points stays the same:

1. you need correctly close tempfile handle

2. you need to prevent out of memory if it is relevant  in your case

Ok, something like this:


def parse(file: UploadFile = File(...)):
    extension = os.path.splitext(file.filename)[1]
    _, path = tempfile.mkstemp(prefix='parser_', suffix=extension)

    with open(path, 'ab') as f:
        for chunk in iter(lambda: file.file.read(10000), b''):
            f.write(chunk)

    # extract content
    content = textract.process(path)

    # remove temp file
    os.close(_)
    os.remove(path)

I'm curious how best to store uploaded files too, flask has this for example:

@app.route('/', methods=['GET', 'POST'])
def upload_file():
        file = request.files['file']
        filename = secure_filename(file.filename)
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        return redirect(url_for('uploaded_file',
                                    filename=filename))

http://flask.palletsprojects.com/en/1.0.x/patterns/fileuploads/

I've been digging through Flask's documentation, and their file.save function is a wrapper around shutils.copyfileobj() in the standard library.

Here's my working code:

@app.post("/files/upload")
def create_file(file: UploadFile = File(...)):
    global upload_folder
    file_object = file.file
    #create empty file to copy the file_object to
    upload_folder = open(os.path.join(upload_folder, file.filename), 'wb+')
    shutil.copyfileobj(file_object, upload_folder)
    upload_folder.close()
    return {"filename": file.filename}

I would much appreciate a wrapper around this be built into fastapi

I would much appreciate a wrapper around this be built into fastapi

I'd be tempted to say it would out of the scope of the library given the number of ways this could be implemented, obviously you found a nice working solution and this is fine, but what if for instance you wanted the file.save had to be async, what if you wanted to save in memory instead of disk, etc...

there are so many ways to implement a file save, each one being a good one if it fits your requirements, that I don't see a good generic way for this, but maybe it's simpler than it looks, maybe that's just me liking he library to be not opinionated when it comes to those "goodies".

It's an interesting debate nonetheless. I think having questions like this, answered like you did gives people ideas and solutions and are very efficient.

Perhaps then the solution to making sure users/developers know how to do
persistent storage correctly would be to include something like this
function in the documentation?

I'm just getting started with API design, but persistent file
storage/uploads seem to be somewhat common. In keeping fastapi
non-opinionated: maybe rather than including a wrapper for shutils,
including examples of different ways of using the uploaded file in common
ways that applications would use.

On Wed, Oct 16, 2019, 12:54 AM euri10 notifications@github.com wrote:

I would much appreciate a wrapper around this be built into fastapi

I'd be tempted to say it would out of the scope of the library given the
number of ways this could be implemented, obviously you found a nice
working solution and this is fine, but what if for instance you wanted the
file.save had to be async, what if you wanted to save in memory instead of
disk, etc...

there are so many ways to implement a file save, each one being a good one
if it fits your requirements, that I don't see a good generic way for this,
but maybe it's simpler than it looks, maybe that's just me liking he
library to be not opinionated when it comes to those "goodies".

It's an interesting debate nonetheless. I think having questions like
this, answered like you did gives people ideas and solutions and are very
efficient.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/tiangolo/fastapi/issues/426?email_source=notifications&email_token=AJIRQ35LWTSQYOWCTNESQGTQO23CVA5CNFSM4IK4APV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBLKW5I#issuecomment-542550901,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AJIRQ374HTSL3O7EH3IBDS3QO23CVANCNFSM4IK4APVQ
.

@classywhetten FastAPI has almost no custom logic related to UploadFile -- most of it is coming from starlette. So if you want a "save" wrapper or a better example of usage, it probably makes sense to ask in the starlette repo.

@damianoporta This line:

_, path = tempfile.mkstemp(prefix='parser_', suffix=extension)

causes the file with location path to actually be opened by python, which is what @vitalik was referring to when he said you need to close the tempfile handle (which is the thing named _ in your code).


Here are some utility functions that the people in this thread might find useful (at least as a starting point):

import shutil
from pathlib import Path
from tempfile import NamedTemporaryFile
from typing import Callable

from fastapi import UploadFile


def save_upload_file(upload_file: UploadFile, destination: Path) -> None:
    try:
        with destination.open("wb") as buffer:
            shutil.copyfileobj(upload_file.file, buffer)
    finally:
        upload_file.file.close()


def save_upload_file_tmp(upload_file: UploadFile) -> Path:
    try:
        suffix = Path(upload_file.filename).suffix
        with NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            shutil.copyfileobj(upload_file.file, tmp)
            tmp_path = Path(tmp.name)
    finally:
        upload_file.file.close()
    return tmp_path


def handle_upload_file(
    upload_file: UploadFile, handler: Callable[[Path], None]
) -> None:
    tmp_path = save_upload_file_tmp(upload_file)
    try:
        handler(tmp_path)  # Do something with the saved temp file
    finally:
        tmp_path.unlink()  # Delete the temp file

(I haven't tested the above functions, just adapted them from slightly more complex functions in my own code.)

Note: you'd want to use the above functions inside of def endpoints, not async def, since they make use of blocking APIs.

I don't know whether it handles out-of-memory issues reasonably well or not, but that is not an issue in my application -- maximum request size is capped elsewhere, and the volume of concurrent requests I'm currently handling is such that OOM isn't really an issue.

Sounds also like a good plugin opportunity. Is there a good pattern for plugins published? I’ve been thinking about creating a plugin for my project that I’m reusing in multiple services, but I’m not sure the best way forward for that.

On Oct 16, 2019, at 2:19 PM, dmontagu notifications@github.com wrote:

@classywhetten https://github.com/classywhetten FastAPI has almost no custom logic related to UploadFile -- most of it is coming from starlette. So if you want a "save" wrapper or a better example of usage, it probably makes sense to ask in the starlette repo.

@damianoporta https://github.com/damianoporta This line:

_, path = tempfile.mkstemp(prefix='parser_', suffix=extension)
causes the file with location path to actually be opened by python, which is what @vitalik https://github.com/vitalik was referring to when he said you need to close the tempfile handle (which is the thing named _ in your code).

Here are some utility functions that the people in this thread might find useful:

from pathlib import Path
import shutil
from tempfile import NamedTemporaryFile
from typing import Callable

from fastapi import UploadFile

def save_upload_file(
upload_file: UploadFile,
destination: Path,
) -> None:
with destination.open("wb") as buffer:
shutil.copyfileobj(upload_file.file, buffer)
upload_file.file.close()

def save_upload_file_tmp(
upload_file: UploadFile,
handle_file: Callable[[Path], None]
) -> None:
suffix = Path(upload_file.filename).suffix
with NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
shutil.copyfileobj(upload_file.file, tmp)
tmp_path = Path(tmp.name)
upload_file.file.close()
handle_file(tmp_path) # Do something with the saved temp file
tmp_path.unlink() # Delete the temp file
I haven't tested the above functions, just adapted them from slightly more complex functions in my own code.

I don't know whether it handles out-of-memory issues reasonably well or not, but that is not an issue in my application -- maximum request size is capped elsewhere, and the volume of concurrent requests I'm currently handling is such that OOM isn't really an issue.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub https://github.com/tiangolo/fastapi/issues/426?email_source=notifications&email_token=AACZF56BYJKZT2USGOZ4B2LQO5LL5A5CNFSM4IK4APV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBNOR5Q#issuecomment-542828790, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACZF55FS3EQO3HB3GAXKVTQO5LL5ANCNFSM4IK4APVQ.

Yep, as @dmontagu probably the best way to do it is with shutil.copyfileobj() something like:

with destination.open("wb") as buffer:
    shutil.copyfileobj(upload_file.file, buffer)

shutil and pathlib (both part of the standard library) can help a lot solving most of the cases, I think.

@wshayes I think this specific problem might depend too much on each use case to have a plugin that works for _most_ of the cases and requirements... but still, if you want to develop a reusable package (that integrates with FastAPI or not), I would suggest Poetry or Flit. They make it very easy to create a package and publish it.

Assuming the original issue was solved, it will be automatically closed now. But feel free to add more comments or create new issues.

I was having issues with this problem, and found out the hard way I needed to seek(0) the uploaded file first:

~python
uploaded_file.file.seek(0) # <-- This.
with destination.open("wb") as buffer:
shutil.copyfileobj(upload_file.file, buffer)
~

Hello @classywhetten, I liked your solution it works perfectly I am new at fastapi and wanted to know how to view/download that uploaded image/file using your code above? Thanks in advance.

Was this page helpful?
0 / 5 - 0 ratings