Description
I'm trying to upload a CSV file to a FastAPI server. Is there a clean way to read the FastAPI File object and write it out as a CSV file?
When I receive the byte stream through the API, it's hard to save it as a CSV file, since commas can appear anywhere in the string.
One solution would be to write the incoming bytes to disk and load them back, but I'm wondering if there is a direct approach that avoids touching the disk.
There is a whole section in the docs about uploading files: https://fastapi.tiangolo.com/tutorial/request-files/
If you want to avoid writing to the disk, you can use the io module from the standard library: https://docs.python.org/3/library/io.html
Edit: just note that the io module is not async.
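For example, CSV bytes received from the request can be parsed entirely in memory with the stdlib csv module (a minimal, untested sketch; the hard-coded bytes here stand in for what `contents = await file.read()` would return in an endpoint):

```python
import csv
import io

# Raw bytes as they might arrive from the request body
# (in FastAPI this would be `contents = await file.read()`).
raw = b"name,city\nAlice,Paris\nBob,Berlin\n"

# Decode to text and wrap in an in-memory file object,
# so the csv module never touches the disk.
buffer = io.StringIO(raw.decode("utf-8"))
rows = list(csv.reader(buffer))

print(rows)  # [['name', 'city'], ['Alice', 'Paris'], ['Bob', 'Berlin']]
```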
To add to @euri10's comment: I think you will probably want to use UploadFile to receive the CSV file (as described in the docs @euri10 linked). It transparently handles writing the file to disk if it is too large to keep in memory, and it offers both a sync and an async API for saving to disk if desired.
Moreover, if you want to load the CSV data _without_ saving it to disk, I believe you can pass it directly to pandas.read_csv:
import pandas as pd
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/uploadcsv/")
def upload_csv(csv_file: UploadFile = File(...)):
    dataframe = pd.read_csv(csv_file.file)
    # do something with dataframe here (?)
    return {"filename": csv_file.filename}
(I haven't tested this, but it seems like it should be supported based on the pandas docs. Other csv readers may require other modifications.)
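As a concrete example of that caveat: `UploadFile.file` is a binary file object, while the stdlib `csv` module wants text, so it needs an `io.TextIOWrapper` in between (an untested sketch; `io.BytesIO` stands in here for the uploaded file object):

```python
import csv
import io

# Stand-in for UploadFile.file, which is a binary file object
# (a SpooledTemporaryFile under the hood).
binary_file = io.BytesIO(b"name,age\nAlice,30\nBob,25\n")

# The csv module expects a text stream, so wrap the binary one first.
text_file = io.TextIOWrapper(binary_file, encoding="utf-8", newline="")
records = list(csv.DictReader(text_file))

print(records)  # [{'name': 'Alice', 'age': '30'}, {'name': 'Bob', 'age': '25'}]
```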
Great answers @euri10 and @dmontagu ! Thanks a lot for your help here! :cake:
@enod yep, right what they said :smile: :point_up:
Thanks a lot @dmontagu for the help and it solved the issue quickly :+1: :100:
import json

import pandas as pd
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/uploadfile")
async def check(file: UploadFile = File(...)):
    if file.filename.endswith('.csv'):
        df = pd.read_csv(file.file)
        df_head = df.head(5)
        df_tail = df.tail(5)
        # cli.set("userId", str(df))
        head = json.loads(df_head.to_json(orient='records'))
        tail = json.loads(df_tail.to_json(orient='records'))
        # create_db_table()
        # read_csv consumed the stream, so rewind before reading the raw bytes
        file.file.seek(0)
        insert_data(uuid=1, itemname=file.file.read())
        # final return
        return {
            "head": head,
            "tail": tail,
        }
What if the CSV has ; or <tab> as the delimiter?
EDIT:
Lost a lot of time on it. :cry:
I'm not sure if this is a potential FastAPI/Starlette issue or one on the pandas side, but look:
if excel.content_type == "text/csv":
    for sep in CSV_SEPARATORS:
        file_ = deepcopy(excel.file)
        try:
            data = pd.read_csv(file_, sep=sep)
        except pd.errors.EmptyDataError:
            continue
Without the deepcopy it's not possible to do this. It means that pandas is somehow modifying the object: it seems to use the file as an iterator, consumes it, and then there is nothing left to iterate over afterwards. I also tried using a separator regex, but that didn't work either; I get a ValueError exception that I cannot resolve.
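A cheaper alternative to `deepcopy` may be to rewind the stream with `seek(0)` after pandas has consumed it, and pandas can also sniff the delimiter itself with `sep=None` and the Python engine, which would avoid the loop over separators entirely (untested sketch; `io.BytesIO` stands in for `excel.file`):

```python
import io

import pandas as pd

# Stand-in for excel.file: a binary stream holding a ';'-delimited CSV.
stream = io.BytesIO(b"a;b\n1;2\n3;4\n")

# Let pandas sniff the delimiter instead of looping over candidates.
df = pd.read_csv(stream, sep=None, engine="python")

# read_csv consumed the stream; rewinding makes it readable again,
# with no deepcopy needed.
stream.seek(0)

print(df.shape)  # (2, 2)
```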
The default file object is a SpooledTemporaryFile, which lives in memory by default and is spilled to disk only if it grows too large. I think it might be a good idea to save the incoming file to a temporary file on disk, to make sure the data isn't sitting only in memory, where it could be lost before further processing.
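That idea can be sketched with the standard library alone: copy the spooled stream into a named temporary file and process the on-disk path from there (a minimal sketch; `tempfile.SpooledTemporaryFile` stands in for what FastAPI puts behind `UploadFile.file`):

```python
import shutil
import tempfile

# Stand-in for UploadFile.file: in memory until it grows past max_size.
spooled = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)
spooled.write(b"name,city\nAlice,Paris\n")
spooled.seek(0)

# Persist the upload to a real file on disk before further processing.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as out:
    shutil.copyfileobj(spooled, out)
    path = out.name

print(path)  # e.g. /tmp/tmpab12cd34.csv
```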