Flask: send_file fails when filename contains unicode symbols

Created on 18 Dec 2014 · 38 Comments · Source: pallets/flask

Hi.

I've found an issue with unicode filename support in send_file.
If send_file attempts to respond with UTF-8 in the HTTP headers, the response is empty and the log contains something like "http-headers should contain latin-1".
I know that browser support IS A MESS, but it seems that sending two filenames (filename= and filename*=) separated by a semicolon should work.

I'd like this to be handled by flask or werkzeug. Would you accept such a pull request?
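
The failure can be reproduced without Flask. This is a minimal sketch, assuming the server encodes header values as Latin-1 (which is what the log message suggests); `fits_latin1` is a hypothetical helper name, not a Flask or Werkzeug function:

```python
def fits_latin1(value):
    """Return True if the header value can be encoded as Latin-1."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# A Cyrillic filename cannot be represented in Latin-1 at all.
print(fits_latin1('attachment; filename="русскиебуквы.py"'))
# Latin-1 is wider than ASCII, so some accented names do fit.
print(fits_latin1('attachment; filename="café.pdf"'))
```

So the problem is not limited to "non-ASCII" names in general; it is specifically about characters outside the Latin-1 range.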

All 38 comments

Similar discussion has already happened regarding this observation. See the issue from requests to read more about the details, but the short of it is that HTTP headers are supposed to only accept latin-1 characters. Changing this behavior in flask or werkzeug would only cause issues and break running code by allowing requests to be sent in an unsupported encoding to webservers which expect to get latin-1.

Someone else can chime in if I am mistaken, but I think this can be closed.

Similar discussion:
https://github.com/kennethreitz/requests/issues/1926
https://github.com/jakubroztocil/httpie/issues/212

I suppose the problem here is that it's not clear which encoding send_file takes for the filename. Latin-1 makes sense, but people might assume it's UTF-8.

I updated the API documentation to make that explicit, is that sufficient?

BTW, I've solved the problem by sending a filename*=UTF-8''blah header field.
It works everywhere except IE <= 8.

Here is the snippet:

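import os
from urllib import parse as urlparse  # assumed alias: on Python 3, quote() lives in urllib.parse

import flask

response = flask.make_response(flask.send_file(pdf_full_path))
# Plain ASCII name for old browsers, plus a percent-encoded UTF-8 name.
response.headers["Content-Disposition"] = \
    "attachment; " \
    "filename={ascii_filename}; " \
    "filename*=UTF-8''{utf_filename}".format(
        ascii_filename="book.pdf",
        utf_filename=urlparse.quote(os.path.basename(pdf_full_path))
    )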

Looks like a more correct way of fixing this.

From the research I did, I am actually surprised that worked for you. Either way, I'm not sure we should be suggesting that workaround, because the encoding doesn't seem to be universally supported.

I'm not really worrying about universal browser support.
Here are the support tables for the feature.
http://greenbytes.de/tech/tc2231/

As far as I understand, providing both filename and filename* should be more compatible. A library for transliteration / unaccenting will be required, though.
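
For names that are mostly Latin script, the standard library may be enough to produce the ASCII fallback. This is a rough sketch and not part of Flask or Werkzeug; `ascii_fallback` and its `default` parameter are made-up names:

```python
import unicodedata


def ascii_fallback(filename, default="download"):
    """Return an ASCII-only approximation of filename (hypothetical helper)."""
    # NFKD splits accented letters into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", filename)
    # Dropping non-ASCII removes the combining marks, and also any
    # character that has no ASCII base letter (for example Cyrillic).
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    ascii_name = ascii_name.replace(" ", "_")
    # If only the extension survived, substitute a generic stem.
    stem, dot, ext = ascii_name.rpartition(".")
    if dot and not stem:
        return default + "." + ext
    return ascii_name or default


print(ascii_fallback("Müller report.pdf"))
print(ascii_fallback("русскиебуквы.py"))
```

Note the limitation: a fully Cyrillic name collapses to the generic stem, which is exactly the case where a real transliteration library would be needed.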

From that link:

In cases where it's acceptable to fall back to an ASCII filename in both Safari (versions prior to 6) and Internet Explorer (versions prior to IE9), the recommendation in Section D of RFC 6266 should be followed

I'm happy to close the PR if you want to instead work on supporting this? But I'm unsure whether it is so necessary to not follow the recommendation. I think someone should chime in here though, @untitaker.

I think that PR definitely should be used, at least as some kind of hotfix.
I've no time to work on this right now. Some time later, maybe. Don't close the issue.

@untitaker any opinion on this?

Looking this over again, I want to close the issue and merge #1371. I think worrying about universal browser support is a good idea, and it doesn't make much sense to make this change if the recommended practice is to use Latin-1 unless it is necessary to do otherwise.

I've found the unidecode library providing the required function:

response.headers["Content-Disposition"] = \
    "attachment;" \
    "filename={ascii_filename};" \
    "filename*=UTF-8''{utf_filename}".format(
        ascii_filename=unidecode.unidecode(basename).replace(' ', '_'),
        utf_filename=urlparse.quote(basename)
    )

@georgthegreat, thank you. The question for me, though, is not whether we can provide the suggested function, but whether it is worth disregarding universal browser support for the enhancement.

I thought the example I provided should not break any compatibility.
Do you have IE6 to check with? I have an example here:
https://bib.hda.org.ru/bib/books/morley_1896_german/pdf/1 (pdf, 23 mb in size).

UPDATE: This is the wrong approach. It causes a security warning in modern Chromium-based browsers. I don't know which security problems are caused by specifying multiple filenames, but this is a very serious regression in the approach I suggested.

@georgthegreat Thanks for the update. I'll merge the PR then.

You sure you wanted to merge this?

@untitaker, I just commented on the PR, I meant to ask for a review actually...

Alright, the docs are now merged. @georgthegreat I don't think any other fix is possible if you're passing bytestrings. If you're passing unicode, the situation looks different and there is a real bug somewhere.

@untitaker I was passing unicode before the fix.
Later I changed this to two filename specifications. This works well for IE and Firefox, but causes a stupid warning in Chromium.
Since Chromium-based browsers are much more widespread, I've decided to drop IE support.

I don't think there is a bug in Flask. Chromium seems the candidate to me.

On Fri, Jun 05, 2015 at 10:44:30PM -0700, Yuriy Chernyshov wrote:

> @untitaker I was passing unicode before the fix.

To clarify: I meant _unicode strings_, not UTF-8 encoded bytestrings or
generally strings with unicode characters in them.

> Later I changed this to two filename specifications. This works well for IE and Firefox, but causes a stupid warning in Chromium.
> Since Chromium-based browsers are much more widespread, I've decided to drop IE support.

Could you give me a testapp that does exactly what you did?

I'm using Python 3, so there are no UTF-8 encoded bytestrings easily available.
Yes, I can work on an MWE, but I'll do it a bit later.

Here is the MWE of the problem described in the issue. I didn't add my workarounds to it, since none of them works properly.

#!/usr/bin/env python3
import flask

app = flask.Flask(__name__)

@app.route("/send_file")
def send_file():
    return flask.send_file(
        "mwe.py",
        as_attachment=True,
        attachment_filename="русскиебуквы.py"
    )

if __name__ == "__main__":
    app.debug = True
    app.run(host="0.0.0.0")

Ok, I've educated myself about this and I'd suggest that Flask should not only not support this, but _actively prevent_ the user from sending non-ASCII filenames IMO.

While non-ASCII filenames are a bit ugly, they are VERY popular, especially among people who just use some webapp and don't do techie stuff. Imagine some secretaries (or other non-tech-savvy people) uploading a file to a web application and downloading it again later. They'd probably be very confused about why it got renamed for no apparent reason (especially if not done in a smart way, such as converting ä to ae).

My point is rather that there is no way to support the full utf-8 range in a consistent way across browsers, and the user might encounter worse usability bugs (I got a blank screen instead of a download) than a wrong filename.

Hm.. use something like translitcodec to ascii-fy unicode filenames but add a config option to disable this?

The most important thing is to make the developer aware of this issue, but we don't have to hold their hand through it. What I propose is an easy-to-disable error message. If the dev knows about those issues, they can add custom headers themselves for their specific target browsers.

Of course this would be a huge breakage for existing code, but it's at least not unnoticed and still can be easily disabled. Existing apps can continue being broken, but new apps don't have to be.

I realize that this change is probably going to cause a lot of controversy if we actually go with it, so I'd like some more feedback. We don't really have a deadline for 1.0 anyway.

Actually, I've found an RFC suggesting that Chromium's behaviour is wrong:
http://greenbytes.de/tech/webdav/rfc5987.html#rfc.section.4.2

Some outdated test cases can be found here:
https://greenbytes.de/tech/tc2231/
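
As I read RFC 5987 section 4.2 (and RFC 6266 section 4.3), a recipient that sees both forms should prefer the extended one. Here is a naive sketch of that precedence rule from the recipient's side. `pick_filename` is a hypothetical name, and the parsing is deliberately simplistic: it does not handle semicolons inside quoted values, and it raises ValueError on a malformed filename* value.

```python
from urllib.parse import unquote


def pick_filename(header_value):
    """Pick filename* over filename from a Content-Disposition value."""
    params = {}
    # Skip the disposition type ("attachment") and collect name=value pairs.
    for part in header_value.split(";")[1:]:
        name, sep, value = part.strip().partition("=")
        if sep:
            params[name.strip().lower()] = value.strip().strip('"')
    extended = params.get("filename*")
    if extended is not None:
        # Extended form is: charset ' optional-language ' percent-encoded value
        charset, _language, encoded = extended.split("'", 2)
        return unquote(encoded, encoding=charset)
    return params.get("filename")


header = "attachment; filename=book.pdf; filename*=UTF-8''%D0%BA%D0%BD%D0%B8%D0%B3%D0%B0.pdf"
print(pick_filename(header))
```

A browser that ignores filename* would fall through to the plain filename, which is the intended fallback behaviour.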

If you can use as_attachment=False then using

@route('/attachment/<int:attachment_id>/<filename>', methods=['GET'])

would work around this problem. I've tested this with Russian, Chinese and Hebrew filenames.

See this StackOverflow answer for details.

Looking at this for PyCon sprints. Cc: @ThiefMaster

It's not clear based on @untitaker's comments what the behavior should be. If I read it correctly:

If attachment_filename is non-ASCII, then emit a warning from send_file().
Unless the user passes a (new) flag to suppress the warning.
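
In code, that reading of the proposal might look roughly like this. It is a sketch only: `check_attachment_filename` and the `warn_non_ascii` flag are hypothetical names, not an existing Flask API, and the warning category is my own choice.

```python
import warnings


def check_attachment_filename(filename, warn_non_ascii=True):
    """Return True if filename is ASCII; otherwise optionally warn."""
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        if warn_non_ascii:
            warnings.warn(
                "attachment filename {!r} is not ASCII and may not be "
                "handled consistently across browsers".format(filename),
                UnicodeWarning,
                stacklevel=2,
            )
        return False
    return True


check_attachment_filename("report.pdf")                             # silent
check_attachment_filename("отчёт.pdf", warn_non_ascii=False)        # silent, flag set
```

Calling it with a non-ASCII name and the default flag emits a UnicodeWarning, which a developer can then silence once they have decided how to handle their target browsers.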

Cc @mitsuhiko

Maybe this could be addressed somehow? utf-8 filenames are the norm nowadays.

I'm using the following code to send unicode filenames.
It works fine with any modern browser (AFAIK):
https://github.com/hda-technical/dancebooks/blob/master/www/main.py#L310

I handle this with:

        response = make_response(send_file(out_file))
        basename = os.path.basename(out_file)
        response.headers["Content-Disposition"] = \
            "attachment;" \
            "filename*=UTF-8''{utf_filename}".format(
                utf_filename=quote(basename.encode('utf-8'))
            )
        return response
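
Pulled out of the view, the same header construction can be checked on its own. This is a sketch assuming Python 3; `extended_disposition` is a hypothetical helper name, and passing `safe=""` is my own choice so that `/` gets escaped too:

```python
from urllib.parse import quote, unquote


def extended_disposition(filename):
    """Build a Content-Disposition value that uses only filename*."""
    # quote() encodes the str as UTF-8 and percent-escapes the bytes,
    # so the result contains ASCII characters only.
    encoded = quote(filename, safe="")
    return "attachment; filename*=UTF-8''" + encoded


header = extended_disposition("русскиебуквы.py")
print(header)
# The value now survives the Latin-1 encoding step that headers go through.
header.encode("latin-1")
# Decoding the percent-escapes gives the original name back.
print(unquote(header.split("''", 1)[1]))
```

Since the header value is pure ASCII, the Latin-1 restriction on headers no longer applies; whether a given browser honours filename* is a separate question.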

@tkisme Could you please confirm that your solution works without compatibility problems in any browser? I would really like a permanent fix for this problem.
