When the Content-Disposition header is parsed, file names are processed byte by byte, so multi-byte (wide) characters often, if not always, make parsing stop early.
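To make the failure mode concrete, here is a hypothetical byte-wise scanner. It is not aria2's actual code: `scan_token` and its acceptance rule are assumptions modeled on the RFC 2616 token grammar. It stops at the first byte it does not accept. Every byte of a UTF-8 encoded non-ASCII character is 0x80 or above, so a name that starts with one is cut off immediately.

```python
# Hypothetical sketch, not aria2's code: accept only printable US-ASCII
# token bytes (RFC 2616 style) and stop at the first byte that isn't one.
TSPECIALS = set(b'()<>@,;:\\"/[]?={} \t')

def scan_token(raw: bytes) -> bytes:
    """Return the longest prefix of raw made of printable ASCII token bytes."""
    out = bytearray()
    for b in raw:
        # Bytes >= 0x80 (every byte of a multi-byte UTF-8 character),
        # control bytes and separators all end the scan early.
        if b < 0x21 or b > 0x7E or b in TSPECIALS:
            break
        out.append(b)
    return bytes(out)

print(scan_token("\uc0ac\ubcf8.jpg".encode("utf-8")))  # b''
print(scan_token(b"example.html"))                     # b'example.html'
```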
There are two variants of the Content-Disposition header field:
Content-Disposition: Attachment; filename=example.html
In this case, the filename parameter can carry ISO-8859-1 characters only. Multi-byte characters are not allowed.
Content-Disposition: attachment; filename*= UTF-8''%e2%82%ac%20rates
This uses the RFC 5987 encoding, which can specify a character set. In the above example, UTF-8 is used.
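A minimal Python sketch of decoding an RFC 5987 ext-value (charset, then an optional language tag, then percent-encoded bytes, separated by single quotes). `decode_ext_value` is a hypothetical helper, and it only accepts UTF-8 and ISO-8859-1, the two charsets RFC 5987 requires producers to use.

```python
from urllib.parse import unquote

def decode_ext_value(ext_value: str) -> str:
    """Decode "charset'language'pct-encoded" (hypothetical helper)."""
    # strip() tolerates whitespace after "=", as in the header above.
    charset, _language, encoded = ext_value.strip().split("'", 2)
    charset = charset.lower()
    if charset not in ("utf-8", "iso-8859-1"):
        raise ValueError(f"unsupported charset: {charset}")
    # errors="strict" makes invalid byte sequences raise instead of
    # silently becoming U+FFFD.
    return unquote(encoded, encoding=charset, errors="strict")

print(decode_ext_value(" UTF-8''%e2%82%ac%20rates"))  # € rates
```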
aria2 supports UTF-8 and ISO-8859-1, exactly as described in RFC 6266.
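When both parameters are present, RFC 6266 (section 4.3) says recipients should pick filename* and ignore filename. The sketch below applies that rule; `pick_filename` is a hypothetical helper that assumes the header has already been split into its raw parameter values, and it reads the plain filename bytes as ISO-8859-1.

```python
from typing import Optional
from urllib.parse import unquote

def pick_filename(filename: Optional[bytes],
                  filename_ext: Optional[str]) -> Optional[str]:
    """Choose a file name from already-split parameters (hypothetical helper)."""
    if filename_ext is not None:
        # RFC 6266: prefer filename* (an RFC 5987 ext-value) over filename.
        charset, _language, encoded = filename_ext.strip().split("'", 2)
        return unquote(encoded, encoding=charset, errors="strict")
    if filename is not None:
        # Plain filename: each byte is one ISO-8859-1 character.
        return filename.decode("iso-8859-1")
    return None

print(pick_filename(b"rates.txt", "UTF-8''%e2%82%ac%20rates"))  # € rates
print(pick_filename(b"example.html", None))                     # example.html
```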
I realize this probably violates the RFC, but look at this page, for example.
If we check the Content-Disposition header of the first photo, we see:
$ curl -I http://cfile9.uf.tistory.com/original/142D493F506557F31182F6
HTTP/1.1 200 OK
Expires: Tue, 22 Mar 2016 10:44:02 GMT
Date: Sun, 21 Feb 2016 10:44:01 GMT
Server: Apache
Content-Disposition: inline; filename="%EC%82%AC%EB%B3%B8_-DSC02499.jpg"
Last-Modified: Fri, 28 Sep 2012 07:55:33 GMT
Content-Type: image/jpeg
Content-Length: 792416
Age: 198
Via: 1.1 Wcache(2.0)
Connection: keep-alive
>>> import urllib.parse
>>> urllib.parse.unquote('%EC%82%AC%EB%B3%B8_-DSC02499.jpg')
'사본_-DSC02499.jpg'
So it sends a percent-encoded UTF-8 filename in Content-Disposition but doesn't specify the encoding. I suspect such cases might be fairly common on the real web. What about an additional option for this, e.g. --content-disposition-encoding=utf8?
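One way the proposed option could behave, as a sketch only: `decode_plain_filename` and its `fallback_encoding` parameter are hypothetical stand-ins for --content-disposition-encoding, not an existing aria2 option. Without the option, the value is used literally, which matches a strict reading of RFC 6266, where '%' in a plain filename is just an ordinary ISO-8859-1 character.

```python
from typing import Optional
from urllib.parse import unquote

def decode_plain_filename(value: str,
                          fallback_encoding: Optional[str] = None) -> str:
    """Hypothetical handling of a plain filename= value.

    fallback_encoding models the proposed option; None means strict
    behaviour (the value is used literally).
    """
    if fallback_encoding is None or "%" not in value:
        return value
    try:
        # Non-standard: treat %XX escapes as bytes in fallback_encoding.
        return unquote(value, encoding=fallback_encoding, errors="strict")
    except UnicodeDecodeError:
        # Not valid in the fallback encoding: keep the literal value.
        return value

raw = "%EC%82%AC%EB%B3%B8_-DSC02499.jpg"
print(decode_plain_filename(raw))           # %EC%82%AC%EB%B3%B8_-DSC02499.jpg
print(decode_plain_filename(raw, "utf-8"))  # 사본_-DSC02499.jpg
```

Falling back to the literal value on a decode error keeps the non-standard path from mangling names on servers that really do send '%' in file names.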
I'm not sure whether it is common or not. We have implemented several features that violate the RFC in order to support the "real" web, so this could be one of those things.
But I'd like to stay away from encoding issues as long as I can, because they are nasty and hard to get right.
I have no plans to add this feature myself, but we are ready to accept a patch if someone is really interested in it.