When using binary string responses under Python 3, the encoding defaults to UTF8 which is incompatible with binary data. This looks to stem from Python 3 having str default to UTF8 and a new type called bytes which represents binary data. This is a significant departure from functionality in Python 2 and you can read more about it.
This issue does not exist in Python 2.
Swagger Revision: 18262c1b6132a28438e2c3d9426458c413f011f8 (master as of right now)
Response schema:
schema:
type: string
format: binary
Example Error:
Traceback (most recent call last):
File "./dr.py", line 13, in <module>
"document_type": "pdf", # pdf or xls or xlsx
File "/usr/local/lib/python3.5/site-packages/docraptor/apis/doc_api.py", line 203, in create_doc
callback=params.get('callback'))
File "/usr/local/lib/python3.5/site-packages/docraptor/api_client.py", line 322, in call_api
response_type, auth_settings, callback)
File "/usr/local/lib/python3.5/site-packages/docraptor/api_client.py", line 149, in __call_api
post_params=post_params, body=body)
File "/usr/local/lib/python3.5/site-packages/docraptor/api_client.py", line 358, in request
body=body)
File "/usr/local/lib/python3.5/site-packages/docraptor/rest.py", line 208, in POST
body=body)
File "/usr/local/lib/python3.5/site-packages/docraptor/rest.py", line 171, in request
r.data = r.data.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
I'm not sure what a good general fix would look like but I have a work-around for our use by delaying byte decoding until right before JSON deserialization and short-circuiting str. You can see my fix directly on generated code.
Related Issues and PRs:
@jqr thanks for the details. May I know if you've cycle to contribute the fix with a test case using this endpoint (the method name is automatically generated by codegen since it's empty)?
I'm not entirely sure the fix is generic enough to be used everywhere, as I only tested our use cases.
If someone else wants to run with this, they should as I won't have the time to address this in a generic fashion for a few weeks at least.
@jqr please take your time. We've not heard anyone else requesting this feature in Python yet.
Binary file responses are also being parsed as utf-8, causing errors. Does anyone know a temporary workaround for this?
schema:
type: file
format: binary
@JHibbard What about simply model the response as just type: file without format: binary?
The server sends bytes and client wants to decode it with utf8.
I've resolved it with removing the line: r.data = r.data.decode('utf8') in the rest.py file.
But then in the api_client.py file, in function __deserialize_file(self, response) it wants to write the response.data in an text file.
So I've edited this also, so the function opens the file as "wb":
with open(path, "wb") as f:
f.write(response.data)
But I don't know that what will now not work, but at least the file download works for me.
@wing328 I see the same happening in python 2.7, and I think is a more generic problem that binary files are not written to disk correctly because they may be corrupted by the "w" mode instead of "wb"
as pointed out by @microo8 .
I think it would be formally more correct to have wb instead since both text and binary file are expected to be deserialized in here.
@g-bon I wonder if you can submit a PR with the suggested fix so that we can review more easily (remember to update the Python Petstore sample so that CIs can run the integration tests)
@wing328, switching to python 3 I realized that there are actually two separate issues.
The first one is what I fixed with the PR 6936 and it affected both 2 and 3
The second one to my understanding is that in rest.py if PY3 is used and _preload_content is True, the response contents is always decoded as utf8. Which doesn't make sense for non-text files.
if six.PY3:
r.data = r.data.decode('utf8')
Happy to work on this when I have some confirmation the problem is here / feedback on the right approach.
Up
cc @kenjones-cisco , @wing328
For my case, I'm returning a geojson file. This means that it _is_ able to be decoded to utf8, but this will fail with the __deserialize_file method since that opens the file for binary writing. My solution was to remove the decode from "rest.moustache" (as was suggested), and move it without modification to after the file write conditionally happens in "api_client.moustache" (which was reasonable for my use case -- it's either a file or json/text.
def deserialize(self, response, response_type):
"""Deserializes response into an object.
:param response: RESTResponse object to be deserialized.
:param response_type: class literal for
deserialized object, or string of class name.
:return: deserialized object.
"""
# handle file downloading
# save response body into a tmp file and return the instance
if response_type == "file":
return self.__deserialize_file(response)
########### Moved from rest.mustache
if six.PY3:
response.data = response.data.decode('utf8')
logger.debug("response body: %s", response.data)
###########
# fetch data from response object
try:
data = json.loads(response.data)
except ValueError:
data = response.data
return self.__deserialize(data, response_type)
@rddesmond thanks for the great workaround!
From what I see the issue is not in skipping some particular Content-Type like in #9980 (which will fail for example for application/pdf, images etc), or having some hardcoded 'file' type which will trigger the different response processing logic.
The real issue is that swagger-codegen drops the format: binary from the model for type: string. If you'll check form parameters processing with debugOperations, you'll see the x-is-binary in vendorExtensions, which can be used in the template. But the response parameter doesn't have it, making it impossible to differentiate between strings and binary data.
Most helpful comment
For my case, I'm returning a geojson file. This means that it _is_ able to be decoded to utf8, but this will fail with the
__deserialize_filemethod since that opens the file for binary writing. My solution was to remove the decode from "rest.moustache" (as was suggested), and move it without modification to after the file write conditionally happens in "api_client.moustache" (which was reasonable for my use case -- it's either a file or json/text.