Aiohttp: response.json() incorrectly assumes some json responses are text. (from github raw links to json files)

Created on 29 Mar 2017 · 7 Comments · Source: aio-libs/aiohttp

Long story short


Some JSON files (those hosted on GitHub and requested through their raw links) are treated as text when they are really JSON.

Expected behaviour


aiohttp should correctly identify that the data at the raw GitHub URL is in fact JSON.

Actual behaviour


Calling r.json() on the response fails: JSON files served from raw GitHub links are treated as text even though their content is JSON.

Steps to reproduce

  1. Create a GitHub repo with JSON files.
  2. Get the raw links to them.
  3. Call ClientSession.get on those links.
  4. Try to call r.json() on the response.

Your environment


Python 3.6.0rc1
Windows 7 Ultimate SP1 build 7601

Example that can be reproduced

Version that fails:

import aiohttp
import asyncio


class HTTPClient:
    """test HTTP client."""
    def __init__(self, loop=None):
        self.loop = loop if loop is not None else asyncio.get_event_loop()
        self.session = aiohttp.ClientSession(loop=self.loop)

    @asyncio.coroutine
    def request(self, *args, **kwargs):
        # Return the response to the caller, as get() does.
        data = yield from self.session.request(*args, **kwargs)
        return data

    @asyncio.coroutine
    def get(self, *args, **kwargs):
        data = yield from self.session.get(*args, **kwargs)
        return data

    def close(self):
        self.session.close()


async def get_json(loop=None):
    session = HTTPClient(loop=loop)
    data = await session.get("https://raw.githubusercontent.com/AraHaan/DecoraterBot-Plugins/master/pluginlist.json")
    r = await data.json()  # aiohttp fails here.
    print(r)
    session.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(get_json(loop=loop))

Version that does not raise, but gives me the wrong type (a string, not the parsed JSON object I need):

import aiohttp
import asyncio


class HTTPClient:
    """test HTTP client."""
    def __init__(self, loop=None):
        self.loop = loop if loop is not None else asyncio.get_event_loop()
        self.session = aiohttp.ClientSession(loop=self.loop)

    @asyncio.coroutine
    def request(self, *args, **kwargs):
        # Return the response to the caller, as get() does.
        data = yield from self.session.request(*args, **kwargs)
        return data

    @asyncio.coroutine
    def get(self, *args, **kwargs):
        data = yield from self.session.get(*args, **kwargs)
        return data

    def close(self):
        self.session.close()


async def get_json(loop=None):
    session = HTTPClient(loop=loop)
    data = await session.get("https://raw.githubusercontent.com/AraHaan/DecoraterBot-Plugins/master/pluginlist.json")
    r = await data.text()
    print(r)
    session.close()

loop = asyncio.get_event_loop()
loop.run_until_complete(get_json(loop=loop))

Maybe aiohttp needs a check for whether the response data looks like JSON, and if it does, to let me call r.json() on the response.
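For illustration, a check like the one suggested here could be written in user code roughly as follows. This is only a sketch: looks_like_json is a hypothetical helper, not part of aiohttp, and it simply tries to parse the text with the standard json module.

```python
import json


def looks_like_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

With the second example above, the string returned by data.text() could be passed to looks_like_json and then decoded with json.loads if the check succeeds.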

All 7 comments

Could you post the aiohttp exception?

yes, it is:

Traceback (most recent call last):
  File "E:\Users\Elsword\Desktop\test shell\\test_pluginslist.py", line 32, in <module>
    loop.run_until_complete(get_json(loop=loop))
  File "E:\Python360\lib\asyncio\base_events.py", line 466, in run_until_complete
    return future.result()
  File "E:\Users\Elsword\Desktop\test shell\\test_pluginslist.py", line 27, in get_json
    r = await data.json()
  File "E:\Python360\lib\site-packages\aiohttp\client_reqrep.py", line 686, in json
    headers=self.headers)
aiohttp.client_exceptions.ClientResponseError: 0, message='Attempt to decode JSON with unexpected mimetype: text/plain; charset=utf-8'
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x002F1FF0>
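The message in this traceback shows that the failure comes from the Content-Type header (text/plain; charset=utf-8), not from the body. As a rough sketch of that kind of check, assuming the rule is "application/json or an extended +json form": is_json_content_type is a hypothetical name written for illustration and is not aiohttp's actual implementation.

```python
import re

# Matches "application/json" and extended forms such as
# "application/vnd.custom-type+json" (assumed rule, see lead-in).
_JSON_MIMETYPE = re.compile(r"^application/(?:[\w.+-]+?\+)?json$", re.IGNORECASE)


def is_json_content_type(header_value):
    """Return True if a Content-Type header value names a JSON mimetype."""
    # Drop parameters such as "; charset=utf-8" before matching.
    mimetype = header_value.split(";", 1)[0].strip()
    return _JSON_MIMETYPE.match(mimetype) is not None
```

Under this sketch, the header sent by the raw GitHub link is rejected even though the body is valid JSON, which matches the behaviour reported here.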

So, I am not sure what aiohttp could do in this case. You can pass the allowed content type to the json() method.

Yeah, adding content_type='text/plain' to the r.json() call seems to fix this. I think this should be documented for those who need to read JSON from raw GitHub links without saving the data to a file first.

It is documented

If others end up here from a search, the relevant docs are at https://docs.aiohttp.org/en/stable/client_advanced.html#disabling-content-type-validation-for-json-responses

Disabling content type validation for JSON responses

The standard explicitly restricts JSON Content-Type HTTP header to application/json or any extended form, e.g. application/vnd.custom-type+json. Unfortunately, some servers send a wrong type, like text/html.

This can be worked around in two ways:

  1. Pass the expected type explicitly (in this case checking will be strict, without the extended form support, so custom/xxx+type won't be accepted):

await resp.json(content_type='custom/type').

  2. Disable the check entirely:

await resp.json(content_type=None).
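Applied to the example from this issue, the two options could look roughly like the sketch below. fetch_json and main are hypothetical names; the only aiohttp calls used are ClientSession, session.get, resp.raise_for_status and resp.json(content_type=...). The aiohttp import sits inside main only so that fetch_json can be exercised with a stand-in session.

```python
import asyncio


async def fetch_json(session, url, content_type=None):
    """Fetch url and decode the body as JSON.

    content_type=None disables the Content-Type check entirely;
    pass e.g. 'text/plain' to require exactly that type instead.
    """
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.json(content_type=content_type)


async def main():
    import aiohttp  # needs aiohttp installed

    url = ("https://raw.githubusercontent.com/AraHaan/"
           "DecoraterBot-Plugins/master/pluginlist.json")
    async with aiohttp.ClientSession() as session:
        print(await fetch_json(session, url))


# To run against the real URL (needs aiohttp and network access):
# asyncio.run(main())
```

Using async with for both the session and the response also closes them if decoding fails, which avoids the "Unclosed client session" warning seen in the traceback.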

Thank you very much, on behalf of all my fellow google travelers.

This was exactly what I needed and I blew right past it in the docs.
