Requests: HeaderParsingError: Failed to parse headers

Created on 15 Apr 2016 · 22 Comments · Source: psf/requests

INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 127.0.0.1
DEBUG:requests.packages.urllib3.connectionpool:"POST /kkblog/ HTTP/1.1" 201 None
WARNING:requests.packages.urllib3.connectionpool:Failed to parse headers (url=http://127.0.0.1:5984/kkblog/): [MissingHeaderBodySeparatorDefect()], unparsed data: '³é\x97\xad\r\nETag: "1-967a00dff5e02add41819138abb3284d"\r\nDate: Fri, 15 Apr 2016 14:45:18 GMT\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: 69\r\nCache-Control: must-revalidate\r\n\r\n'
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 390, in _make_request
    assert_header_parsing(httplib_response.msg)
  File "/usr/lib/python3.5/site-packages/requests/packages/urllib3/util/response.py", line 59, in assert_header_parsing
    raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
requests.packages.urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data: '³é\x97\xad\r\nETag: "1-967a00dff5e02add41819138abb3284d"\r\nDate: Fri, 15 Apr 2016 14:45:18 GMT\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: 69\r\nCache-Control: must-revalidate\r\n\r\n'

here is the same request with curl:

curl -v -X POST 127.0.0.1:5984/kkblog/ -H "Content-Type: application/json" -d '{"_id": "关闭"}'
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
> POST /kkblog/ HTTP/1.1
> Host: 127.0.0.1:5984
> User-Agent: curl/7.47.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 17
> 
* upload completely sent off: 17 out of 17 bytes
< HTTP/1.1 201 Created
< Server: CouchDB/1.6.1 (Erlang OTP/18)
< Location: http://127.0.0.1:5984/kkblog/关闭
< ETag: "3-bc27b6930ca514527d8954c7c43e6a09"
< Date: Fri, 15 Apr 2016 15:13:14 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 69
< Cache-Control: must-revalidate
< 
{"ok":true,"id":"关闭","rev":"3-bc27b6930ca514527d8954c7c43e6a09"}
* Connection #0 to host 127.0.0.1 left intact

The problem is Location: http://127.0.0.1:5984/kkblog/关闭 in the response headers. I tried other Chinese characters, but they didn't cause the exception.

>>> '关闭'.encode('utf-8')
b'\xe5\x85\xb3\xe9\x97\xad'
>>> 
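A likely mechanism for the truncation visible in the log above: the second byte of 关's UTF-8 encoding is 0x85, which the stdlib's unconditional ISO-8859-1 decode of header bytes turns into U+0085 (NEL), and Python's `str.splitlines()` treats that as a line boundary. A small sketch (the path is taken from the report; this only demonstrates the `splitlines` behaviour, not the exact httplib code path):

```python
# The raw Location value, as it looks after an ISO-8859-1 decode
# of the UTF-8 bytes the server actually sent:
raw = "关闭".encode("utf-8").decode("iso-8859-1")
line = "Location: http://127.0.0.1:5984/kkblog/" + raw

# str.splitlines() treats U+0085 (NEL) as a line boundary, so the
# header line is cut in two at the second byte of 关 -- producing
# exactly the '³é\x97\xad' fragment shown as "unparsed data" above:
print(line.splitlines())
# → ['Location: http://127.0.0.1:5984/kkblog/å', '³é\x97\xad']
```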

All 22 comments

The header parsing is done by httplib, in the Python standard library; that is the part that failed to parse. The failure to parse is understandable though: servers should not be shoving arbitrary bytes into headers.

Using UTF-8 for your headers is extremely unwise, as discussed by RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII].

In this instance it's not really possible for us to resolve the problem. The server should instead be sending urlencoded URLs, or RFC 2047-encoded header fields. Either way, httplib is getting confused here, and we can't really step in and stop it.
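For illustration, here is roughly what either compliant form would look like for the value from this report, sketched with the stdlib (the exact headers a fixed server would emit are up to the server):

```python
from email.header import Header
from urllib.parse import quote

# Percent-encoded URL, safe to put in a Location header:
print("http://127.0.0.1:5984/kkblog/" + quote("关闭"))
# → http://127.0.0.1:5984/kkblog/%E5%85%B3%E9%97%AD

# RFC 2047 encoded-word form of the same value:
print(Header("关闭", "utf-8").encode())
# → =?utf-8?b?5YWz6Zet?=
```

Both forms are pure US-ASCII on the wire, so httplib parses them without complaint.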

thanks

In this instance it's not really possible for us to resolve the problem. The server should instead be sending urlencoded URLs, or RFC 2047-encoded header fields. Either way, httplib is getting confused here, and we can't really step in and stop it.

So..... what about the thousands and thousands of potential servers that I _can't_ just SSH into and fix?

Basically, "just follow the RFC" is a complete non-answer, because I don't control the world (I'm taking minion applications, though!).
The fact is, servers out there serve UTF-8 headers. This is not something fixable, because they're not my server. My web browser handles this situation just fine, so it's clearly possible to make this situation work.

As it is, requests fails on these servers. This _is_ fixable, because I control the code on my local machine.

Additionally, the RFC only ever states: "In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data." So UTF-8 header values are even RFC compliant!

US-ASCII is a subset of UTF-8, not the other way around. Characters like 关闭 are UTF-8, but are NOT US-ASCII. So no, UTF-8 is NOT RFC compliant.

[Deleted comments about RFC stuff, since it's irrelevant, due to broad use of UTF-8 headers]


Anyways, the issue of RFC compliance is irrelevant. There are servers out there that are serving UTF-8 headers, which is then broadly supported by all browser vendors.

Content-Disposition is defined in RFC2183. It states that "Short" parameters should be US-ASCII. Long Parameters should be encoded as per RFC-2184. I can't see where in RFC-2184 it says that UTF-8 encoding is valid. (But I may be missing that particular line)

@Lukasa knows much more about the relevant RFCs than I do though.

And again, whether it's compliant is irrelevant. There are servers out there that act like this. And my browser and cURL are _completely_ happy talking to them, yet requests explodes.


Hell, I'm interacting with cloudflare, and it's serving UTF-8 headers. So basically unicode header support is massively, MASSIVELY deployed and available, standards be damned.

If you're dead set on being puritanical about RFC support, the only people who are harmed are people who want to use the requests library.

@fake-name I don't recall being puritanical about _anything_. Here is, word for word, what I said (literally quoting myself from this thread):

The header parsing is done by httplib, in the Python standard library; that is the part that failed to parse. The failure to parse is understandable though: servers should not be shoving arbitrary bytes into headers.

Using UTF-8 for your headers is extremely unwise, as discussed by RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII].

In this instance it's not really possible for us to resolve the problem. The server should instead be sending urlencoded URLs, or RFC 2047-encoded header fields. Either way, httplib is getting confused here, and we can't really step in and stop it.

Note the key part of this comment: "Either way, httplib is getting confused here, and we can't really step in and stop it.".

This is what I mean when I say "it's not really possible for us to resolve the problem". The issue here is in a helper library that sits in the Python standard library. Changing the header parsing logic of that standard library module, while _possible_, is something that needs to be done as part of the standard CPython development process. Requests already carries more subclasses and monkeypatches to httplib than we're happy with, and we're strongly disinclined to carry more.

So here are the options for resolving this issue:

  1. File a bug with the CPython development team, get the bug fixed and into Python 3.6 or 3.7, upgrade to that Python version.
  2. Work out what the minimal possible invasive change is to httplib in order to allow parsing UTF-8 headers, propose that patch to us to see what we think of it.
  3. Write a patch that removes Requests' requirement to use httplib at all so that the next time we have a problem that boils down to "httplib is stupid", we don't have to have this argument again.

Now, _you_ are welcome to pursue any of those options, but I've been to this rodeo a few times so I'm pursuing (3), which is the only one that actually makes this problem go away for good. Unfortunately, it turns out that replacing our low-level HTTP stack that we have spent 7 years integrating with takes quite a lot of work, and I can't just vomit out the code to fix this on demand.

To sum up: I didn't say I didn't think this was a problem or a bug, I said it was a problem that the Requests team couldn't fix, at least not on a timescale that was going to be helpful to this user. If you disagree, by all means, provide a patch to prove me wrong.


And let me make something clear. For the last 9 months or so I have been the most active Requests maintainer by a long margin. Requests is not all I do with my time. I maintain 15 other libraries and actively contribute to more. I have quite a lot of stuff I am supposed to be doing. So I have to _prioritise_ my bug fixing.

Trust me when I say that a bug where the effort required to fix it is extremely high and the flaw comes from a server that is emitting non-RFC-compliant output, that's not a bug that screams out "must be fixed this second". Any time there is a bug predicated on the notion that our peer isn't spec compliant that bug drops several spaces down my priority list. Postel was wrong.

Browsers are incentivised to support misbehaving servers because they are in a competitive environment, and users only blame them when things go wrong. If Chrome doesn't support ${CRAPPY_WEBSITE_X} then Chrome users will just go to a browser that does when they need access.

That's all fine and good, but the reason that Requests doesn't do this is because we have _two_ regular developers. That's it. There are only so many things two developers can do in a day. Neither of us work on just Requests. Compare this to Chrome, which has tens of full-time developers and hundreds of part-time ones. If you want Requests to work on every site where Chrome does, then I have bad news for you my friend because it's just _never_ going to happen.

I say all of this to say: please don't _berate_ the Requests team because we didn't think your particular pet bug was important. We prioritise bugs and close ones we don't think we'll fix any time soon. If you would like to see this bug fixed, a much better option is to _write the patch yourself_. Shouting at me does not make me look fondly on your request for assistance.

I apologize. I assumed you were holding an opinion that you were not, and proceeded to be a complete ass.

In any event, I don't disagree that this is an issue with the core library, but, well, in my experience complaining about encoding issues in the core library is non-productive (I have an issue with the built-in ftplib where it decodes some messages as iso-8859-1 _even when in utf-8 mode_, which I was only able to solve by monkey-patching the stdlib).

Anyway, assuming you're OK with monkey-patching, here's a simple snippet that monkey-patches http.client to make it much, MUCH more robust to arbitrary header encodings:

try:
    import cchardet as chardet
except ImportError:
    import chardet

import http.client
import email.parser

def parse_headers(fp, _class=http.client.HTTPMessage):
    """Parses only RFC2822 headers from a file pointer.

    email Parser wants to see strings rather than bytes.
    But a TextIOWrapper around self.rfile would buffer too many bytes
    from the stream, bytes which we later need to read as bytes.
    So we read the correct bytes here, as bytes, for email Parser
    to parse.

    Note: Monkey-patched version to try to more intelligently determine
    header encoding

    """
    headers = []
    while True:
        line = fp.readline(http.client._MAXLINE + 1)
        if len(line) > http.client._MAXLINE:
            raise http.client.LineTooLong("header line")
        headers.append(line)
        if len(headers) > http.client._MAXHEADERS:
            raise http.client.HTTPException(
                "got more than %d headers" % http.client._MAXHEADERS)
        if line in (b'\r\n', b'\n', b''):
            break

    hstring = b''.join(headers)
    inferred = chardet.detect(hstring)
    if inferred and inferred['confidence'] > 0.8:
        hstring = hstring.decode(inferred['encoding'])
    else:
        hstring = hstring.decode('iso-8859-1')

    return email.parser.Parser(_class=_class).parsestr(hstring)

http.client.parse_headers = parse_headers

Note: This _does_ require cchardet, or chardet. I'm open to better ways to determine the encoding. It simply overrides the http.client.parse_headers() member of the stdlib, which _is_ kind of squicky.

Splatting the above into a file, and then just importing it at the beginning of the requests/__init__.py file seems to solve the problem.

So, that patch looks somewhat plausible. I should note for others skimming this thread that it only works on Python 3, but that's ok because this problem only exists on Python 3 (Python 2 doesn't insist on decoding the headers).

There are a few things we'd need to address if we wanted to move forward on this patch:

  1. It's slow. Requests includes chardet, and already cops quite a lot of flak for using it to try to guess the encoding of body content because of how slow it is. Doing a chardet check for every response we get is likely to be pretty unpleasant.
  2. It assumes that every header in the response is encoded the same way. Unfortunately I can confidently tell you that that's not true. It's extremely common to have a situation involving reverse or caching proxies where multiple entities add headers to a response, often using different character encodings. Of course, this patch isn't any _worse_ in that case than the current behaviour, but it doesn't necessarily handle that case gracefully either.
  3. Monkeypatching the stdlib is asking for trouble. It hasn't entirely stopped us in the past, but that doesn't make me happy about it.

Part of me wonders whether we can address this issue in a more sledgehammery way. For example, can we avoid the chardet requirement by asserting that we only want to try UTF-8 and ISO-8859-1? That will fail in the rare instance that someone sends a header block encoded in UTF-16, but frankly I'd argue that in that instance the server is beyond help and needs to be put out of its misery.

The other advantage of doing it that way around is that it would let us drop down and put the patch into urllib3. In addition to ensuring that more people get the opportunity to benefit from the patch, it is the more natural place to put it: urllib3 is where we do most of our interfacing with httplib. The other advantage of putting it there is that I'm currently involved in trying to _replace_ httplib in urllib3, and having the patch in urllib3 makes me much more likely to _remember_ to remove the patch once we've resolved the issue. ;)

What do you think @fake-name? Would you be open to a more reduced version of this patch getting added to urllib3 as a (hopefully) temporary measure until we can remove httplib entirely?

It's definitely an extremely clumsy fix.

It's slow. Requests includes chardet, and already cops quite a lot of flak for using it to try to guess the encoding of body content because of how slow it is. Doing a chardet check for every response we get is likely to be pretty unpleasant.

If requests is using chardet, have you looked at cChardet? It's a Cython-based, API-compatible replacement for chardet which, according to its README page, is about 1000-4000x faster than plain chardet.

It makes the assertion that every header is in the response is encoded the same way. Unfortunately I can confidently tell you that that's not true. It's extremely common to have a situation involving reverse or caching proxies where multiple entities add headers to a response, often using different character encodings. Of course, this patch isn't any worse in that case than the current behaviour, but it doesn't necessarily gracefully handle it well either.

Ugh. Why can't everything just be UTF-8?

I wonder if it's possible to only bother doing a chardet call if the string contains characters outside of the normal ASCII printable chars? Something like if any([char > 127 for char in rawheader]). That'd let the system avoid the call entirely when the string is definitively entirely plain ascii.
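That fast-path check is cheap even in pure Python; a minimal sketch (the function name is mine, not from any patch):

```python
def pure_ascii(raw: bytes) -> bool:
    """True if every byte is plain ASCII, so no charset detection is needed."""
    return all(b < 0x80 for b in raw)

print(pure_ascii(b"Content-Type: text/plain"))  # → True
print(pure_ascii("关闭".encode("utf-8")))        # → False
```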

I can add per-header decoding after I'm home from work.

Monkeypatching the stdlib is asking for trouble. It hasn't entirely stopped us in the past, but that doesn't make me happy about it.

It's certainly pretty ugly, though this is one of the cleaner monkey-patches I've seen (they were nice enough to break the entire header-decoding step out into a single function, rather than having to replace parts of a class or something). Since it looks like parse_headers _is_ part of the httplib public API (albeit an undocumented part), the function signature _should_ be pretty fixed.

I'm not too enthused about it either, but I think trying to convince Python core to include cchardet in their dependencies is probably not viable.

Part of me wonders whether we can address this issue in a more sledgehammery way. For example, can we avoid the chardet requirement by asserting that we only want to try UTF-8 and ISO-8859-1? That will fail in the rare instance that someone sends a header block encoded in UTF-16, but frankly I'd argue that in that instance the server is beyond help and needs to be put out of its misery.

The problem with putting servers out of their misery is the people who own datacenters tend to get kind of irritable when you break in and hit someone's server with a baseball-bat. And _then_ you have to deal with the person who actually owns the server.

The other advantage of doing it that way around is that it would let us drop down and put the patch into urllib3. In addition to ensuring that more people get the opportunity to benefit from the patch, it is the more natural place to put it: urllib3 is where we do most of our interfacing with httplib. The other advantage of putting it there is that I'm currently involved in trying to replace httplib in urllib3, and having the patch in urllib3 makes me much more likely to remember to remove the patch once we've resolved the issue. ;)

What do you think @fake-name? Would you be open to a more reduced version of this patch getting added to urllib3 as a (hopefully) temporary measure until we can remove httplib entirely?

I don't know the requests architecture well enough to comment coherently, though you should treat my little hack above as basically public domain (or WTFPL if you really want an actual license). If it's useful, do what you want with it.

If requests is using chardet, have you looked at cChardet?

Unfortunately, we can't depend on third-party modules, especially not if they require compilation. That rules out cython. =(

I'm pretty strongly inclined to want to ship this as a much more minor patch that tries UTF-8 and then ISO-8859-1 in that order. I suspect that will cover 98% of the problems, and will let us get a better feel for how problematic the remaining cases are.

Unfortunately, we can't depend on third-party modules, especially not if they require compilation. That rules out cython. =(

Isn't chardet a third-party module? Also, cChardet has Windows builds available on PyPI for 2.7, 3.4 and 3.5.

Anyways, cchardet and chardet are API-compatible. Try to install cchardet, fall back to chardet on failure.

chardet is vendored into the requests repository.

I'm open to having cchardet as a fast-path for regular chardet though, via optional import.

So, chardet has been vastly improved (but not yet released) to significantly improve its speed. Once they get around to it, the new release should improve things for us.

Kinda hackier chardet-less version. Also does per-header decoding:

import http.client
import email.parser

cchardet = False

try:
    import cchardet
except ImportError:
    pass

def isUTF8Strict(data):
    '''
    Check if all characters in a bytearray are decodable
    using UTF-8.
    '''
    try:
        decoded = data.decode('UTF-8')
    except UnicodeDecodeError:
        return False
    else:
        for ch in decoded:
            if 0xD800 <= ord(ch) <= 0xDFFF:
                return False
        return True

def decode_headers(header_list):
    '''
    Decode a list of headers.

    Takes a list of bytestrings, returns a list of unicode strings.
    The character set for each bytestring is individually decoded.
    '''

    decoded_headers = []
    for header in header_list:
        if cchardet:
            inferred = cchardet.detect(header)
            if inferred and inferred['confidence'] > 0.8:
                decoded_headers.append(header.decode(inferred['encoding']))
            else:
                decoded_headers.append(header.decode('iso-8859-1'))
        else:
            # All bytes are < 0x80 (i.e. plain ASCII)
            if all(char & 0x80 == 0 for char in header):
                decoded_headers.append(header.decode("us-ascii"))
            elif isUTF8Strict(header):
                decoded_headers.append(header.decode("utf-8"))
            else:
                decoded_headers.append(header.decode('iso-8859-1'))

    return decoded_headers


def parse_headers(fp, _class=http.client.HTTPMessage):
    """Parses only RFC2822 headers from a file pointer.

    email Parser wants to see strings rather than bytes.
    But a TextIOWrapper around self.rfile would buffer too many bytes
    from the stream, bytes which we later need to read as bytes.
    So we read the correct bytes here, as bytes, for email Parser
    to parse.

    Note: Monkey-patched version to try to more intelligently determine
    header encoding

    """
    headers = []
    while True:
        line = fp.readline(http.client._MAXLINE + 1)
        if len(line) > http.client._MAXLINE:
            raise http.client.LineTooLong("header line")
        headers.append(line)
        if len(headers) > http.client._MAXHEADERS:
            raise http.client.HTTPException("got more than %d headers" % http.client._MAXHEADERS)
        if line in (b'\r\n', b'\n', b''):
            break

    decoded_headers = decode_headers(headers)

    hstring = ''.join(decoded_headers)

    return email.parser.Parser(_class=_class).parsestr(hstring)

http.client.parse_headers = parse_headers

I'm curious how much of a performance improvement this'd result in. It might be a good idea to work up a testing scaffold (or just tell people doing high-throughput stuff to install cchardet).

I suspect not much of a perf improvement: there's a lot of time spent looking at each byte in Python code, which is always going to be slow. The way to get it to be faster is to just say: "try to decode the header block with UTF-8; if that fails, try with latin-1". That's the barest minimum of better, but it might be good enough.
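A minimal sketch of that try-UTF-8-then-fall-back strategy (the function name is mine, not from any actual patch):

```python
def decode_header_block(raw: bytes) -> str:
    """Decode raw header bytes: UTF-8 if it's valid, else ISO-8859-1.

    ISO-8859-1 maps every byte to a code point, so the fallback can
    never raise; at worst you get mojibake instead of a crash.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")

print(decode_header_block("Location: /kkblog/关闭\r\n".encode("utf-8")))
```

Since UTF-8 is strict about byte sequences, a successful UTF-8 decode is almost never a false positive, which is what makes this two-step approach plausible without chardet.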

I have the same problem. The code works on Python 2.7 but fails with Python 3.6.4:

import requests
import logging

from requests.auth import HTTPBasicAuth

logging.basicConfig(level=logging.DEBUG)

r = requests.get('http://test.local',
                 auth=HTTPBasicAuth('USER', 'PASSWORD'))

print(r)

DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): test-silhouette.3shape.local
DEBUG:urllib3.connectionpool:http://test.local:80 "GET / HTTP/1.1" 200 None
WARNING:urllib3.connectionpool:Failed to parse headers (url=http://test.local:80/): [MissingHeaderBodySeparatorDefect()], 
unparsed data: 'Access-Control-Expose-Headers : WWW-Authenticate\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Methods: 
GET, POST, OPTIONS, PUT, PATCH, DELETE\r\nAccess-Control-Allow-Credentials: true\r\nAccess-Control-Allow-Headers: accept, authorization, 
Content-Type\r\nDate: Thu, 12 Apr 2018 14:06:25 GMT\r\nContent-Length: 2692\r\n\r\n'
Traceback (most recent call last):
  File "D:\Python3.6.4\lib\site-packages\urllib3\connectionpool.py", line 399, in _make_request
    assert_header_parsing(httplib_response.msg)
  File "D:\Python3.6.4\lib\site-packages\urllib3\util\response.py", line 66, in assert_header_parsing
    raise HeaderParsingError(defects=defects, unparsed_data=unparsed_data)
urllib3.exceptions.HeaderParsingError: [MissingHeaderBodySeparatorDefect()], unparsed data: 'Access-Control-Expose-Headers : 
WWW-Authenticate\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Methods: GET, POST, OPTIONS, PUT, PATCH, 
DELETE\r\nAccess-Control-Allow-Credentials: true\r\nAccess-Control-Allow-Headers: accept, authorization, Content-Type\r\nDate: Thu, 
12 Apr 2018 14:06:25 GMT\r\nContent-Length: 2692\r\n\r\n'
Traceback (most recent call last):
  File "D:\Python3.6.4\lib\site-packages\urllib3\response.py", line 302, in _error_catcher
    yield
  File "D:\Python3.6.4\lib\site-packages\urllib3\response.py", line 384, in read
    data = self._fp.read(amt)
  File "D:\Python3.6.4\lib\http\client.py", line 449, in read
    n = self.readinto(b)
  File "D:\Python3.6.4\lib\http\client.py", line 493, in readinto
    n = self.fp.readinto(b)
  File "D:\Python3.6.4\lib\socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Hi @safo-bora, the issue you're encountering is because the server is returning an illegal header.

The first header:
'Access-Control-Expose-Headers : WWW-Authenticate\r\n'

There is a space between the header name and the colon which violates RFC 7230 section 3.2.4:

No whitespace is allowed between the header field-name and colon. In
the past, differences in the handling of such whitespace have led to
security vulnerabilities in request routing and response handling. A
server MUST reject any received request message that contains
whitespace between a header field-name and colon with a response code
of 400 (Bad Request).
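You can see the stdlib flag exactly this defect by feeding that header block to email.parser (which http.client uses under the hood, and whose defects list is what urllib3's assert_header_parsing checks):

```python
import email.parser

# The illegal space before the colon makes the first line fail
# header matching, so parsing stops and the defect is recorded:
raw = ("Access-Control-Expose-Headers : WWW-Authenticate\r\n"
       "Access-Control-Allow-Origin: *\r\n\r\n")
msg = email.parser.Parser().parsestr(raw)
print([type(d).__name__ for d in msg.defects])
# → ['MissingHeaderBodySeparatorDefect']
```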

Thank you!

We have found the problem on our side.

Now works!


For folks who need to alter how requests (and urllib3 as a whole) parse headers, if changing the response isn't an option, here's a hint; hope it might help:

https://gist.github.com/bahoo/bdfedfc47cb971840cae489a844a2408

