Requests: gzip response is not decoded

Created on 13 Feb 2015 · 7 Comments · Source: psf/requests

It says in the FAQ:

Requests automatically decompresses gzip-encoded responses, and does its best to decode response content to unicode when possible.

But the following code prints data that is still gzip-compressed:

import requests
response = requests.get("http://www.shopcade.com/sitemaps/sitemap_products_index.xml.gz")
print(response.text)

So is this an issue with the requests library, or is there something wrong with the response from that site? When I download the archive via a browser it decompresses fine and the XML looks correct. The response headers also contain the following key-value pair:

'content-type': 'application/x-gzip'

barab-a

All 7 comments

Ah, yes. This is an easy misunderstanding to make.

When we say gzip-encoded responses, we mean responses that are sent with Content-Encoding: gzip. That means that the body has a known type but has been compressed with gzip for transport. That does not include things whose actual content is gzip, as in the example above. Those are transmitted exactly as originally served.
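To illustrate the distinction, here is a minimal sketch of removing the gzip *file* layer yourself. The helper name `unwrap_gzip_file` and the set of media types it checks are assumptions for illustration, not part of Requests. It relies on the fact that any transport-level compression has already been removed by the time you read `response.content`, so only the declared content type is inspected.

```python
import gzip

# Media types treated here as "the payload itself is a gzip file".
# This set is an assumption for the sketch; extend it if your server differs.
GZIP_FILE_TYPES = {"application/gzip", "application/x-gzip"}


def unwrap_gzip_file(headers, body):
    """Return `body` with a gzip file layer removed, if the headers declare one.

    `headers` is any mapping of header names to values and `body` is bytes,
    for example `response.headers` and `response.content`.
    """
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if media_type in GZIP_FILE_TYPES:
        return gzip.decompress(body)
    return body


# Possible usage with Requests (not executed here):
#   response = requests.get(url)
#   xml_bytes = unwrap_gzip_file(response.headers, response.content)
```

This reads the whole body into memory, so it only suits files that fit comfortably in RAM.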

Lukasa on 13 Feb 2015

Seems like a good thing to properly document for users so this confusion doesn't happen again.

sigmavirus24 on 14 Feb 2015

👍2

@Lukasa:
I'm wondering if it's possible to use Requests to decode a gzip-encoded file (i.e. one that was not compressed just for transport), or would that not be in the spirit of the library (file provenance and all)?

I've got a large number of gzipped text files to download and then parse (it would be amazing to just stream them). If the server were doing the encoding, it seems like Requests could handle it no problem, but if the file is already in that form, would it work?

riordan on 14 Oct 2015

@riordan we will not decode something that does not have a gzip (or deflate) Content-Encoding. The gzip module in Python should do this for you, though, and it might be possible to use it with response.raw as the file object to sort of stream it.
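A rough sketch of that idea, under the assumption that `response.raw` behaves enough like a file object for `gzip.GzipFile` (it needs a working `read(n)`). The helper name `gunzip_fileobj` is made up for this example.

```python
import gzip
import shutil


def gunzip_fileobj(src, dst, chunk_size=64 * 1024):
    """Copy the decompressed contents of the gzip file object `src` into `dst`.

    Both arguments are binary file-like objects. Data is copied in pieces of
    `chunk_size` bytes, so the whole file is never held in memory at once.
    """
    # GzipFile does not close a file object that was passed in via `fileobj`,
    # so the caller stays responsible for closing `src`.
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        shutil.copyfileobj(gz, dst, chunk_size)


# Possible usage with Requests (not executed here; the file name is a placeholder):
#   r = requests.get(some_url, stream=True)
#   with open("sitemap.xml", "wb") as out:
#       gunzip_fileobj(r.raw, out)
```

Passing `stream=True` matters here: without it, Requests reads the whole body up front and `r.raw` has nothing left to read.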

sigmavirus24 on 14 Oct 2015

Alternatively, you can set up a fairly simple generator-based pipeline:

import zlib

def decompress_stream(stream):
    o = zlib.decompressobj(16 + zlib.MAX_WBITS)

    for chunk in stream:
        yield o.decompress(chunk)

    yield o.flush()


r = requests.get(some_url, stream=True)
parseable_data = decompress_stream(r.iter_content(1024))
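Since the pipeline yields byte chunks of arbitrary size, a parser that works line by line needs one more stage. The sketch below repeats `decompress_stream` so that it is self-contained and adds a hypothetical `iter_lines` helper; it assumes the decompressed data is newline-delimited text in the given encoding.

```python
import zlib


def decompress_stream(stream):
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    o = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in stream:
        yield o.decompress(chunk)
    yield o.flush()


def iter_lines(byte_chunks, encoding="utf-8"):
    """Reassemble arbitrarily sized byte chunks into decoded text lines."""
    pending = b""
    for chunk in byte_chunks:
        pending += chunk
        # Everything before the last newline is complete; the rest stays buffered.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line.decode(encoding)
    if pending:
        # Final line without a trailing newline.
        yield pending.decode(encoding)


# Possible usage with Requests (not executed here):
#   r = requests.get(some_url, stream=True)
#   for line in iter_lines(decompress_stream(r.iter_content(1024))):
#       ...
```

Decoding only complete lines avoids splitting a multi-byte character across two chunks.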

Lukasa on 15 Oct 2015

👍4

@sigmavirus24 @Lukasa This is _awesome_ thank you. That's exactly what I wound up doing.

riordan on 16 Oct 2015

my code:

import zlib

import requests

def decompress_stream(stream):
    o = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in stream:
        yield o.decompress(chunk)
    yield o.flush()

# url, bUseStream, szCmd, headers, loadCookie and szFNms1 are defined elsewhere
rsp = requests.post(url, stream=bUseStream, data=szCmd, verify=False,
                    headers=loadCookie(url, headers), timeout=None)
rsp.raise_for_status()
with open(szFNms1, "wb") as f9:
    for chunk in decompress_stream(rsp.iter_content(chunk_size=81920)):
        if chunk:  # skip empty chunks (decompress() may return b"")
            f9.write(chunk)
            f9.flush()
            print("+", sep="", end="", flush=True)

It fails with this error:

('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
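The wording of that message comes from the automatic Content-Encoding handling inside Requests/urllib3, so the failure happens before `decompress_stream` ever sees a chunk. One possible workaround, sketched under assumptions, is to read the raw bytes without automatic decoding and only gunzip when the data really starts with the gzip magic bytes. The helper names below are hypothetical, and whether this suits a particular server is something to verify.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_like_gzip(first_bytes):
    """Return True if the data begins with the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def decompress_if_gzip(chunks):
    """Yield decompressed data for gzip input and pass other input through."""
    head = b""
    decompressor = None
    decided = False
    for chunk in chunks:
        if not decided:
            # Buffer until there are two bytes to compare with the magic number.
            head += chunk
            if len(head) < 2:
                continue
            decided = True
            chunk = head
            if looks_like_gzip(head):
                decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        yield decompressor.decompress(chunk) if decompressor else chunk
    if not decided and head:
        # The stream ended before two bytes arrived; pass it through unchanged.
        yield head
    if decompressor is not None:
        yield decompressor.flush()


# Possible usage with Requests (not executed here). decode_content=False asks
# urllib3 to hand back the bytes exactly as received:
#   raw_chunks = rsp.raw.stream(81920, decode_content=False)
#   for chunk in decompress_if_gzip(raw_chunks):
#       ...
```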

@Lukasa

hktalent on 9 Aug 2019
