Requests: gzip response is not decoded

Created on 13 Feb 2015 · 7 Comments · Source: psf/requests

It says in the FAQ:

Requests automatically decompresses gzip-encoded responses, and does its best to decode response content to unicode when possible.

But the following code prints data that is still compressed:

import requests
response = requests.get("http://www.shopcade.com/sitemaps/sitemap_products_index.xml.gz")
print(response.text)

So is this an issue with the requests library, or is there something wrong with the response from that site? When I download the archive with a browser, it decompresses fine and the XML looks valid. The response headers also contain the following key-value pair:

'content-type': 'application/x-gzip'

All 7 comments

Ah, yes. This is an easy misunderstanding to make.

When we say gzip-encoded responses, we mean responses that are sent with Content-Encoding: gzip. That means that the body has a known type but has been compressed with gzip for transport. That does not include things whose actual content is gzip, as in the example above. Those are transmitted exactly as originally served.
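To make the distinction concrete, here is a minimal sketch that decides whether a body still needs to be gunzipped by hand. The helper names `body_is_still_gzipped` and `read_body` are hypothetical, not part of Requests, and the check assumes you pass in a mapping of response headers plus the raw body bytes. With a plain dict the header key must match exactly; `response.headers` in Requests is case-insensitive.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip file starts with these two bytes


def body_is_still_gzipped(headers, body):
    """Return True when the payload itself is a gzip file that was not
    compressed (and therefore not decoded) as a transport encoding."""
    encoding = headers.get("Content-Encoding", "").lower()
    transport_compressed = "gzip" in encoding
    return not transport_compressed and body[:2] == GZIP_MAGIC


def read_body(headers, body):
    """Gunzip the body only when the content itself is a gzip file."""
    if body_is_still_gzipped(headers, body):
        return gzip.decompress(body)
    return body
```

For the sitemap in the original report, the headers carry 'content-type': 'application/x-gzip' and no gzip Content-Encoding, so a check like this would take the manual decompression path on `response.content`.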

Seems like a good thing to properly document for users so this confusion doesn't happen again.

@Lukasa:
I'm wondering if it's possible to use Requests to decode a gzip-encoded file (i.e. one that was not compressed just for transport), or would that not be in the spirit of the library (file provenance and all)?

I've got a large number of gzipped text files to download and then parse (it would be amazing to just stream them). If the server were doing the encoding, it seems like Requests could handle it with no problem, but if the file is already in that form, would it work?

@riordan we will not decode something that does not have a gzip (or deflate) Content-Encoding. The gzip module in Python should do this for you, though, and it might be possible to create a GzipFile with response.raw as the file object to sort of stream it.
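A rough sketch of that suggestion, assuming the gzipped file contains UTF-8 text. `iter_gzip_lines` is a hypothetical helper name; it accepts any binary file-like object, so `response.raw` from a `stream=True` request should work in its place, though that is not verified here.

```python
import gzip
import io


def iter_gzip_lines(fileobj, encoding="utf-8"):
    """Yield decoded text lines from a file-like object holding gzip data."""
    # GzipFile decompresses lazily as TextIOWrapper asks for more bytes,
    # so the whole file never has to sit in memory at once.
    gz = gzip.GzipFile(fileobj=fileobj, mode="rb")
    with io.TextIOWrapper(gz, encoding=encoding) as text:
        for line in text:
            yield line.rstrip("\n")


# Intended use with Requests (some_url is a placeholder):
#
#     r = requests.get(some_url, stream=True)
#     for line in iter_gzip_lines(r.raw):
#         ...
```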

Alternatively, you can set up a fairly simple generator-based pipeline:

import zlib

def decompress_stream(stream):
    o = zlib.decompressobj(16 + zlib.MAX_WBITS)

    for chunk in stream:
        yield o.decompress(chunk)

    yield o.flush()


r = requests.get(some_url, stream=True)
parseable_data = decompress_stream(r.iter_content(1024))
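To try the pipeline without a network call, the sketch below feeds it small slices of an in-memory gzip payload. `fake_iter_content` is a hypothetical stand-in for `r.iter_content(1024)`, and the sample XML is made up for illustration.

```python
import gzip
import zlib


def decompress_stream(stream):
    # 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
    # rather than a bare zlib stream.
    o = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in stream:
        yield o.decompress(chunk)
    yield o.flush()


def fake_iter_content(payload, chunk_size):
    """Hypothetical stand-in for r.iter_content(chunk_size)."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


original = b"<urlset>" + b"<url/>" * 1000 + b"</urlset>"
compressed = gzip.compress(original)

# Chunk boundaries do not need to line up with anything in the gzip
# format; the decompressor keeps its state between calls.
restored = b"".join(decompress_stream(fake_iter_content(compressed, 16)))
```

Some yielded chunks may be empty byte strings, because a small compressed slice does not always produce output, so consumers should tolerate them.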

@sigmavirus24 @Lukasa This is _awesome_ thank you. That's exactly what I wound up doing.

My code:

import zlib

import requests

def decompress_stream(stream):
    # 16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer
    o = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in stream:
        yield o.decompress(chunk)
    yield o.flush()

# url, bUseStream, szCmd, headers, loadCookie and szFNms1 are defined elsewhere
rsp = requests.post(url, stream=bUseStream, data=szCmd, verify=False,
                    headers=loadCookie(url, headers), timeout=None)
rsp.raise_for_status()
with open(szFNms1, "wb") as f9:
    for chunk in decompress_stream(rsp.iter_content(chunk_size=81920)):
        if chunk:  # skip empty chunks
            f9.write(chunk)
            f9.flush()
            print("+", sep="", end="", flush=True)

It fails with this error:

('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))

@Lukasa
