Runtime: HttpClient doesn't decompress "deflate" correctly

Created on 17 Jun 2020  路  30Comments  路  Source: dotnet/runtime

Description

I am using .NET Core v3.1 to fetch data from a REST API that uses deflate compression. I use the following code:

var httpClientHandler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.All };
using var httpClient = new HttpClient(httpClientHandler);
var request = new HttpRequestMessage(HttpMethod.Get, "https://api.example.com/api/v1/get");
var response = await httpClient.SendAsync(request);

This request sends out the following headers:

GET https://api.example.com/api/v1/get HTTP/1.1
Host: api.example.com
Accept-Encoding: gzip, deflate, br

The server responds with (parsed with Fiddler):

HTTP/1.1 200 OK
Server: openresty/1.15.8.1
Date: Wed, 17 Jun 2020 12:06:27 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
charset: utf-8
Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Content-Type, Authorization, X-Requested-With
Content-Encoding: deflate
Vary: Accept-Encoding
Strict-Transport-Security: max-age=15724800; includeSubDomains

My .NET Core client cannot decompress the data and fails with "The archive entry was compressed using an unsupported compression method.". When I invoke this command using Curl or Postman, then it works fine and the result is a valid JSON result.

The binary data looks like this:

00000000: 789c edd4 414b c330 1407 f0af 22ef 9c43  x...AK.0...."..C
00000010: 92a6 8deb 7517 3d74 8808 1ec6 0e61 895d  ....u.=t.....a.]
00000020: a14d 244b 6132 f6dd 4daa 42ab 87a5 13d7  .M$Ka2..M.B.....
00000030: 83cd a92f fcf3 dafc 0a6f 7d84 da08 b932  .../.....o}....2
00000040: 9013 042f 4ac9 4a97 9f95 15ae 32fa 5e42  .../J.J.....2.^B
00000050: ce09 43d0 5407 6543 45b2 0481 8f59 1f56  ..C.T.eCE....Y.V
00000060: daed 215f 1f7b 7588 30cc 1608 9cb0 a572  ..!_.{u.0......r
00000070: cfaa 2a77 0ef2 c49f 92f6 ad10 ce29 fba0  ..*w.........)..
00000080: ec16 728c 406c 5d2b ea22 f45e 9aa6 11da  ..r.@l]+.".^....
00000090: 9f86 9bb0 309c 504c df8c 47f7 2db0 5f14  ....0.PL..G.-._.
000000a0: 4e1b 04a5 35ed ebc7 a777 8fdd c528 c7df  N...5....w...(..
000000b0: bb13 ecb7 764a c8a5 6975 5787 d3ce 3851  ....vJ..iuW...8Q
000000c0: 3ffd 0c76 fb77 fdb4 7713 87c7 4eb2 970c  ?..v.w..w...N...
000000d0: 37fb 72a7 b3fb 24ee c9ec 3e89 3b9b dd29  7.r...$...>.;..)
000000e0: 27e7 dcd3 31ec e979 f5f4 8fd5 e371 46a1  '...1..y.....qF.
000000f0: 93f8 9f79 013a 8d33 a797 9167 0372 3a20  ...y.:.3...g.r:
00000100: 67f8 f7e4 fe75 78b4 fab8 5110 3f0b e287  g....ux...Q.?...
00000110: 413a 40e2 0324 f63f a7c1 0453 f876 769f  A:@..$.?...S.vv.
00000120: c47d 31bb 5fcf 7df3 0ea6 e930 f2         .}1._.}....0.

After some digging, I found that the first two bytes 78 9C are the ZLIB header and the final four bytes A6 E9 30 F2 are the Adler32 checksum that is part of ZLIB compressed data. It seems that .NET uses the standard DeflateStream when it encounters deflated content and this stream cannot deal with deflated data that has this ZLIB header.

RFC2616 section 3.5 specifies:

deflate The "zlib" format defined in RFC 1950 in combination with the "deflate" compression mechanism described in RFC 1951.

.NET Core and .NET Framework don't implement the RFC 1950 part of this specification, so if the HTTP server uses the ZLIB envelope the data cannot be decompressed and the call fails.

Expected behaviour

The .NET Core implementation should detect that a ZLIB envelop is being used and decompress the data according to this header.

Workaround

Only specify the DecompressionMethods.GZip for the HttpClientHandler's AutomaticDecompression property. If the server behaves properly, then it should use GZip compression instead (or not use compression after all). If you have a bad behaving HTTP server it may still send deflated data instead.

area-System.Net.Http bug

Most helpful comment

in practice with common implementations, maybe it'd be possible to just strip it off

Maybe for .NET 5, since this has existed forever, we could handle what seems like the vast majority case just by stripping off the typical two-byte header and the checksum footer:
```C#
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
static async Task Main()
{
var hc = new HttpClient();
byte[] buffer = await hc.GetByteArrayAsync("http://httpbin.org/deflate");
Console.WriteLine(new StreamReader(new DeflateStream(new MemoryStream(buffer, 2, buffer.Length - 6), CompressionMode.Decompress)).ReadToEnd());
}
}
```

It's not 100% robust, but then again, what we're currently doing is apparently 100% problematic :) We could validate the two-byte header is the exact 78 9C combination we know how to handle, and optionally validate the checksum.

(The "right" answer is still a ZlibStream that just lets zlib handle it all. Just exploring alternatives.)

All 30 comments

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

Issue #2236 already suggest that RFC 1950 should be implemented (ZLIB compatible streams). But it doesn't relate to not being able to connect to some HTTP servers.

@davidsh, @karelz, @scalablecory, any idea how this slipped through for so long? Seems like neither .NET Framework nor .NET Core properly handles default, e.g. this standalone repro fails:
```C#
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
static async Task Main()
{
var hc = new HttpClient(new HttpClientHandler() { AutomaticDecompression = System.Net.DecompressionMethods.Deflate });
Console.WriteLine(await hc.GetStringAsync(@"https://httpbin.org/deflate"));
}
}
```

any idea how this slipped through for so long?

We haven't had any customer reports of this. How common is this type of "deflate"?

It seems that .NET uses the standard DeflateStream when it encounters deflated content and this stream cannot deal with deflated data that has this ZLIB header.

Perhaps this particular form of 'deflate' is less commonly used.

How common is this type of "deflate"?

As far as I can tell from https://tools.ietf.org/html/rfc2616 section 3.5, as far as content-coding is concerned, it's the only type of "deflate" :wink: The terminology is a little confusing, but the spec defines the "deflate" content coding as being the zlib file format around the deflate algorithm; DeflateStream provides the raw deflate algorithm, without the zlib header/footer.

DeflateStream provides the raw deflate algorithm, without the zlib header/footer.

We have tests in .NET Framework and .NET Core for deflate and gzip. But we only test against ourselves, i.e. the .NET DeflateStream classes. We don't have any 'interop' tests with other implementations. That is probably how we missed this case.

DeflateStream provides the raw deflate algorithm, without the zlib header/footer.

Shouldn't the fix to this issue go into the implementation of DeflateStream and GzipStream classes? I don't think this bug is specific to just HttpClient.

@davidsh According to the RFC 2616 sec 3.5, the ZLIB header MUST be used when returning data with Content-Encoding: deflate. This is also mentioned in this blog:

So, DEFLATE, and Content-Encoding: deflate, actually means the response body is composed of the zlib format (zlib header, deflated data, and a checksum).

Mark Adler mentions that early versions of IIS use raw deflate instead of zlib-deflate (link):

However early Microsoft servers would incorrectly deliver raw deflate for "Deflate" (i.e. just RFC 1951 data without the zlib RFC 1950 wrapper). This caused problems, browsers had to try it both ways, and in the end it was simply more reliable to only use gzip.

I guess it was tested with older IIS servers and deflate isn't used that much.

Shouldn't the fix to this issue go into the implementation of DeflateStream and GzipStream classes?

DeflateStream is the raw deflate algorithm, no header, no footer, and can be used when some other format is including "deflate" and providing a header/footer around it. GzipStream effectively does just that, wrapping that with the gzip header and footer. To fix this, we would very likely want to expose a ZlibStream (https://github.com/dotnet/runtime/issues/2236), and use that instead of DeflateStream in SocketsHttpHandler.

I don't think this bug is specific to just HttpClient.

It is: HttpClient is handling content-coding and deciding to use DeflateStream to handle "deflate", which isn't correct.

Shouldn't the fix to this issue go into the implementation of DeflateStream and GzipStream classes? I don't think this bug is specific to just HttpClient.

I don't think so, because DeflateStream does exactly what it should do. I think a ZLibStream should be created that mimics the ZLIB behaviour. Content-Encoding: deflate actually means Content-Encoding: zlib (and the zlib payload contains deflated bytes).

A ZLibStream is a bit more complicated, because it should be able to support different compression algorithms. The ZLIB header contains the actual algorithm that is used (see RFC 1950).

For Content-Encoding: deflate this should always be the deflate algorithm, but ZLIB is used in more places and can use different algorithms. It should also support the compression part. Issue #2236 has detailed information about a ZLibStream.

EDIT: See @stephentoub comment above (it's the same)

We do have some remote server tests for decompression here: https://github.com/dotnet/runtime/blob/master/src/libraries/Common/tests/System/Net/Http/HttpClientHandlerTest.Decompression.cs#L164

That said, I'm not sure what the remote server is doing to perform the compression here -- could be IIS itself, could be DeflateStream, etc. @davidsh do you know?

It does seem like this would be a good area to add more interop-style testing.

That said, I'm not sure what the remote server is doing to perform the compression here -- could be IIS itself, could be DeflateStream, etc. @davidsh do you know?

The remote servers in Azure use Azure App Service, ASP.NET Core. Our current configuration for that uses IIS and ASP.Net Core running in-proc (i. e. it's not Kestrel for example).

Remote servers used:

The remote servers in Azure use Azure App Service, ASP.NET Core. Our current configuration for that uses IIS and ASP.Net Core running in-proc (i. e. it's not Kestrel for example).

Yeah, I get that. What I don't know is whether the compression is done by IIS itself, or manually by the ASP.NET code using DeflateStream. If the former, then at least we are getting some basic interop testing here (against IIS). If the latter, then we are just testing against ourselves, same as we do for inproc tests.

Workaround
Only specify the DecompressionMethods.GZip for the HttpClientHandler's AutomaticDecompression property.

It seems like we should stop sending the deflate header until we actually have a working deflate implementation, i.e. ZLibStream + proper usage in HttpClient.

What I don't know is whether the compression is done by IIS itself, or manually by the ASP.NET code using DeflateStream.

Source code that handles compression for remote servers is here:
https://github.com/davidsh/corefx-net-endpoint/tree/master/NetCoreServer/Handlers

Source code that handles compression for remote servers is here

Thanks. So it's just us using DeflateStream:
https://github.com/davidsh/corefx-net-endpoint/blob/999f8356816be739ec65dae5a1a89782b3936a66/NetCoreServer/Helpers/ContentHelper.cs#L16

Thanks @davidsh. So we're using DeflateStream on the server too.

It would be nice to add more interop style testing here (and elsewhere), e.g. using httpbin or whatever.

It would be nice to add more interop style testing here (and elsewhere), e.g. using httpbin or whatever.

We should add more interop testing. But we should not start using more third-party servers. Our current plan is to finish removing all third-party servers from our Outerloop tests. Instead, we will create more first-party servers (i.e. run in our own Azure network) so that we can guarantee better uptime for those servers.

We can run our own httpbin server, so we don't need to rely on a third party server.

https://github.com/postmanlabs/httpbin

We can run our own httpbin server, so we don't need to rely on a third party server.

Great! Maybe there is a docker image for it perhaps running on Linux. That would give us great interop testing and be easy to deploy to Azure quickly.

Just to validate the right behavior, this hack fixes the previous repro:
https://github.com/stephentoub/runtime/commit/4d45d4408223998f8f8f64c8a4489e6019b22b7d

cc: @carlossanlop @ericstj

@stephentoub So does that mean the fix is pretty simple? We still need to add a new ZLibStream class or whatever, right? But it sounds like adding that is straightforward?

I do think we need to either
(a) fix this properly for 5.0 (which I think means ZLibStream + proper HttpClient usage), or
(b) disable sending deflate until we have a fix

So does that mean the fix is pretty simple?

From a coding perspective, yes. zlib already provides all the functionality that's needed here, and as DeflateStream just wraps zlib, it inherently already supports it as well (the magic in my hack is the "15" passed to the internal constructor, which is passed down as the "window" argument to inflateinit2 (https://www.zlib.net/manual.html) telling it to detect the zlib wrapper). It's just not exposed. So we'd need to do one of the following or something else of this ilk:
a) add a public ctor to DeflateStream exposing this window
b) add a ZlibStream that uses the correct window implicitly (just as GZipStream does)
c) build all of the relevant source into System.Net.Http so we can just call the internal ctor
d) use reflection to access the internal ctor

(b) is arguably the cleanest/best solution, and addresses a wider-array of issues as well. It's also the most design work. But maybe @carlossanlop or @ericstj could weigh in on whether we could quickly agree on the shape outlined in #2236 or whether we need to reconsider any aspects of it.

The rest of the options are varying degrees of yucky.

The rest of the options are varying degrees of yucky.

Given that this issue has apparently existed forever and we just discovered it, it doesn't seem like it justifies doing something yucky.

What is the header content? Does it warrant a brand new public type or could we just do it in an internal managed stream that implemented the header?

What is the header content? [...] could we just do it in an internal managed stream that implemented the header?

For compression, probably. But for decompression (which is the scenario here), I don't believe that's possible, at least not completely robustly (in practice with common implementations, maybe it'd be possible to just strip it off): as I understand the header (https://tools.ietf.org/html/rfc1950), it conveys necessary information to the decoder, like what window size was used, some data that may need to be fed into the decoder initially, etc.

in practice with common implementations, maybe it'd be possible to just strip it off

Maybe for .NET 5, since this has existed forever, we could handle what seems like the vast majority case just by stripping off the typical two-byte header and the checksum footer:
```C#
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
static async Task Main()
{
var hc = new HttpClient();
byte[] buffer = await hc.GetByteArrayAsync("http://httpbin.org/deflate");
Console.WriteLine(new StreamReader(new DeflateStream(new MemoryStream(buffer, 2, buffer.Length - 6), CompressionMode.Decompress)).ReadToEnd());
}
}
```

It's not 100% robust, but then again, what we're currently doing is apparently 100% problematic :) We could validate the two-byte header is the exact 78 9C combination we know how to handle, and optionally validate the checksum.

(The "right" answer is still a ZlibStream that just lets zlib handle it all. Just exploring alternatives.)

Or we could just disable sending deflate. Since we've only just discovered this, it seems like the impact of turning it off would be very, very small.

I do think we should do the "right" thing eventually -- ZLibStream etc. Just pointing out that hacking around to make this work in the short term seems like overkill.

I'm ok with that, too.

Was this page helpful?
0 / 5 - 0 ratings