go version: go version go1.7.4 linux/amd64
go env:
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build044368586=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
https://play.golang.org/p/ArzIv0uBlQ
I see binary pre-decompression crap.
net/http doc says "If the Transport requests gzip on its own and gets a gzipped response, it's transparently decoded in the Response.Body. However, if the user explicitly requested gzip it is not automatically uncompressed".
I'm not requesting it, so it's requested on its own.
I expect it to be decompressed, since I didn't request compression.
I don't know if this is because blog.twitter.com does something out of spec.
Go is telling Twitter "Accept-Encoding: gzip", and Twitter is replying "Content-Encoding: deflate".
Seems like a bug on their side.
2017/01/24 21:59:23 http2: Transport failed to get client conn for blog.twitter.com:443: http2: no cached connection was available
2017/01/24 21:59:23 http2: Transport creating client conn 0xc4200016c0 to 199.59.150.42:443
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: wrote SETTINGS len=18, settings: ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=4194304, MAX_HEADER_LIST_SIZE=10485760
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: wrote WINDOW_UPDATE len=4 (conn) incr=1073741824
2017/01/24 21:59:23 http2: Transport encoding header ":authority" = "blog.twitter.com"
2017/01/24 21:59:23 http2: Transport encoding header ":method" = "GET"
2017/01/24 21:59:23 http2: Transport encoding header ":path" = "/2017/the-infrastructure-behind-twitter-scale"
2017/01/24 21:59:23 http2: Transport encoding header ":scheme" = "https"
2017/01/24 21:59:23 http2: Transport encoding header "accept-encoding" = "gzip"
2017/01/24 21:59:23 http2: Transport encoding header "user-agent" = "Go-http-client/2.0"
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: wrote HEADERS flags=END_STREAM|END_HEADERS stream=1 len=69
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: read SETTINGS len=6, settings: INITIAL_WINDOW_SIZE=65536
2017/01/24 21:59:23 http2: Transport received SETTINGS len=6, settings: INITIAL_WINDOW_SIZE=65536
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: wrote SETTINGS flags=ACK len=0
2017/01/24 21:59:23 http2: Framer 0xc4201181a0: read SETTINGS flags=ACK len=0
2017/01/24 21:59:23 http2: Transport received SETTINGS flags=ACK len=0
2017/01/24 21:59:24 http2: Framer 0xc4201181a0: read HEADERS flags=END_HEADERS stream=1 len=1030
2017/01/24 21:59:24 http2: decoded hpack field header field ":status" = "200"
2017/01/24 21:59:24 http2: decoded hpack field header field "age" = "0"
2017/01/24 21:59:24 http2: decoded hpack field header field "cache-control" = "public, max-age=60"
2017/01/24 21:59:24 http2: decoded hpack field header field "content-encoding" = "deflate"
2017/01/24 21:59:24 http2: decoded hpack field header field "content-language" = "en"
2017/01/24 21:59:24 http2: decoded hpack field header field "content-security-policy" = "default-src https: data:; report-uri https://twitter.com/i/csp_report?a=M5QXUZLCN4%3D%3D%3D%3D%3D%3D&ro=false; img-src https: data: ; script-src https://*.twitter.com https://*.twimg.com https://*.vine.co https://ssl.google-analytics.com https://bat.bing.com 'unsafe-eval' ; font-src https: data: ; frame-src https://* chrome-extension: about: javascript: ; connect-src https: ; media-src https: ; object-src https: ; style-src https:"
2017/01/24 21:59:24 http2: decoded hpack field header field "content-type" = "text/html; charset=utf-8"
2017/01/24 21:59:24 http2: decoded hpack field header field "date" = "Tue, 24 Jan 2017 21:59:24 GMT"
2017/01/24 21:59:24 http2: decoded hpack field header field "expires" = "Tue, 24 Jan 2017 22:00:23 +0000"
2017/01/24 21:59:24 http2: decoded hpack field header field "last-modified" = "Tue, 24 Jan 2017 21:59:23 GMT"
2017/01/24 21:59:24 http2: decoded hpack field header field "link" = "</node/8676>; rel=\"shortlink\",<https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale>; rel=\"canonical\",<https://blog.twitter.com/sites/all/themes/gazebo/img/twitter-bird-white-on-blue.png>; rel=\"image_src\""
2017/01/24 21:59:24 http2: decoded hpack field header field "server" = "tsa_a"
2017/01/24 21:59:24 http2: decoded hpack field header field "set-cookie" = "guest_id=v1%3A148529516384605422; Domain=.twitter.com; Path=/; Expires=Thu, 24-Jan-2019 21:59:24 UTC"
2017/01/24 21:59:24 http2: decoded hpack field header field "strict-transport-security" = "max-age=631138519"
2017/01/24 21:59:24 http2: decoded hpack field header field "vary" = "Cookie"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-connection-hash" = "1d0990af0b5cbb39d969c1d7c4a5c7b2"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-content-type-options" = "nosniff"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-drupal-cache" = "MISS"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-frame-options" = "sameorigin"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-gazebo-app-rev" = "v414"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-gazebo-git-rev" = "097c9a635c93adea404a02bc2abd3578ab76d43d"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-gazebo-host" = "s14"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-response-time" = "1067"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-ua-compatible" = "IE=edge,chrome=1"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-varnish" = "1759374077"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-varnish-cache" = "MISS"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-varnish-l-curl" = "0"
2017/01/24 21:59:24 http2: decoded hpack field header field "x-xss-protection" = "1; mode=block"
2017/01/24 21:59:24 http2: Transport received HEADERS flags=END_HEADERS stream=1 len=1030
2017/01/24 21:59:24 http2: Framer 0xc4201181a0: read DATA stream=1 len=1832 data="x\x9c\xec\x18\xdbR\xe38\xf6\x9d\xaf泻\xab姗卻#\x17\x9aP\x9b\x04\x1a\x02M\b\x04:\x84\xed\xa9\x94l+\xb6\x88,\x19K\xb9\x98侃\xda\xdf\xd8\xdf\xdb/\xd9#9!\xa1\x87\xde\xed冖{_\xe6%9>夜\xe8\xe8\\\xb5s\xf0\x97\xa3\xcb\xce通\u007f\x8c\"\x1d\xf3脻\x03\xf3\x87\x02\x966\x1d\xaeSg\a\xa1e虆跅a\xf3\x97菣\xd4\x1f\"\xad\x93}讜a\xb2\x1bSW\xa8w9\x1aq\"娄C\x85\x83|N\x94j:\xce\xe1\x0eP\x1fD\x94\x04\x87\x00\x00\x18SM\x90\xa1\xc7\xf4q\xc6\xe6M\xa7#\x85\xa6B\xe3\x9b,\xa1@\x97\u007f5\x1dM\x97\xda5z|@~DREus\xa6'\xb8\xee \x17\x14\xb4\\\x04\x89)l\\0\xadi\xba?K\xf9\x16\xb9\x91\xa0@E\x8f\xcbpw\xb5e讞\xb1[*\x14k\xae\x8e(fb\x92\x12\xa5訖\xafg)\xc5\x1e\x8d\x98\b\xf0j+V>\xe1\xf4\x9b\xb2\x14" (1576 bytes omitted)
See http://stackoverflow.com/questions/388595/why-use-deflate-instead-of-gzip-for-text-files-served-by-apache for why gzip is preferred over deflate.
Not sure there's anything for us to do here. I suppose we could attempt to un-deflate things.
/cc @tombergan @dsnet
This is definitely a bug on twitter's end; we asked for "gzip", and they gave us "deflate".
I don't think we should transparently decompress things with the "deflate" TE since there is some confusion about what the actual format of "deflate" is. RFC 2616 specifies that "deflate" should actually be the zlib format (RFC 1950), but there are some implementations that accidentally treat it as raw DEFLATE (RFC 1951). If we tried to transparently decompress, we could run into decompression errors because of the wrong format. Worse yet, neither raw DEFLATE nor zlib has a magic value, so you cannot reliably tell raw DEFLATE apart from the zlib format. The internet ended up avoiding "deflate" as a TE and moved to "gzip" to avoid this confusion.
I wouldn't want the Go implementation to get mixed up in all that confusion.
Copying https://github.com/twitter/netty-http2 folk who can maybe help: @yschimke, @atollena
Twitter appears to be returning "deflate"-compressed responses when a client only asks for "gzip".
> This is definitely a bug on twitter's end; we asked for "gzip", and they gave us "deflate".
If we want to language-lawyer RFC 7231, I believe that servers are technically not required to obey Accept-Encoding; that header is just a recommendation. For example, note the use of SHOULD instead of MUST:
https://tools.ietf.org/html/rfc7231#section-5.3.4
> If an Accept-Encoding header field is present in a request and none of the available representations for the response have a content-coding that is listed as acceptable, the origin server SHOULD send a response without any content-coding.
That said, I would still consider this a bug in Twitter's server ... any other interpretation invites insanity.
Some clients will decode any response with a known Content-Encoding, regardless of whether the encoding was explicitly allowed via the request's Accept-Encoding. One such client is Chrome. It would not be entirely unreasonable for a Go program to do the same; however, I probably would not bake that behavior into the standard library.
> I don't think we should transparently decompress things with the "deflate" TE since there is some confusion about what the actual format of "deflate" is. RFC 2616 specifies that "deflate" should actually be the zlib format (RFC 1950), but there are some implementations that accidentally treat it as raw DEFLATE (RFC 1951).
I think you're saying that the meaning of "Content-Encoding: deflate" is ambiguous, due to the existence of some broken implementations? This problem seems orthogonal to the Accept-Encoding issue, and in any case, I'd be inclined to consider an implementation broken if it does not follow RFC 2616. The updated text from RFC 7230/7231 agrees with the old text from RFC 2616:
https://tools.ietf.org/html/rfc7231#section-8.4
https://tools.ietf.org/html/rfc7230#section-4.2
@bradfitz I doubt Yuri or Antoine are listening in to this nowadays. @mosesn
Cannot reproduce:
$ curl -s -D - -H 'Accept-Encoding: gzip' https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale -o /dev/null
HTTP/1.1 200 OK
age: 0
cache-control: public, max-age=60
content-encoding: gzip
content-language: en
content-security-policy: default-src https: data:; report-uri https://twitter.com/i/csp_report?a=M5QXUZLCN4%3D%3D%3D%3D%3D%3D&ro=false; img-src https: data: ; script-src https://*.twitter.com https://*.twimg.com https://*.vine.co https://ssl.google-analytics.com https://bat.bing.com 'unsafe-eval' ; font-src https: data: ; frame-src https://* chrome-extension: about: javascript: ; connect-src https: ; media-src https: ; object-src https: ; style-src https:
content-type: text/html; charset=utf-8
date: Fri, 27 Jan 2017 07:43:28 GMT
expires: Fri, 27 Jan 2017 07:44:27 +0000
last-modified: Fri, 27 Jan 2017 07:43:27 GMT
link: </node/8676>; rel="shortlink",<https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale>; rel="canonical",<https://blog.twitter.com/sites/all/themes/gazebo/img/twitter-bird-white-on-blue.png>; rel="image_src"
server: tsa_a
set-cookie: guest_id=v1%3A148550300722182200; Domain=.twitter.com; Path=/; Expires=Sun, 27-Jan-2019 07:43:28 UTC
strict-transport-security: max-age=631138519
transfer-encoding: chunked
vary: Cookie
x-connection-hash: 49936bf73c44abf5429426fc8cbc66bd
x-content-type-options: nosniff
x-drupal-cache: MISS
x-frame-options: sameorigin
x-gazebo-app-rev: v414
x-gazebo-git-rev: 097c9a635c93adea404a02bc2abd3578ab76d43d
x-gazebo-host: s4
x-response-time: 1072
x-ua-compatible: IE=edge,chrome=1
x-varnish: 2125593008
x-varnish-cache: MISS
x-varnish-l-curl: 0
x-xss-protection: 1; mode=block
Ah, I wrote too soon. Let me try with http/2. Getting a curl which can do http/2, will report back soon.
As far as I can tell, on http/2, twitter.com always gzips, and blog.twitter.com always deflates. Seems odd to me, I'll poke around tomorrow.
I worked around this twitter bug 6 months ago. Apologies, I should have actually worked with you guys to fix it.
@mosesn, thanks!
This is still happening but it still seems like it must be Twitter's fault, unless HTTP/2 requires all clients to support deflate decoding. Moving to Go 1.11.
@rsc, see my comment above. Twitter may fix it, but it is a known Twitter bug that you will probably need to work around.
@rsc @bradfitz this is in flight, targeted for early January. Sorry for the long delay; it fell off my plate for a bit. https://twittercommunity.com/t/improving-the-twitter-api-support-for-http-2/98728
@rsc @bradfitz this has been fixed, we now handle accept-encoding correctly.
@mosesn, thanks for the fix and update! Will close.