Couple of issues here: we're not raising an exception after receiving a ProtocolError from HTTP/2 when connecting to yahoo.com. We should be bubbling that exception up to users.
The second issue is is we don't know why yahoo is giving us this error, we have to figure that out as I'm sure it'll impact more than just them.
I found this by running interop tests against Alexa 1000 websites.
I found this by running interop tests against Alexa 1000 websites.
Good call - were you just making a standard GET request to every site ranked in there, or something more complex than that?
I found plenty of problems with just getting the index. I've got to sort through them all cuz the Yahoo one kept hanging.
I found plenty of problems with just getting the index.
Okeydokes. At least we've got something to work from eh. 馃槂
Something that'd be really interesting would be adding our proposed http version control switch, and runnning compat tests only against HTTP/1.1.
I've been looking into this issue with the help of #240 and have been having trouble understanding our h2 flow.
Here's the logging for a request to https://yahoo.com
[DEBUG 2019-08-19 16:52:46,220] (httpx.dispatch.http2) H2 Connection initiating
[DEBUG 2019-08-19 16:52:46,221] (httpx.dispatch.http2) Stream [1] - Sending headers: b'\x00\x00/\x01\x04\x00\x00\x00\x01\x82A\x87\xf49\xceu\xc8z\x7f\x87\x84z\x8d\xaf\xd2g=KN\x94\xd7\xe5\x80.\xea\xe1S\x83\xf9c\xe7P\x8d\x9b\xd9\xab\xfaRB\
xcb@\xd2_\xa5#\xb3'
[DEBUG 2019-08-19 16:52:46,222] (httpx.dispatch.http2) Stream [1] - Ending request stream with data: b'\x00\x00\x00\x00\x01\x00\x00\x00\x01'
[DEBUG 2019-08-19 16:52:46,223] (httpx.dispatch.http2) Stream [1] - Received response events: b'\x00\x00\x0c\x04\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00d\x00\x04\x00\x10\x00\x00\x00\x00\x04\x08\x00\x00\x00\x00\x00'
[DEBUG 2019-08-19 16:52:46,223] (httpx.dispatch.http2) events [<RemoteSettingsChanged changed_settings:{ChangedSetting(setting=SettingCodes.MAX_CONCURRENT_STREAMS, original_value=None, new_value=100), ChangedSetting(setting=SettingCodes
.INITIAL_WINDOW_SIZE, original_value=65535, new_value=1048576)}>]
[DEBUG 2019-08-19 16:52:46,234] (httpx.dispatch.http2) Stream [1] - Received response events: b'\x00\x0f\x00\x01'
[DEBUG 2019-08-19 16:52:46,235] (httpx.dispatch.http2) events [<WindowUpdated stream_id:0, delta:983041>]
[DEBUG 2019-08-19 16:52:46,314] (httpx.dispatch.http2) Stream [1] - Received response events: b'\x00\x00\x08\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'
[DEBUG 2019-08-19 16:52:46,314] (httpx.dispatch.http2) events [<ConnectionTerminated error_code:ErrorCodes.PROTOCOL_ERROR, last_stream_id:0, additional_data:None>]
[DEBUG 2019-08-19 16:52:46,326] (httpx.dispatch.http2) Stream [1] - Received response events: b''
[DEBUG 2019-08-19 16:52:46,327] (httpx.dispatch.http2) events []
... infinite loop of these two log entries ...
It seems yahoo is sending a RemoteSettingsChanged event which do not have a stream_id attribute so we ignore it. We then receive a WindowUpdated on Stream 0, which does have an attribute stream_id but since it's falsy we do not add it to the self.events dictionary.
Are we intentionally ignoring stream 0?
I don't know a lot about HTTP2 but seems like yahoo is expecting these two streams to be handled but our flow seems to only follow a single stream, is that correct?
We need to process the events coming in from stream 0 in addition to the stream we're waiting for.
The RemoteSettingsChanged could matter when we start having to worry about too many concurrent streams, I'm sure there's other things we need there. (For example if a website asked us to only have 1 concurrent stream and we opened another that'd be a violation)
The WindowUpdated stream isn't a problem that we ignore it since the information is already signalled to h2's internal count of how many send-credits we have.
Probably good to look through all events that h2 is capable of emitting and making sure we are processing the ones we need.
D'ya have a clean, parsed list of URLs that we can use for running interop tests?
Wouldn't mind having a crack with Client(http_versions="HTTP/1.1") and seeing where we're at right now.
Certainly! Here's the Alexa Top 1M: alexa-1m.txt.gz
I added some more logging and ran tests against the alexa dataset and it looks like yahoo is doing things a bit differently.
Specifically I added some logging to the raw events returned by h2. It seems all other hosts send WindowUpdate on stream 0 (which according to the RFC indicates a change in the control flow at the connection level) along with RemoteSettingsChanged:
[DEBUG 11:10:06] (httpx.dispatch.http2) << Read events: [<RemoteSettingsChanged changed_settings:{ChangedSetting(setting=SettingCodes.MAX_CONCURRENT_STREAMS, original_value=None, new_value=128), ChangedSetting(setting=SettingCodes.INITIAL_WINDOW_SIZE, original_value=65535, new_value=65536), ChangedSetting(setting=SettingCodes.MAX_FRAME_SIZE, original_value=16384, new_value=16777215)}>, <WindowUpdated stream_id:0, delta:2147418112>]
We don't actually handle this event (though h2 does provide an API to increment the connection flow but I'm not sure if it's a requirement) but all hosts are fine with this and return SettingsAcknowledged after we send data from h2.
[DEBUG 11:10:06] (httpx.dispatch.http2) << Read events: [<SettingsAcknowledged changed_settings:{}>]
However, yahoo does not send WindowUpdated on stream 0 along with RemoteSettingsChanged, it sends it on its own after we send data from h2:
[DEBUG 10:57:15] (httpx.dispatch.http2) << Read events: [<RemoteSettingsChanged changed_settings:{ChangedSetting(setting=SettingCodes.MAX_CONCURRENT_STREAMS, original_value=None, new_value=100), ChangedSetting(setting=SettingCodes.INIT$
AL_WINDOW_SIZE, original_value=65535, new_value=1048576)}>]
[DEBUG 10:57:15] (httpx.dispatch.http2) Sending H2 data: b'\x00\x00\x00\x04\x01\x00\x00\x00\x00'
[DEBUG 10:57:15] (httpx.dispatch.http2) << Read events: [<WindowUpdated stream_id:0, delta:983041>]
[DEBUG 10:57:15] (httpx.dispatch.http2) Sending H2 data: b''
[DEBUG 10:57:15] (httpx.dispatch.http2) << Read events: [<ConnectionTerminated error_code:ErrorCodes.PROTOCOL_ERROR, last_stream_id:0, additional_data:None>]
I don't know if that's a violation of the protocol on yahoo's side or if we're supposed to be handling this lone WindowUpdated event differently somehow.
@nsidnev, you've been dealing with WindowUpdated events on streams, any ideas?
So I discovered the issue!
We're setting 0x8 (SETTINGS_ENABLE_CONNECT_PROTOCOL) which isn't defined in the original standard for HTTP/2 so Yahoo is breaking the RFC for HTTP/2:
[RFC 7540 Section 6.5.2] An endpoint that receives a SETTINGS frame with any unknown or unsupported identifier MUST ignore that setting.
If you change the initial settings frame to not send that setting Yahoo doesn't send a GOAWAY with PROTOCOL_ERROR but still doesn't respond to us. To get a response we need to not send an empty DATA frame with end_stream=True, we need to have the end_stream=True on the HEADERS frame. :/ Just seems like Yahoo's implementation of HTTP/2 isn't great but they're also in the Alexa top 20 so we'll probably be running into this a lot more?
Wow, thanks for the deep dive @sethmlarson, I don't think I would've figured that one out 馃槃
I agree this is likely to happen more and can become a source of bug reports. What should be our policy for non-RFC compliant implementations? Sounds like it would be a nightmare including all the subtly different ways different hosts might not comply with the RFC.
Also should we fix setting SETTINGS_ENABLE_CONNECT_PROTOCOL to try and be stick to the RFC as close as possible our our side?
I agree this is likely to happen more and can become a source of bug reports. What should be our policy for non-RFC compliant implementations? Sounds like it would be a nightmare including all the subtly different ways different hosts might not comply with the RFC.
Our broad policy should be to look to what web browser implementations do, and then follow their lead. That'd probably be a bit onerous to fully dig into all the time (eg. it'd mean grokking through the browers own issue trackers and project history, or else digging in with wireshark)
It's possible that looking towards curl would be a good option too - presumably their HTTP/2 implementation will have covered a quite a lot more of the tracks than we've done so far, and it's likely a lot more accessible poject to compare and contrast against.
In any case, if there's a conflict then we'll likely need to tend towards prioritizing for real world behaviors over RFC compliance.
If you change the initial settings frame to not send that setting
What usage does SETTINGS_ENABLE_CONNECT_PROTOCOL provide for? Do we need it to be set?
I see it included here. Is it just for WebSockets as per https://tools.ietf.org/html/rfc8441 ?
To get a response we need to not send an empty DATA frame with end_stream=True, we need to have the end_stream=True on the HEADERS frame.
That sounds like an okay change.