Crystal: HTTP::Headers gets downcase

Created on 8 Aug 2019  路  16Comments  路  Source: crystal-lang/crystal

This shows it perfectly:

require "http"

headers = <<-EOF
Content-Length: 0
User-Agent: Me!12
\n
EOF

io_mem = IO::Memory.new
HTTP.parse_headers_and_body(IO::Memory.new(headers)) do |headers, body|
  HTTP.serialize_headers(io_mem, headers)
  pp body
end

io_mem.to_s # => "content-length: 0\r\nuser-agent: Me!12\r\n\r\n"

The expected result is to get back the same "case" for the headers, but instead it gets downcased

bug topicnetworking

Most helpful comment

Yeah for HTTP/2. There is no such rule for older protocols. But I guess we essentially want to use the same data structures for any HTTP protocol, so it's probably better to normalize downcased right away.

All 16 comments

This is due to recent optimizations in #8002. It only affects common header names which are cached downcased for easier comparability (see https://github.com/crystal-lang/crystal/pull/8002/files#diff-520d933e69a34d18025788f47787c6f2R173).
If you use an uncommon header name (for example Foo: 0), it gets copied verbatim.

Maybe we should consider a different solution to caching common names (for example StringPool as suggested in https://github.com/crystal-lang/crystal/pull/8002#discussion_r308346901).

Because header names are case-insensitive this is not a very serious issue, though. But it can still be annoying.

We are doing a lot of HTTP::Headers manipulation, and we digest and parse them raw.
This introduced a need to manually upcase to verify using specs.

This introduced a need to manually upcase to verify using specs

But upcase shouldn't work here because you'd get CONTENT-TYPE instead of Content-Type?

In any case, I'll "fix" this by making it always return "Content-Type", "User-Agent", etc. Headers are case insensitive and that's a good default, I think.

@asterite I meant Capitalize :) my mistake.
we do something along the lines of .split("-").map {|h| h.capitalize}

Headers in HTTP/2 are now always transmitted lowercase (followed in HTTP/3) and it's becoming common to now receive headers lowercased, even in HTTP/1 and it's clunky case-insensitiveness (with an HTTP/2 reverse proxy for example, which is now default in Google GKE for example).

I strongly advise to join the bandwagon. HTTP headers are now lowercase by default, and that's good.

@asterite there is nothing to fix, in fact all headers should always be lowercased.

@bararchy That doesn't work for ETag for example.

@ysbaddaden

there is nothing to fix, in fact all headers should always be lowercased.

Who says that?

Since case doesn't matter, IMHO the best behaviour is to not alter any names.

I could accept a general doctrine to consider all header names downcased though, if that's a future proof path. As long as we consequently downcase them everywhere (Headers.add for example), this is probably okay (sometimes it might be surprising though).

I'm good as long as what goes in == what comes out, otherwise, there are suprises

@bararchy Well, that's the question here: Should we keep everything as it comes in (which would imply possibly mixed styles in one place), or should we apply some kind of normalization to headers in order to have uniform representation.
I don't think that either way necessarily causes more issues than the other. But the status quo after #8002 is the least appreciated, but also not intended (common headers are normalized, others not).

oops

Source for the lowercasing headers:

https://tools.ietf.org/html/rfc7540#section-8.1.2

Just as in HTTP/1.x, header field names are strings of ASCII
characters that are compared in a case-insensitive fashion. However,
header field names MUST be converted to lowercase prior to their
encoding in HTTP/2. A request or response containing uppercase
header field names MUST be treated as malformed (Section 8.1.2.6).

Yeah for HTTP/2. There is no such rule for older protocols. But I guess we essentially want to use the same data structures for any HTTP protocol, so it's probably better to normalize downcased right away.

Maybe I can remove the optimization and leave the code as it was before.

A conservative change including optimization would be to compare common names case-sensitive and use capitalized strings (or both capitalized and downcased). That's strictly better than just reverting the optimization.

Alternatively we could just proceed forward and downcase everything as @ysbaddaden suggested.

@straight-shoota You mean making the lookup hash be:

COMMON_HEADERS = {
  "content-length" => "content-length",
  "Content-Length" => "Content-Length",
  # etc.
}

?

Yes. But you wouldn't need a hash then, a list would suffice when there is no normalization.

Makes sense. I like that approach. It wouldn't cover other cases like "Content-length", though we might also want to consider that... I'll later update this PR with all these variants. It's probably the best of all worlds: optimized and keeps the original casing in most common cases. Great idea!

Was this page helpful?
0 / 5 - 0 ratings