Netty: HPackDecoder should treat header value as opaque sequence of octets

Created on 21 Dec 2017 · 2 Comments · Source: netty/netty

Expected behavior

According to the HPACK specification:

[Header] Names and values are considered to be opaque sequences of octets

As such, we should be able to use a UTF-8 encoded string as a header value.

Actual behavior

HpackDecoder treats both the header name and the header value as AsciiString.
(This is less of an issue for header field names, because the spec says that "header field names are strings of ASCII characters".)

Minimal yet complete reproducer code

package io.netty.handler.codec.http2;

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import org.junit.Test;

import static io.netty.handler.codec.http2.Http2HeadersEncoder.NEVER_SENSITIVE;
import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

public class ReproducerTest {
    @Test
    public void headerUnicodeValueRoundTrip() throws Exception {
        ByteBuf in = Unpooled.buffer(100);
        try {
            HpackEncoder hpackEncoder = new HpackEncoder(true);

            Http2Headers toEncode = new DefaultHttp2Headers();
            toEncode.add("test", "\uF93D\uF936\uF949\uF942");
            hpackEncoder.encodeHeaders(1, in, toEncode, NEVER_SENSITIVE);

            Http2Headers decoded = new DefaultHttp2Headers();
            HpackDecoder hpackDecoder = new HpackDecoder(8192, 32);
            hpackDecoder.decode(1, in, decoded);

            assertThat(decoded.get("test").toString(), is("\uF93D\uF936\uF949\uF942"));
        } finally {
            in.release();
        }
    }
}

Results:

java.lang.AssertionError: 
Expected: is "綠虜雷壟"
     but: was "????"
Expected :綠虜雷壟
Actual   :????
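
For readers wondering where the "????" could come from, here is a minimal JDK-only sketch. It assumes the String value is narrowed to one byte per char somewhere along the way, with anything above 0xFF replaced by '?'. That is an assumption for illustration, not something this report establishes about Netty internals. The JDK's ISO-8859-1 encoder stands in for that narrowing, and the class and method names (NarrowingDemo, narrowAndWiden) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NarrowingDemo {
    /**
     * Stand-in for a one-byte-per-char conversion: chars that do not fit
     * into a single byte are replaced by '?' by the JDK's ISO-8859-1 encoder.
     */
    public static String narrowAndWiden(String value) {
        byte[] oneBytePerChar = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(oneBytePerChar, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // All four chars are above 0xFF, so each one is replaced.
        System.out.println(narrowAndWiden("\uF93D\uF936\uF949\uF942")); // prints "????"
        // Pure ASCII survives unchanged.
        System.out.println(narrowAndWiden("plain-ascii")); // prints "plain-ascii"
    }
}
```

If that assumption holds, the information is already lost before decoding, which would explain why passing a plain String with non-Latin-1 characters cannot round-trip.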

Netty version

4.1.20

JVM version (e.g. java -version)

1.8.0_151

not a bug

All 2 comments

The name AsciiString is a bit overloaded in this context. It is the core storage used for HTTP/1.x (where ASCII is king) and also for HTTP/2 (where binary is desired). We do preserve the bytes when encoding and decoding, but we also implement the CharSequence API for "convenience" and compatibility with existing APIs. What you are seeing here is that Java represents a String in UTF-16, and because the underlying storage of a String is a char[], we fill each element of the array so that the least significant byte (LSB) has data and the most significant byte (MSB) has nothing (we only have one byte's worth of data per element, and a char is 2 bytes wide). So if you want to convert to a Java String, you have to go through a few Charset conversions first:

  • toString() -> converts AsciiString's byte[] into a String, which gives you a char[] where the LSB has data and the MSB has nothing
  • toString().getBytes(CharsetUtil.ISO_8859_1) -> gives you a byte[] by pulling the LSB out of each element of the String's char[]
  • new String(binaryValue.toString().getBytes(CharsetUtil.ISO_8859_1), CharsetUtil.UTF_8.name()) -> takes the byte[] from above and decodes it as UTF-8 into a String
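
The three conversions above can be sketched with the JDK alone. This is an illustration under the assumption that AsciiString's toString() behaves like an ISO-8859-1 decode (one char per byte, high byte zero), as described above. The class and method names (Latin1RoundTrip, bytesToCharPerByte, recoverUtf8) are hypothetical, and StandardCharsets is used in place of Netty's CharsetUtil so the snippet has no dependencies.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    /** Mimics a byte-per-char view: each octet becomes one char whose high byte is zero. */
    public static String bytesToCharPerByte(byte[] octets) {
        return new String(octets, StandardCharsets.ISO_8859_1);
    }

    /** Pulls the low byte back out of every char, then decodes the octets as UTF-8. */
    public static String recoverUtf8(String charPerByte) {
        byte[] octets = charPerByte.getBytes(StandardCharsets.ISO_8859_1);
        return new String(octets, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\uF93D\uF936\uF949\uF942";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String view = bytesToCharPerByte(utf8);
        System.out.println(utf8.length);   // prints 12 (3 UTF-8 bytes per character)
        System.out.println(view.length()); // prints 12 (one char per byte)
        System.out.println(recoverUtf8(view).equals(original)); // prints true
    }
}
```

The intermediate String is not meaningful text; it is only a carrier for the octets, which is why working with the bytes directly is usually the simpler choice.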

In summary, the bytes are preserved. I would not recommend going through the String conversion if it can be avoided; instead, just stick with the bytes provided by AsciiString:

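    // Drop-in replacement for the test method in ReproducerTest above.
    // Additionally requires: import io.netty.util.AsciiString; import io.netty.util.CharsetUtil;
    @Test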
    public void headerUnicodeValueRoundTrip() throws Exception {
        ByteBuf in = Unpooled.buffer(100);
        try {
            HpackEncoder hpackEncoder = new HpackEncoder(true);

            Http2Headers toEncode = new DefaultHttp2Headers();
            String expectedString = "\uF93D\uF936\uF949\uF942";
            byte[] expectedBytes = expectedString.getBytes(CharsetUtil.UTF_8);
            AsciiString expectedValue = new AsciiString(expectedBytes);
            toEncode.add("test", expectedValue);
            hpackEncoder.encodeHeaders(1, in, toEncode, NEVER_SENSITIVE);

            Http2Headers decoded = new DefaultHttp2Headers();
            HpackDecoder hpackDecoder = new HpackDecoder(8192, 32);
            hpackDecoder.decode(1, in, decoded);

            AsciiString binaryValue = (AsciiString) decoded.get("test");
            assertThat(binaryValue, is(expectedValue));
            assertThat(new String(binaryValue.toString().getBytes(CharsetUtil.ISO_8859_1), CharsetUtil.UTF_8.name()),
                    is(expectedString));
        } finally {
            in.release();
        }
    }

Thanks a lot for the explanation, @Scottmitch!
