According to the HPACK specification:
[Header] Names and values are considered to be opaque sequences of octets
As such, we should be able to use a UTF-8 encoded string as a header value.
However, HpackDecoder treats both the header name and the header value as an AsciiString.
(This is less of an issue for header field names, because the spec says that "header field names are strings of ASCII characters".)
package io.netty.handler.codec.http2;

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import org.junit.Test;

import static io.netty.handler.codec.http2.Http2HeadersEncoder.NEVER_SENSITIVE;
import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

public class ReproducerTest {
    @Test
    public void headerUnicodeValueRoundTrip() throws Exception {
        ByteBuf in = Unpooled.buffer(100);
        try {
            HpackEncoder hpackEncoder = new HpackEncoder(true);
            Http2Headers toEncode = new DefaultHttp2Headers();
            toEncode.add("test", "\uF93D\uF936\uF949\uF942");
            hpackEncoder.encodeHeaders(1, in, toEncode, NEVER_SENSITIVE);

            Http2Headers decoded = new DefaultHttp2Headers();
            HpackDecoder hpackDecoder = new HpackDecoder(8192, 32);
            hpackDecoder.decode(1, in, decoded);

            assertThat(decoded.get("test").toString(), is("\uF93D\uF936\uF949\uF942"));
        } finally {
            in.release();
        }
    }
}
Results:
java.lang.AssertionError:
Expected: is "綠虜雷壟"
but: was "????"
Expected :綠虜雷壟
Actual :????
Netty version: 4.1.20
JVM version (java -version): 1.8.0_151
The name AsciiString is a bit overloaded in this context. It is the core storage used for HTTP/1.x (where ASCII is king) and also for HTTP/2 (where binary is desired). We do preserve the bytes when encoding/decoding, but we also implement the CharSequence API for "convenience" and compatibility with existing APIs. What you are seeing here is that Java represents a String as UTF-16, and because the underlying storage of a String is a char[], we fill each element of the array so that the low byte (LSB) has data and the high byte (MSB) has nothing (we only have one byte of data per element, and a char is 2 bytes wide). So if you want to convert to Java Strings, you have to go through a few Charset conversions first:
toString() -> converts AsciiString's byte[] into a String, which gives you a char[] where the LSB has data and the MSB has nothing
toString().getBytes(CharsetUtil.ISO_8859_1) -> gives you a byte[] which pulls the LSB out of each element of the String's char[]
new String(binaryValue.toString().getBytes(CharsetUtil.ISO_8859_1), CharsetUtil.UTF_8.name()) -> takes the byte[] from above and converts it to a UTF-8 String

In summary, the bytes are preserved. I would not recommend going through the String conversion if it can be avoided; instead, just stick with the bytes provided by AsciiString:
@Test
public void headerUnicodeValueRoundTrip() throws Exception {
    ByteBuf in = Unpooled.buffer(100);
    try {
        HpackEncoder hpackEncoder = new HpackEncoder(true);
        Http2Headers toEncode = new DefaultHttp2Headers();
        String expectedString = "\uF93D\uF936\uF949\uF942";
        byte[] expectedBytes = expectedString.getBytes(CharsetUtil.UTF_8);
        AsciiString expectedValue = new AsciiString(expectedBytes);
        toEncode.add("test", expectedValue);
        hpackEncoder.encodeHeaders(1, in, toEncode, NEVER_SENSITIVE);

        Http2Headers decoded = new DefaultHttp2Headers();
        HpackDecoder hpackDecoder = new HpackDecoder(8192, 32);
        hpackDecoder.decode(1, in, decoded);

        AsciiString binaryValue = (AsciiString) decoded.get("test");
        assertThat(binaryValue, is(expectedValue));
        assertThat(new String(binaryValue.toString().getBytes(CharsetUtil.ISO_8859_1), CharsetUtil.UTF_8.name()),
                is(expectedString));
    } finally {
        in.release();
    }
}
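For anyone who wants to see the charset steps in isolation, here is a minimal JDK-only sketch with no Netty involved. It assumes that AsciiString.toString() behaves like an ISO-8859-1 decode (one byte becomes one char), which is how the conversion is described above. The class name HeaderBytesSketch and the helpers widen and recoverUtf8 are hypothetical names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class HeaderBytesSketch {
    // Assumption: mimics a byte-per-char view, where each byte becomes one char in 0x00..0xFF.
    static String widen(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Pulls the low byte back out of every char, then decodes those bytes as UTF-8.
    static String recoverUtf8(String widened) {
        byte[] raw = widened.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Narrows a String straight to single bytes; chars above 0xFF cannot be mapped
    // and the JDK substitutes '?' for each of them.
    static String lossyNarrow(String value) {
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\uF93D\uF936\uF949\uF942";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // 3 bytes per char here

        String widened = widen(utf8);
        System.out.println(widened.length());                      // 12
        System.out.println(recoverUtf8(widened).equals(original)); // true
        System.out.println(lossyNarrow(original));                 // ????
    }
}
```

The last line shows the same "????" symptom as the original report: narrowing a String that contains chars above 0xFF to one byte per char loses the data, whereas UTF-8 encoding first and carrying the raw bytes keeps the value recoverable.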
Thanks a lot for the explanation, @Scottmitch!