Converters using the charStream() convenience from OkHttp's ResponseBody do not handle a BOM well.
_Original issue:_
SimpleXmlConverter throws XMLStreamException reading xml with BOM after updated to 2.1.0.
But it's not happen in 2.0.2.
Test codes:
apply plugin: 'java'
sourceCompatibility = 1.7
repositories {
jcenter()
}
def retrofitVersion = '2.1.0'
dependencies {
testCompile 'junit:junit:4.12'
testCompile 'com.squareup.okhttp3:mockwebserver:3.3.1'
testCompile "com.squareup.retrofit2:retrofit:$retrofitVersion"
testCompile "com.squareup.retrofit2:converter-simplexml:$retrofitVersion"
}
import okhttp3.mockwebserver.MockResponse;
import okhttp3.mockwebserver.MockWebServer;
import okio.Buffer;
import okio.ByteString;
import org.junit.Before;
import org.junit.Rule;
import org.junit.Test;
import org.simpleframework.xml.Root;
import org.simpleframework.xml.Text;
import retrofit2.Call;
import retrofit2.Retrofit;
import retrofit2.converter.simplexml.SimpleXmlConverterFactory;
import retrofit2.http.GET;
import java.io.IOException;
import static org.junit.Assert.assertEquals;
public class BomXml {
private interface Service {
@GET("/xml") Call<E> xml();
}
@Root private static class E {
@Text private String v;
}
@Rule public final MockWebServer server = new MockWebServer();
private Service service;
@Before public void setUp() {
service = new Retrofit.Builder()
.baseUrl(server.url("/"))
.addConverterFactory(SimpleXmlConverterFactory.create())
.build()
.create(Service.class);
}
@Test public void readBom() throws IOException {
read(true);
}
@Test public void readNoBom() throws IOException {
read(false);
}
private void read(boolean addBom) throws IOException {
Buffer buf = new Buffer();
if (addBom)
buf.write(ByteString.decodeHex("efbbbf"));
buf.write("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><e>a</e>".getBytes());
server.enqueue(new MockResponse().setBody(buf));
E e = service.xml().execute().body();
assertEquals("a", e.v);
}
}
Must have been the change from a binary stream to a character stream. I'll take a look tomorrow.
We might have to skip the BOM ourselves inside the response body converter. The way that Simple XML works to deserialize XML is impressively complicated. There are so many levels of abstraction it is legitimately insane. I lost track about 6 layers down when it bridged into javax.xml.stream.* classes. As far as I can tell there's no way to supply both a locale and a byte stream, so if we want to support non-whatever-the-default is we have to skip the BOM ourselves and then create and pass a Reader to be deserialized. It's legitimately depressing looking inside Simple XML and Java's XML classes to try and fix this.
I'm a bit wary of adding BOM support into Retrofit itself for this. Ideally Simple XML would let me hand it a Charset and an InputStream and it would take care of the BOM in addition to reading the contents using the appropriate charset.
I think that not need to BOM support into Retrofit because Java and Simple XML seems not supports BOM.
In that case, I think that should document it explicitly because SimpleXmlConverter's behavior has changed from 2.1.0.
I was able to avoid this issue by customizing Persister like below.
public class MyPersister extends Persister {
@Override public <T> T read(Class<? extends T> type, Reader source, boolean strict) throws Exception {
BufferedReader wrapper = new BufferedReader(source);
wrapper.mark(1);
char[] buf = new char[1];
wrapper.read(buf);
if (buf[0] != '\uFEFF') // Reset if not have BOM.
wrapper.reset();
return super.read(type, wrapper, strict);
}
}
This problem will plague any converter implemented using OkHttp's ResponseBody.charStream() API. In order to solve this we're going to augment Okio to handle creating a Reader with knowledge of the BOM. OkHttp will be updated to support this, but we don't have to wait for it to release since we can use the new Okio API as soon as they're released.
This is fixed in OkHttp for the next release. Once we update to it all char-based converters will automatically get BOM handling.
your are very stronger
Most helpful comment
This is fixed in OkHttp for the next release. Once we update to it all char-based converters will automatically get BOM handling.