I have a fairly large HTML page that has several JSON objects & arrays in it. I
use a PushbackReader to read the file and handle any text I'm interested in
(and determine when to start a JsonReader read). If I pass my Reader (or any
reader) to JsonReader, it will read 1024 characters regardless of content. This
means I may miss some JSON.
For example, I may miss myObject2 in the following snippet, because it may
already be sitting unprocessed in the JsonReader buffer.
<h1>some html</h1>
<script language=javascript>
var myObject1={"a":1};
alert("some non-JSON that may or may not exceed the JsonReader buffer size");
var myObject2={"b":2};
NOTE: This is a much simplified version of my problem. The JSON data is
actually quite large, as is the HTML content.
As a temporary workaround, I've added the following method to JsonReader that
allows me to push the unprocessed characters in the JsonReader buffer back into
my PushbackReader (so I can start searching for "var myObject2=").
public char[] getUnreadCharacters() {
    char[] output = new char[limit - pos];
    if (pos < limit) {
        System.arraycopy(buffer, pos, output, 0, limit - pos);
    }
    return output;
}
I also added a constructor that allows me to modify the buffer size.
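To illustrate the caller's side of this workaround without a patched Gson, here is a minimal sketch using only java.io. The `overReadOneObject` method is a hypothetical stand-in for the modified JsonReader plus `getUnreadCharacters()`: it reads up to a buffer's worth of characters, treats everything through the first `}` as consumed (a toy rule, not real JSON parsing), and returns the rest. Note the assumption that the PushbackReader is created with a pushback capacity at least as large as the parser's buffer; otherwise `unread` throws an IOException.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class PushbackLeftoverSketch {

    // Hypothetical stand-in for a buffering parser: reads up to bufferSize
    // chars, "consumes" text through the first '}', and returns the chars
    // that were read from the Reader but not consumed.
    static char[] overReadOneObject(Reader in, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        int limit = in.read(buffer, 0, buffer.length);
        if (limit < 0) {
            return new char[0]; // end of stream, nothing left over
        }
        int pos = 0;
        while (pos < limit && buffer[pos] != '}') {
            pos++;
        }
        if (pos < limit) {
            pos++; // step past the closing brace
        }
        char[] unread = new char[limit - pos];
        System.arraycopy(buffer, pos, unread, 0, unread.length);
        return unread;
    }

    // Parses one object, pushes the leftover back, then returns everything
    // the caller can still read from the PushbackReader.
    static String remainderAfterFirstObject(String text) {
        int bufferSize = 1024;
        // Pushback capacity must cover the largest possible leftover.
        try (PushbackReader reader = new PushbackReader(new StringReader(text), bufferSize)) {
            char[] leftover = overReadOneObject(reader, bufferSize);
            reader.unread(leftover);
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                rest.append((char) c);
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After the push-back, the caller's own scanning logic (for example, searching for "var myObject2=") sees the stream as if the parser had stopped exactly at the end of the object.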
Original issue reported on code.google.com by [email protected] on 6 Oct 2013 at 3:44
+1
JsonReader should not advance the underlying Reader past the end of the JSON that has been read.
We aren't going to do this in Gson. Doing so breaks buffering.
Can you explain how fixing this would break buffering? How exactly does it break?
Also, a separate question, why does JsonReader do its own buffering rather than delegating buffering to a buffered reader?
+1
Why not use mark/reset/read (if supported) to rewind the underlying Reader so it points to the next character after the consumed JSON object, rather than over-subscribe on the input?
If you'd like, create a Reader that returns one character at a time. That'll give you the behavior you want!
Not sure if that's a snipe or you are being serious, but that won't work.
The issue is that gson/stream/JsonReader.fillBuffer (line 1300) reads up to 1024 bytes ahead from the caller-supplied Reader (up to buffer.length), but unconsumed bytes are never returned to that Reader. Unless I'm missing something, they are simply thrown away with the JsonReader, leaving the caller-supplied Reader in an unknown/unusable state.
Supplying one byte at a time from the Reader still means up to 1024 bytes will get read from it and thrown away, leaving the Reader positioned some arbitrary number of characters past the consumed input.
Therefore, the request here is to return unconsumed characters back to the caller-supplied Reader, so that the caller has access to the next byte following the consumed JSON object (presumably using reader.mark/.reset, if reader.markSupported() is true).
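A minimal sketch of the mark/reset idea described here, done on the caller's side with plain java.io rather than inside Gson. It is an illustration under assumptions, not how JsonReader behaves: the "parser" is a toy that treats everything through the first `}` as consumed, the class and method names are hypothetical, and the read-ahead limit of 1024 simply mirrors the buffer size mentioned in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class MarkResetSketch {

    // Reads ahead up to 1024 chars, decides how many were really consumed,
    // then rewinds so the Reader points just past the consumed text.
    static String remainderAfterFirstObject(String text) {
        int readAhead = 1024;
        try (Reader reader = new BufferedReader(new StringReader(text))) {
            if (!reader.markSupported()) {
                throw new IllegalStateException("reader does not support mark/reset");
            }
            reader.mark(readAhead); // remember the position before over-reading

            char[] buffer = new char[readAhead];
            int limit = reader.read(buffer, 0, buffer.length);

            // Toy "parse": everything through the first '}' counts as consumed.
            int consumed = 0;
            while (consumed < limit && buffer[consumed] != '}') {
                consumed++;
            }
            if (consumed < limit) {
                consumed++;
            }

            reader.reset(); // back to the marked position
            long skipped = 0;
            while (skipped < consumed) {
                long n = reader.skip(consumed - skipped);
                if (n <= 0) {
                    break;
                }
                skipped += n;
            }

            // Whatever is read from here on starts right after the object.
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                rest.append((char) c);
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The approach only works when the Reader supports marking and the read-ahead never exceeds the limit passed to `mark`, which is why the sketch reads at most `readAhead` characters between `mark` and `reset`.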
@jhugard it'll only put a single char in the buffer if that's what read() returns. It'll attempt up to 1024 chars, but it takes what it can get. If you create a reader that returns one character at a time, it won't read more than that.
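A sketch of the kind of Reader suggested here, assuming a simple FilterReader wrapper (the class name `OneCharReader` and the helper methods are hypothetical). Each bulk `read` hands back at most one character, so a consumer that asks for 1024 characters still advances the underlying Reader by only one.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class OneCharReader extends FilterReader {

    OneCharReader(Reader in) {
        super(in);
    }

    // Ignore the requested length and deliver at most one char per call.
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        return super.read(cbuf, off, 1);
    }

    // How many chars a single 1024-char bulk read actually receives.
    static int charsFromBulkRead(String text) {
        try (Reader in = new OneCharReader(new StringReader(text))) {
            return in.read(new char[1024], 0, 1024);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // What is still unread in the underlying Reader after one bulk read
    // through the wrapper.
    static String untouchedRemainder(String text) {
        try {
            Reader source = new StringReader(text);
            Reader throttled = new OneCharReader(source);
            throttled.read(new char[1024], 0, 1024);
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = source.read()) != -1) {
                rest.append((char) c);
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is that every buffer fill becomes a separate call into the underlying Reader for each character, so wrapping a Reader this way gives up most of the benefit of bulk reads.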
Gotcha (my Java is rather rusty), but that's likely to impact performance.
This issue has also been reported against Google/Protobuf's use of BSON, since that's where the issue is being encountered. Sounds like the Protobuf contributors have a different solution in mind, plus we have a work-around. Thanks for taking the time to answer.
@jhugard sorry to disturb you. I have run into the same problem, and Google brought me here. I have read the whole discussion but still don't know how to deal with this case. Could you please give me some suggestions at your convenience? Thanks very much.
Source line for JsonReader.fillBuffer referenced above:
https://github.com/google/gson/blob/2b15334a4981d4e0bd4f2c2d34b8978a4167f36a/gson/src/main/java/com/google/gson/stream/JsonReader.java#L1300