Many crashes on low-end devices are linked to OutOfMemoryError because we systematically store responses in strings, whereas we could process them as a stream. This stems from our all-regex approach, which could be replaced in many cases by a SAX-like parser such as the one from TagSoup. That would not require storing whole web pages in memory.
java.lang.OutOfMemoryError: (Heap Size=11719KB, Allocated=6276KB, Bitmap Size=7969KB)
at ch.boye.httpclientandroidlib.util.CharArrayBuffer.expand(CharArrayBuffer.java:63)
at ch.boye.httpclientandroidlib.util.CharArrayBuffer.append(CharArrayBuffer.java:93)
at android.support.v4.app.ActivityCompatHoneycomb.toString(RequiredFields.java:225)
at cgeo.geocaching.network.Network.getResponseDataNoError(Network.java:371)
at cgeo.geocaching.network.Network.getResponseData(Network.java:387)
at cgeo.geocaching.network.Network.getResponseData(Network.java:380)
at cgeo.geocaching.connector.gc.Login.switchToEnglish(Login.java:221)
at cgeo.geocaching.connector.gc.Login.login(Login.java:87)
at cgeo.geocaching.cgeo$firstLogin.run(cgeo.java:825)
I have always dreamed of applying regular expressions to streams as an alternative solution to the problem. However, I have no idea how well that works in practice. A possible implementation is hinted at here: http://stackoverflow.com/questions/716927/applying-a-regular-expression-to-a-java-i-o-stream
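To make the stream idea concrete, here is a minimal sketch that applies a pattern line by line, so that only one line is held in memory at a time. It is not taken from the c:geo code base: the class `StreamingRegexSketch`, the method `findFirst` and the title pattern are hypothetical names chosen for illustration. It also assumes that a match never spans a line break, which will not hold for every page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the c:geo sources.
final class StreamingRegexSketch {

    /**
     * Returns capture group 1 of the first match found on any single line,
     * or null if no line matches. The pattern must define at least one group.
     * Assumption: a match never spans a line break.
     */
    static String findFirst(Reader source, Pattern pattern) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                // Stop reading as soon as the value is found.
                return matcher.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Pattern title = Pattern.compile("<title>([^<]*)</title>");
        Reader page = new StringReader(
                "<html>\n<head><title>Example cache</title></head>\n</html>");
        System.out.println(findFirst(page, title));
    }
}
```

In real use the `Reader` would wrap the HTTP response stream instead of a `StringReader`. Pages delivered as one very long line would still end up in memory, so this only helps where the markup is broken into lines.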
Well, I think that we should not use regular expressions at all to parse HTML pages. We should use something like TagSoup instead: http://ccil.org/~cowan/XML/tagsoup/
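As a rough sketch of what a SAX-style handler could look like, the example below collects `href` values from anchor elements as the parser streams through the document, without building the whole page as a string. The class `SaxLinkSketch` and the method `collectLinks` are hypothetical names, not existing c:geo code. To stay runnable with the JDK alone, the demo uses the built-in XML parser on well-formed markup; with TagSoup on the classpath, its `org.ccil.cowan.tagsoup.Parser` (which implements `XMLReader`) could presumably be passed in instead to cope with real-world HTML.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of the c:geo sources.
final class SaxLinkSketch {

    /** Streams the document through the given reader and collects href values of <a> elements. */
    static List<String> collectLinks(XMLReader xmlReader, Reader source) throws Exception {
        final List<String> links = new ArrayList<>();
        xmlReader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                // Called once per opening tag; nothing else is retained.
                if ("a".equalsIgnoreCase(qName)) {
                    String href = attributes.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        xmlReader.parse(new InputSource(source));
        return links;
    }

    public static void main(String[] args) throws Exception {
        // JDK parser used here; it only accepts well-formed XML.
        XMLReader jdkReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Reader page = new StringReader(
                "<html><body><a href=\"/one\">1</a><a href=\"/two\">2</a></body></html>");
        System.out.println(collectLinks(jdkReader, page));
    }
}
```

Taking the `XMLReader` as a parameter keeps the handler independent of the parser, so the same extraction code could be exercised in tests with the JDK parser and run against a more lenient one in the app.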
@samueltardieu This issue is rather old. Is it still relevant? As far as I understand, we have meanwhile started using things like JSoup and other methods, and perhaps this issue is much too generic to ever be solved?
We use JSoup in some places, but not everywhere, far from it. The issue is still relevant.
Jsoup is only used in some (small) parts, and there are still many regexes in use :(. We should keep the issue and probably rename it to "Use a dom parser instead of Regex".