Go: x/net/html: fuzz this package

Created on 25 Sep 2018 · 13 Comments · Source: golang/go

Given a couple of bugs reported by @tr3ee from malformed/incomplete tags
like:

whose reproducers are quite simple and have caused runtime panics or infinite hangs, perhaps fuzzing could help us discover such cases and whatever else lurks beyond them.

/cc @namusyaka @dgryski @dvyukov @bradfitz @nigeltao

Testing help wanted

odeke-em

👍2

Most helpful comment

It's not just random data, see:
https://go-talks.appspot.com/github.com/dvyukov/go-fuzz/slides/go-fuzz.slide
https://go-talks.appspot.com/github.com/dvyukov/go-fuzz/slides/fuzzing.slide
Also you can pre-bootstrap corpus with some meaningful inputs.

dvyukov on 26 Sep 2018

❤2

All 13 comments

This bug has been fixed.
See: https://go-review.googlesource.com/136875

yorelog on 25 Sep 2018

The current implementation seems to be incomplete; it will be fixed by conforming to the latest spec.

namusyaka on 25 Sep 2018

@yorelog that CL fixed #27702 in particular, but I agree with @odeke-em that, in general, it could be useful to fuzz x/net/html, over and above reacting to specific bugs.

That's easy for me to say, though. I don't have time to work on this myself.

nigeltao on 26 Sep 2018

What would you suggest as a fuzzing strategy? I could run domato against this lib and report crashes/hangs if this makes any sense.

empijei on 26 Sep 2018

❤1

The idea is to use go-fuzz.

agnivade on 26 Sep 2018

I would happily use go-fuzz but I'm not sure fuzzing an html parser with just random data would cover all interesting paths. It's hard to produce stuff like <math><template><mo><template> (one of the bugs listed above was triggered by that) with a random sequence generator.

Maybe we could use both?

empijei on 26 Sep 2018

It's not completely random. You can specify initial corpus data, and go-fuzz will take it from there.
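As a rough illustration of pre-seeding, here is a minimal sketch that writes a few HTML snippets into a corpus directory. It assumes go-fuzz's usual layout, where initial inputs are plain files in a corpus subdirectory of the working directory. The directory name html-fuzz-workdir, the helpers seedName and writeSeeds, and every seed other than the <math><template><mo><template> input quoted above are made up for the example.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// seedName derives a stable file name from the seed's content, so writing
// the same seed twice overwrites the file instead of duplicating it.
func seedName(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}

// writeSeeds stores each seed as its own file under dir and returns the
// number of files written.
func writeSeeds(dir string, seeds []string) (int, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return 0, err
	}
	n := 0
	for _, s := range seeds {
		path := filepath.Join(dir, seedName([]byte(s)))
		if err := os.WriteFile(path, []byte(s), 0o644); err != nil {
			return n, err
		}
		n++
	}
	return n, nil
}

func main() {
	seeds := []string{
		"<math><template><mo><template>", // input mentioned in this thread
		"<p>hello</p>",                   // hypothetical seed
		"<table><tr><td>x</td></tr>",     // hypothetical seed
	}
	// Hypothetical location; point this at the workdir you pass to go-fuzz.
	dir := filepath.Join(os.TempDir(), "html-fuzz-workdir", "corpus")
	n, err := writeSeeds(dir, seeds)
	if err != nil {
		fmt.Fprintln(os.Stderr, "writing seeds:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d seeds to %s\n", n, dir)
}
```

Naming files by content hash is only a convenience here; any unique file names should work as long as each file holds one input.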

agnivade on 26 Sep 2018

dvyukov on 26 Sep 2018

❤2

Actually I think I already did this in 2015:
https://github.com/dvyukov/go-fuzz-corpus/blob/master/html/html.go
But the corpus is not checked in.

dvyukov on 26 Sep 2018

Update:

I've been running go-fuzz and haven't found anything so far (except for the already reported bugs), but I will leave it running for a while.

Ran domato against the patched html library and found 3 crashes with a sample size of 10K files. Is anyone interested in looking into the causes of these crashes? (The files are big and messy to inspect, so it will probably take me some time to go through them.)
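For sorting generated files into "fine", "panics" and "hangs", something along these lines might help. It is only a sketch: runOne, the outcome type, demoLimit and toyParse are names invented here, and toyParse is a stand-in that panics or blocks on purpose. In a real run, the parse argument would wrap a call such as html.Parse on a bytes.Reader over the file contents.

```go
package main

import (
	"bytes"
	"fmt"
	"time"
)

type outcome int

const (
	ok outcome = iota
	panicked
	timedOut
)

// demoLimit is an arbitrary time budget for one input in this example.
const demoLimit = 100 * time.Millisecond

// runOne calls parse on data in its own goroutine. A panic is recovered and
// reported as panicked; if parse has not returned within limit, the result
// is timedOut. A parser that really hangs keeps its goroutine alive, which
// is acceptable for a short-lived triage tool but not for a server.
func runOne(parse func([]byte), data []byte, limit time.Duration) outcome {
	done := make(chan outcome, 1) // buffered so a late result never blocks
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- panicked
			}
		}()
		parse(data)
		done <- ok
	}()
	select {
	case o := <-done:
		return o
	case <-time.After(limit):
		return timedOut
	}
}

// toyParse is a hypothetical stand-in for the parser under test.
func toyParse(data []byte) {
	switch {
	case bytes.HasPrefix(data, []byte("<math>")):
		panic("simulated parser panic")
	case bytes.HasPrefix(data, []byte("<select>")):
		select {} // simulated hang: block forever
	}
}

func main() {
	names := map[outcome]string{ok: "ok", panicked: "panic", timedOut: "timeout"}
	for _, in := range []string{"<p>hi</p>", "<math><mo>", "<select><option>"} {
		fmt.Printf("%-18q %s\n", in, names[runOne(toyParse, []byte(in), demoLimit)])
	}
}
```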

empijei on 26 Sep 2018

I'm not sure fuzzing an html parser with just random data would cover all interesting paths

There are at least a couple of approaches to addressing this.

One is to use a "fuzzing dictionary" and/or "seed corpus", described in https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md

Two is to accept arbitrary random bytes as input, and map each byte to a string that is more likely to tickle interesting code paths in the HTML parser. For example: https://play.golang.org/p/3QE4960bHsa
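The general shape of such a mapping might look like the sketch below. This is not the code behind the playground link; the tokens table and the expand function are hypothetical, and a useful table would be much larger (element names, attributes, entities, stray angle brackets and so on).

```go
package main

import (
	"fmt"
	"strings"
)

// tokens is a hypothetical dictionary of HTML fragments. Each input byte
// selects one entry.
var tokens = []string{
	"<math>",
	"<template>",
	"<mo>",
	"</template>",
	"</math>",
	"<table>",
	"<p>",
	" ",
	"x",
}

// expand maps every input byte to one dictionary entry, wrapping around
// with modulo, so any byte string decodes to HTML-like text.
func expand(data []byte) string {
	var sb strings.Builder
	for _, b := range data {
		sb.WriteString(tokens[int(b)%len(tokens)])
	}
	return sb.String()
}

func main() {
	// Four raw bytes are enough to produce the nested-template input
	// mentioned earlier in the thread.
	fmt.Println(expand([]byte{0, 1, 2, 1})) // prints <math><template><mo><template>
}
```

The fuzzer then mutates the short byte string, and the harness feeds expand(data) to the parser instead of the raw bytes.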

Doing the reverse map from the existing HTML test cases to this "compressed" format is left as an exercise for the reader.

Once you have a dense mapping like this, where each raw input byte is relatively independent, it might be relatively straightforward to minimize the repro case, if go-fuzz doesn't already help you do so: cut out random sub-slices of the "compressed" input, backing off if it no longer crashes.
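A minimal sketch of that loop, under some assumptions: minimize and toyCrashes are invented names, and toyCrashes is a hypothetical predicate standing in for "decode this input, run the parser, and report whether it crashed".

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// minimize repeatedly cuts a random sub-slice out of the current input and
// keeps the shorter candidate only if crashes still reports true for it.
// Otherwise it backs off and tries a different cut.
func minimize(input []byte, crashes func([]byte) bool, attempts int, seed int64) []byte {
	rng := rand.New(rand.NewSource(seed))
	cur := append([]byte(nil), input...) // work on a copy
	for i := 0; i < attempts && len(cur) > 0; i++ {
		start := rng.Intn(len(cur))
		length := 1 + rng.Intn(len(cur)-start) // cut at least one byte
		cand := make([]byte, 0, len(cur)-length)
		cand = append(cand, cur[:start]...)
		cand = append(cand, cur[start+length:]...)
		if crashes(cand) {
			cur = cand
		}
	}
	return cur
}

// toyCrashes is a hypothetical crash predicate: it pretends the parser
// crashes whenever the input contains the byte sequence 0, 1, 2, 1.
func toyCrashes(data []byte) bool {
	return bytes.Contains(data, []byte{0, 1, 2, 1})
}

func main() {
	in := []byte{5, 6, 0, 1, 2, 1, 7, 8, 8, 6}
	out := minimize(in, toyCrashes, 500, 1)
	fmt.Printf("%d bytes -> %d bytes, still crashes: %v\n", len(in), len(out), toyCrashes(out))
}
```

Because a candidate is only accepted when it still crashes, the result always reproduces the crash if the original input did; how close it gets to the smallest possible input depends on the number of attempts and on luck.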

nigeltao on 27 Sep 2018

👍1

Nice idea, though that is probably going to take a while longer. I'll add info when I have news. Thanks for this.

empijei on 27 Sep 2018

Just to give a quick update: I gave this a shot a couple of months ago and didn't find any relevant crashes in a couple of weeks of fuzzing.

My plan now is to wait and see how support for oss-fuzz and first-class citizenship for fuzzing discussions will unfold. If fuzzing becomes part of the testing flow in Go I'll provide the needed FuzzXyz functions and write the necessary configurations to have it run on some beefy hardware and cover it properly.
Otherwise I'll set up some machines to fuzz it in other ways.

empijei on 4 Mar 2019
