Go: encoding/json: add sample fuzz test for prototype of "fuzzing as a first class citizen"

Created on 6 Apr 2019  Â·  32Comments  Â·  Source: golang/go

Summary

In #30719 and #30979, dvyukov/go-fuzz compatible fuzz functions were landed in:

That was done primarily to help with the exploration requested by the core Go team in discussion of the #19109 proposal to "make fuzzing a first class citizen" (in addition to other benefits).

The follow-up suggestion here is to add a Fuzz function to the standard library encoding/json package.

The starting point most likely should be the Fuzz function in https://github.com/dvyukov/go-fuzz-corpus/blob/master/json/json.go.

This is a particularly interesting example given the Fuzz function there is a bit more complex than the first two samples landed, and hopefully there is some value especially given @mvdan has recently been landing a nice series of performance-related changes to encoding/json.

There would be no corpus checked in for now. (The approach to the corpus is being discussed elsewhere, e.g., #31215).

Background

See the "Background" section of #30719 or https://github.com/golang/go/issues/19109#issuecomment-441442080.

Additional Details

Following the pattern set by #30719 and https://golang.org/cl/167097, the following are likely true for how to proceed here:

  1. The build tag should be // +build gofuzz
  2. The name of the files should be fuzz.go
  3. The license header should be the Go standard library license. @dvyukov might need to make a similar statement as he made in CL 167097.
  4. In general, even for Fuzz functions guarded by a build tag, new dependencies should be avoided, especially with the introduction of modules.

For reference, here is a gist showing the diff between dvyukov/go-fuzz-corpus/tiff/tiff.go and the final form as merged into x/image/tiff.

Two issues that likely would need to be resolved prior to merging into the standard library:

  1. The current dvyukov/go-fuzz-corpus json fuzzing function has a dependency on github.com/dvyukov/go-fuzz-corpus/fuzz for fuzz.DeepEqual. This dependency would need to be eliminated. An initial solution might be (a) eliminating that round-trip test for now, or (b) the longer term solution might be open coding the comparison (e.g., comment from @dvyukov in https://github.com/golang/go/issues/30979#issuecomment-475225947) or perhaps rely on an existing comparison function in an existing _test.go or similar, or some other solution.

  2. The current dvyukov/go-fuzz-corpus json fuzzing function can trigger a false positive when round-tripping through Unmarshal/Marshal in the presence of duplicate keys (e.g., see comments from @josharian in https://github.com/google/oss-fuzz/pull/2188#issuecomment-468329478 or dvyukov/go-fuzz-corpus#3). An initial solution might be (a) temporarily eliminating that round-trip test, or (b) perhaps scan the serialized JSON for duplicate keys to avoid the false positive, or other possible longer term solutions.

Happy to be corrected if any of the above is different than how people would like to proceed here.

Finally, @mvdan, I don't want to put you on the spot, but in other discussions you had expressed some interest in this. Are you still interested? And of course no worries if too busy with other things.

CC @dvyukov @josharian @FiloSottile @acln0

FrozenDueToAge NeedsInvestigation Testing

Most helpful comment

Change https://golang.org/cl/174301 mentions this issue: html: add a Fuzz function

All 32 comments

This sounds like a good idea, and I was planning on fuzzing the decoder before the 1.13 release in any case.

Are you still interested? And of course no worries if too busy with other things.

Sorry, could you clarify what part you want me to play here? I'm happy to help review and bounce ideas, for example.

@thepudds I haven't read the background on everything in the above issues, but I would be willing to contribute time on this. Let me know if I can help.

@elwinar That would be great! There is plenty of opportunity to help make this happen, including in some small-ish steps that individually would not be much time, I think.

The main goal of this issue here is to bring in https://github.com/dvyukov/go-fuzz-corpus/blob/master/json/json.go into the standard library encoding/json package.

However, for this issue it probably would make sense to first do a small-ish step for encoding/json, which would be still be a nice win.

The opening comment here in https://github.com/golang/go/issues/31309#issue-430058716 outlines two problems with the current JSON fuzzing function in dvyukov/go-fuzz-corpus. Fortunately, both of those problems are related to the round-trip test. Quoting that comment partially here:

  1. The current dvyukov/go-fuzz-corpus json fuzzing function has a dependency on github.com/dvyukov/go-fuzz-corpus/fuzz for fuzz.DeepEqual ...
    ...
  2. The current dvyukov/go-fuzz-corpus json fuzzing function can trigger a false positive when round-tripping through Unmarshal/Marshal in the presence of duplicate keys ...

The "simple" solution to get the ball rolling would just be to eliminate that round-trip test from the standard library version when https://github.com/dvyukov/go-fuzz-corpus/blob/master/json/json.go is initially converted over. The current pattern is basically Unmarshal, Marshal, Unmarshal, DeepEqual. The new pattern could likely just be Unmarshal, Marshal.

Does that sound of interest? It would be very nice to get even a basic initial version of a Fuzz function for JSON into the standard library...

(And sorry, I should have updated this issue regarding @mvdan's question above, but @mvdan and I ended up chatting elsewhere about this issue here, and he more-or-less said that he has long-term interest here, but that he was already maxed out for anything in a 1.13 timeframe).

I'll have a look at it.

Change https://golang.org/cl/174058 mentions this issue: encoding/json: add a Fuzz function

In the end, the function worked well with the standard DeepEqual. I ran it for half an hour without noticing any crash, which I guess is fine (but I have only 12 cores, so maybe a real fuzzing farm would find a crash).

I didn't look into the second issue (the duplicate key thing) yet.

On the way, I studied the Fuzz function itself, and removed a few things. I will possibly add them back later, but I believe the conditions about the json.RawMessage thing aren't necessary any longer since json.RawMessage doesn't do funky things with base64 anymore.

Also, switched the second Unmarshal error to a panic, as I believe anything that was marshaled should be unmarshaled without error.

(And now I'll stop spamming and go back to work.)

@elwinar That is very exciting.

In general, I think your philosophy of simplifying makes sense, especially to start. Going from 0 to 1 Fuzz function for JSON in the standard library is the biggest step, and improvements can happen later, as you say.

As part of that "start simple" approach you are taking, I would suggest removing the DeepEqual check.

Part of the overall goal is to make sure the initial round of Fuzz functions that land the standard library are high quality, especially with regards to avoiding spurious false positives. Right now, especially as we are starting, avoiding false positives is more important than the sophistication of any given fuzzing function. The duplicate key issue has triggered false positives in practice (e.g., see dvyukov/go-fuzz-corpus#3), and removing the DeepEqual check here is hopefully a trivial way to avoid those false positives for now, and a more sophisticated solution to avoid tripping over duplicate keys could come later (including perhaps we could even open a separate issue for that after your CL lands).

What do you think?

I've been toying with this duplicate key issue because it seemed really annoying.

I note that it is only a problem for the S struct, and it depends on the number of indirection levels for the O field. You can play with and example here: https://play.golang.org/p/2tAoh8TG8if

Looking at dvyukov/go-fuzz-corpus#3 I agree that this input is perfectly valid JSON, but I think in this case the parser behavior is kinda inconsistent. I would expect the field to either always be nil on the first indirection or the last, while right now it depends on what was previously in the field.

Removing the DeepEqual check and opening an issue for this seems fine for now.

Removing the DeepEqual check and opening an issue for this seems fine for now.

OK, great. I am happy to open up the follow-on issue if that makes sense (and to some degree it is already being tracked in dvyukov/go-fuzz-corpus#3).

@thepudds OK, what's the next step? Want me to go wild and integrate as many as possible from the go-fuzz-corpus repository?

That’s great that you landed that CL!

Good next step could be grabbing whatever seems easiest and/or most interesting from this list:

https://github.com/dvyukov/go-fuzz-corpus/blob/master/flate/flate.go

https://github.com/dvyukov/go-fuzz-corpus/blob/master/lzw/lzw.go

https://github.com/dvyukov/go-fuzz-corpus/blob/master/stdhtml/main.go

https://github.com/dvyukov/go-fuzz-corpus/blob/master/gzip/gzip.go

https://github.com/dvyukov/go-fuzz-corpus/blob/master/zlib/main.go

Could be separate CLs with “Updates #19109”...

(On mobile, so brief)

I'll have a look at it somewhere this week.

I don't think we should we wild just right now. We need several of them. But then it would be more useful to start shaking out other aspects: add testing.F argument to go-fuzz, update existing fuzz functions to use it, setup OSS-Fuzz continuous fuzzing and see how it goes. Once we are sure this is what we want and it all works as expected, we can go wild. But for now if one just wants to run go-fuzz locally on std lib, that can well be done from go-fuzz-corpus repo.

Agreed, definitely no need to go wild.

Right now, there are 2 go-fuzz compatible Fuzz functions in the standard library (the second one being the JSON one from @elwinar).

I had been hoping for something like ~5 standard library Fuzz functions in this first wave (and also thinking that identifying something like ~10 targets might help get to ~5 done faster).

Today might be the last day for that first wave, given the 1.13 freeze starts tomorrow, and there have been a few statements that the threshold for changes after freeze would be higher in 1.13 than it has been in prior releases. Maybe a testing function will be considered okay after the freeze, or maybe it won't.

And the golang.org/x repositories are a different story, but here commenting on the standard library.

(Sorry, I’m traveling, and still on mobile, so still brief)

Didn't had the time to look at them, but I'll try to have one more by the evening to hopefully land it before the freeze. Then, we could progress on the prototype, I expect there will be lots of work to do.

@elwinar That sounds great.

Could you briefly comment on which one you think you would target?

And of course, no obligations and no worries if you don’t end up having time... this is open source after all ;-)

One other quick comment is the five candidates targets listed in https://github.com/golang/go/issues/31309#issuecomment-487640786 are all deliberately on the simpler end of the spectrum of the different functions in go-fuzz-corpus, including none of those 5 have external dependencies and none of those use DeepEqual.

(So hopefully any of those would be more straightforward than the JSON one).

I'll have a try with the HTML one. I've already used it and I know how HTML works. I can't really say the same for all the others :P

Change https://golang.org/cl/174301 mentions this issue: html: add a Fuzz function

The HTML one was quick. I've looked at the gzip one and it's much more complex, I don't want to rush it. Given that I never used the others, I won't even look at them now.

On the other hand, although it includes a DeepEqual, I'm fairly sure I can do the CSV one quickly : https://github.com/dvyukov/go-fuzz-corpus/blob/master/csv/main.go
The DeepEqual isn't even strictly necessary as the CSV format is simple, but it's easier to manipulate.

csv would be a nice one too, especially if you drop the DeepEqual as you said.

Actually, the DeepEqual is simpler than having to check the slice equality by hand, so I would keep it for now.

Change https://golang.org/cl/174302 mentions this issue: encoding/csv: add a Fuzz function

@thepudds So, the freeze is started. On one hand I could continue implementing more fuzzing functions in the stdlib, but on the other hand dvyukov's suggestion about making progress on the fuzzing interface itself seems a good idea to me.

Do you/we have a roadmap of some kind here? And maybe a btter way to communicate about this than github issues?

We unfortunately don't have an official roadmap, nor a better place for discussions. I can offer go-fuzz issue tracker, but that's again github issues. One may register a googlegroup for this.
One the small scale finishing testing.F (or fuzzing.F) argument to Go functions seems like a good thing to do. Also finishing setting up at least some OSS-Fuzz integration.
On the larger scale: figuring out if the proposed interface is actually what we want. We switch OSS-Fuzz to fzgo (https://github.com/thepudds/fzgo) and see if it all works as expected in an automated, continuous setting. And try to use fzgo on stdlib and some third-party repos to assess how it work for manual/local testing.
Re actual integration into stdlib: the future seems to be vague at the moment. There is push back on adding fuzzing instrumentation to compiler and no clear plan for source-2-source transformations support for the go command.

OK. I've created this: https://groups.google.com/forum/#!forum/golang-fuzzing-proposal.

I would suggest to make the contents public and not require admin approval to join.

Yeah, the discussion is scattered all over the place. E.g. on the question of adding testing.F/fuzz.F, there is conversation at https://github.com/dvyukov/go-fuzz/issues/218 and code at https://github.com/dvyukov/go-fuzz/pull/223.

Was this page helpful?
0 / 5 - 0 ratings