Dep: Faster TOML parser?

Created on 25 Jun 2017 · 13 Comments · Source: golang/dep

This is just for your information, not an actual issue.

The TOML parser from @BurntSushi recently switched to an MIT license:
https://github.com/BurntSushi/toml/issues/183
The previous license kept it from being selected for dep initially.

Hugo switched back to BurntSushi/toml because it was found to be faster than pelletier/go-toml (cc @pelletier):
https://github.com/gohugoio/hugo/commit/0907a5c1c293755e6bf297246f07888448d81f8b

@carolynvs I'm not suggesting that dep needs to switch -- just ensuring you're all aware.

All 13 comments

Thanks for the heads up!

Do you still see the performance issues on that branch, https://github.com/pelletier/go-toml/pull/176? Performance seems to be on par with BurntSushi's to me.

@pelletier It may be worth checking with @bep on the Hugo project. Maybe he can share more details regarding his benchmarks.

I have not looked at Hugo's benchmarks in relation to the recent tunings in go-toml. But remember this: Hugo has very different requirements in this area than dep does -- handling one TOML file should be really, really fast with both of them -- and functionally both should be equal.

Is dep parsing one TOML file or transitively for each dependency?

> Is dep parsing one TOML file or transitively for each dependency?

dep detects if a dependency uses dep as well, and reads in the Gopkg.toml file.

> dep detects if a dependency uses dep as well, and reads in the Gopkg.toml file.

Indeed. Though, that operation is likely to get permanently cached at some point - #431 - so still not a terribly big concern.

okay. I'll close this for now.

I was trying to figure out what's the best/canonical Go TOML parser by looking at which one dep uses. Perhaps other people do the same. Then I found this issue. It's something to consider: being a highly visible project means dep lends validation to whatever dependencies it chooses over others.

@shurcooL yeah, i hear you on that - i'm always cognizant of the signal it sends when dep brings in dependencies.

at the same time, we have to make decisions like everyone else does - weigh the software that exists _now_ in the context of our specific requirements and the tradeoffs that we can accept. we were originally looking at BurntSushi, but as @nathany noted in the OP, licensing issues made that impossible.

switching is possible now that that's resolved, but the calculus changes, as we have to evaluate whether the effort of doing so is actually worth it for us. my feeling is that, unless and until we encounter some critical problem in @pelletier's implementation (and it's not being actively fixed), the work and risk of change to switch isn't worth it.

note that "critical problem" also covers features that we'd like to add - like, i know i've discussed at least with @carolynvs (though i can't remember where) that it'd be really good if the generator could format lists with one item per line, as that'd help enormously with diffs. but idk if the BurntSushi implementation supports that, either. until i see benchmarks indicating otherwise, i suspect that's much more important for dep than raw TOML in/out performance.
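For illustration, the diff-friendly formatting being asked for would render an array like this (hypothetical Gopkg.toml fragment; `required` is used here only as an example key):

```toml
# One item per line: adding or removing an entry
# touches exactly one line in a diff.
required = [
  "github.com/example/tool",
  "github.com/example/othertool",
]
```

Compare that with a single-line rendering, where any change rewrites the whole line and diffs become unreadable.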

> i'm always cognizant of the signal it sends when dep brings in dependencies.

Good to hear. That's all I wanted to bring up.

> at the same time, we have to make decisions like everyone else does - weigh the software that exists now in the context of our specific requirements and the tradeoffs that we can accept. we were originally looking at BurntSushi, but as @nathany noted in the OP, licensing issues made that impossible.

That makes sense, I'm in agreement.

> switching is possible now that that's resolved, but the calculus changes, as we have to evaluate whether the effort of doing so is actually worth it for us. my feeling is that, unless and until we encounter some critical problem in @pelletier's implementation (and it's not being actively fixed), the work and risk of change to switch isn't worth it.

I don't disagree, but I do wonder. TOML is a well defined specification. Shouldn't it be the contrary -- rather easy and safe -- to swap out one implementation for another, as long as both are correct? In fact, doing so could serve as a sanity check and provide confidence that there are no unexpected issues, or help catch them otherwise.

I'm thinking of this in the context of just trying the change locally (or someone else trying it out) and seeing that all tests continue to pass. You wouldn't need to subject all users to this test.

But I do expect this is likely to be a very low priority for dep.

> Shouldn't it be the contrary -- rather easy and safe -- to swap out one implementation for another, as long as both are correct? In fact, doing so could serve as a sanity check and provide confidence that there are no unexpected issues, or help catch them otherwise.
>
> I'm thinking this in the context of just trying the change locally and seeing that all tests continue to pass, or someone trying it out locally. You wouldn't need to subject all users to this test.

yeah, i had this thought as i was writing my above response 😄 this is exactly the sort of thing for which it _should_ be easy to swap in a new implementation - a well-defined external spec that has a narrow, well-defined interface in the language itself. there should be very little risk there.

it's really just more of an "if it ain't broke, don't fix it" situation. without some specific reason to move, there's little sense in investing the effort. or, in other words:

> this is likely to be a very low priority for dep.

😄

I'm currently reworking the parser so that when you use Unmarshal it does
not create the intermediate tree. It should make this kind of operation
significantly faster. Currently it is on par performance-wise with
BurntSushi. If you have any specific benchmark/TOML document that shows
otherwise I'd love to work on it. :)

@sdboyer making the writer output arrays with one item per line is next on my list if nobody gets to it first.

Thomas Pelletier
