It's a mistake to create a private syntax for a configuration file when there are existing, perfectly fine formats available that are well understood and have publicly available parsers.
P.S. This issue is made thornier by the peculiar assertion that the format is fixed before anyone has a chance to comment on it.
My vote is for XML.
(On a more serious note: it'd be nice to have something that's easy to write by humans. JSON is easy to read, annoying to write. YAML is nice both ways.)
Perhaps some of this dep thread from last year is relevant: https://github.com/golang/dep/issues/119
https://github.com/hashicorp/hcl/blob/master/README.md
Mentions YAML being confusing and not well understood. I don't particularly understand it either, considering the standard disallows tabs for indentation, which is unusual and awkward for a whitespace-agnostic language like Go.
I'm having a very hard time reconciling "yaml" and "perfectly fine format". It's not the description that springs to mind based on my experience.
A benefit of a custom format here is that only what's allowed is legal. Another is that everything can be given a nice expressive syntax. Error messages can be more easily tailored.
The similarity to Go syntax means it shouldn't be hard for anyone to learn it and syntax highlighters and the like should be easy to adapt from their Go counterparts.
The only major downside I see is that, as a new format, its implementation will require a certain amount of fuzzing and additional testing that would (hopefully) already be done otherwise. (And if the parser is put in the stdlib no one else will have to worry about that either).
...why not just have it be written in Go?
I mean, we all know how to write it. We have well-tested lexers and parsers. We have syntax highlighting. We have formatters and tools that can vet the code. We even have a framework for parsing and running sets of files in go test.
This go.mod file
// My hello, world.
module "rsc.io/hello"
require (
"golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
"rsc.io/quote" v1.5.2
)
could become something like
package foobar
import "vgo" // hypothetical package proposed here
func ModuleHelloWorld(v *vgo.V) {
	v.Module("rsc.io/hello")
	v.Require("golang.org/x/text", "v0.0.0-20180208041248-4e4a3210bb54")
	v.Require("rsc.io/quote", "v1.5.2")
}
It'd end up being similar to how go test recognizes xxx_test.go files. vgo could recognize a go.mod, module, xxx_module.go, or whatever file in the root of the project and run the top-level function similar to ModuleXXX, kinda like TestXXX.
Yeah, it doesn't _feel_ like a set of directives as much as runnable code, but since when has Go done something just so it _feels_ good as opposed to the practical option?
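Running the function is the other half. A minimal sketch, assuming a hypothetical V type whose methods simply record each directive (vgo has no such API), shows how the Go-syntax example above would turn back into module data:

```go
package main

import "fmt"

// V is a hypothetical stand-in for vgo.V from the example above.
// Each method only records the directive it was given.
type V struct {
	module   string
	requires [][2]string // (path, version) pairs in declaration order
}

func (v *V) Module(path string) { v.module = path }

func (v *V) Require(path, version string) {
	v.requires = append(v.requires, [2]string{path, version})
}

// ModuleHelloWorld is the "configuration as code" function from the example.
func ModuleHelloWorld(v *V) {
	v.Module("rsc.io/hello")
	v.Require("golang.org/x/text", "v0.0.0-20180208041248-4e4a3210bb54")
	v.Require("rsc.io/quote", "v1.5.2")
}

func main() {
	var v V
	ModuleHelloWorld(&v) // a tool would call this, much as go test calls TestXXX
	fmt.Println("module", v.module)
	for _, r := range v.requires {
		fmt.Println("require", r[0], r[1])
	}
}
```

One consequence of this design is that a tool has to compile and run the file to learn the requirements, rather than just parse it.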
Theoretically, this could also take care of https://github.com/golang/go/issues/23972
Let's avoid bikeshedding on which existing format is best and wait for a response on why a custom syntax was chosen in the first place. It may have been an arbitrary decision, or it may not have. If it wasn't, then understanding the decision will help inform future choices.
Most folks have settled on TOML. We don't really need another custom format or a format embedded in JSON or YAML.
Given that the go.mod file looks very similar to a go source file, why not add module and require as top level declarations and then we can write module syntax inline with our source code?
(Email reply to david karapetyan's comment of 21 Feb 2018 at 15:58.)
Indeed, I immediately felt like trying the tour with mod.go instead of go.mod!
@davecheney I guess the go.mod file makes it easy to find the project root. If module and require become top-level declarations in "normal" .go files, then it would be more difficult to find the project root (you would basically have to look for a .go file containing a module declaration, which requires parsing).
The problem with TOML and YAML is that no-one has written code (AFAIK) that can read those formats (including comments) and write them back out again, gofmt style. See https://github.com/BurntSushi/toml/issues/213 for example. Also, YAML is a terrible format. Please no YAML.
I think I quite like the choice of a custom format as long as there is some straightforward way to convert to/from a well known format, because it can be exactly as simple as necessary, and as clean as possible.
Whatever the format is, it must be well defined and well documented, with a canonical formatting and non-internal libraries to read and write it. None of that exists at the moment.
I'm very reluctant to jump in here. I'm liking what I see from vgo so far, and I don't want to bikeshed on what might feel like a trivial topic.
However, I feel that part of the friction I'm feeling from my initial vgo experiments comes from the rest of the tools that I use to write Go and work with code in general. I think this is an opportunity to make adoption a little easier.
Here's why I think we should consider adopting an existing common data format:
Motivations
go.mod snippets, but if we chose a common data format, it would just work out of the box. Let's not make GitHub, editor authors, and the rest implement and maintain a special parser just for this one file type.
package.json -- it's just JSON! -- and we could do it from any language. There are plenty of dependency management tasks to automate. A standard data format makes it that much easier to add vgo support, and could help it propagate through the ecosystem more quickly.
go.mod, we have to decide on a bespoke place to put it in the file, and then go update all the parsers, syntax highlighters, and other tools in the rest of the tool chain.
Requirements
I don't have strong feelings about YAML vs JSON vs whatever else. I've used JSON fine with npm and YAML fine with Kubernetes, Helm, and Ansible. They both work, and I'm long past the point in my career where I care about arguments like that. (And for what it's worth, I've never been bugged by the lack of inline comments; READMEs and Issues worked for the rare cases we needed to communicate about dependencies.) From where I'm sitting, the requirements are:
Hierarchical. For instance, .properties files are too restricting to future extension.
Apologies in advance if I'm off base. I'm fairly new to Go myself, and I confess that I don't yet understand some of the original motivations for a bespoke file format. There may be good reasons to go another direction that I'm overlooking!
@ecowden
Hierarchical. For instance, .properties files are too restricting to future extension.
@rsc states in his blog post that vgo is meant to be a streamlining/simplification of the general-purpose dep tool. Given that dep made an explicit decision to go with TOML partly because it wasn't hierarchical, it seems unlikely that vgo would reverse that requirement.
From https://github.com/golang/dep/issues/119#issuecomment-287781062
The one thing that does stick out with TOML is, being not tree-structured, it's possible for us to append constraints to the manifest without rewriting it. That may turn out to be a very important factor in applying sane defaults that help guard us (that is, the entire public Go ecosystem) against nasty exponential growth in solver running time.
@ericlagergren While I like the simplicity of reusing go syntax, using the .go extension for the module file makes it likely that some projects will run into a conflict and have to rename some of their files to switch to vgo, which goes against the goal of making the migration as painless as possible.
The file name is not super central to the idea, IMO.
(Email reply to Hugues's comment of Wed, Feb 21, 2018 at 08:05.)
Other concerns aside, JSON does not allow comments, which is sufficient to disqualify it IMO.
Whatever format is in use, I certainly hope (with Rob) that there are good public manipulation libraries.
Here are some comments from years of working on goimports:
Dealing with comments is a nightmare. Part of this is the fault of go/ast and go/printer, but some of it is conceptual: it is often non-obvious what should happen to a comment when adding, deleting, or relocating an import.
The current format appears to accept single and factored forms. goimports has moved to factored forms only. This helps some with comments, and also keeps diffs clean. Also, factored-only will be easier to write regexps for, and sadly, lots of editors still use regexps for highlighting etc.
Grouping rules inevitably get complex (e.g. stdlib vs vendor vs other), particularly if you try to respect existing groupings.
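As an illustration of why grouping gets complicated: even the simplest split, standard library versus everything else, rests on a heuristic. This sketch assumes the common convention that standard-library paths have no dot in their first element; the function names are hypothetical and this is not goimports' actual implementation. Vendored paths or a module's own packages would need further rules on top.

```go
package main

import (
	"fmt"
	"strings"
)

// isStdlib applies a heuristic: standard-library import paths have no dot in
// their first element ("fmt", "net/http"), while most other paths begin with
// a domain name ("golang.org/x/text"). A dotless GOPATH path would fool it.
func isStdlib(path string) bool {
	first, _, _ := strings.Cut(path, "/")
	return !strings.Contains(first, ".")
}

// groupImports splits paths into a stdlib group and an "everything else"
// group, preserving input order within each group.
func groupImports(paths []string) (std, other []string) {
	for _, p := range paths {
		if isStdlib(p) {
			std = append(std, p)
		} else {
			other = append(other, p)
		}
	}
	return std, other
}

func main() {
	std, other := groupImports([]string{"rsc.io/quote", "fmt", "golang.org/x/text", "net/http"})
	fmt.Println(std)   // [fmt net/http]
	fmt.Println(other) // [rsc.io/quote golang.org/x/text]
}
```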
This issue is made thornier by the peculiar assertion that the format is fixed before anyone has a chance to comment on it.
My point, which was arguably phrased too strongly, is that go.mods people write today will be understood by the eventual official tooling. I want to make clear that people will not have to throw them away and start over. Given that vgo already supports reading nine different legacy file formats (GLOCKFILE, Godeps/Godeps.json, Gopkg.lock, dependencies.tsv, glide.lock, vendor.conf, vendor.yml, vendor/manifest, vendor/vendor.json), I am confident it won't be a burden to read this one too, if we move to something new. And the tooling already rewrites go.mod in place when needed, so updating to a new format will be easy if that's what we decide. I was not attempting to lock this in place.
It's a mistake to create a private syntax for a configuration file when there are existing, perfectly fine formats available that are well understood and have publicly available parsers.
I obviously agree with this in principle. In practice I spent a while looking at all the existing formats and found them not "perfectly fine" for this job. In particular, look at how much shorter and clearer a go.mod is compared to the equivalent Gopkg.toml. I'm happy to return to this question once we're happy with all the other higher-level details.
And to answer @josharian's concern, if we keep the custom format then yes there would be public tooling, probably along the lines of x/vgo/vendor/cmd/go/internal/modfile.
I like the suggestions of @ericlagergren and @davecheney. They leverage the entirety of the go compiler and its guarantees. But since go.mod is good for detecting the package root, I have a couple of suggestions to keep that advantage while moving towards modules in the source code:
__Suggestion 1__
Have inline module information in main.go for binaries and lib.go for libraries.
Rust uses main.rs vs lib.rs to differentiate binaries from libraries, and has a Cargo.toml at the project root. The difference is that, in this suggestion, the module info would use Go syntax inside a Go source file.
__Suggestion 2__
Have mod.go for both binaries and libraries, then add vgo.Product = "binary" (or "library"), or some sort of const iota instead of strings.
Swift has a Package.swift, which is valid Swift code, at the project root; it specifies whether the package is a binary or a library through Package.products, whose entries can be .library or .executable.
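A rough sketch of the const iota variant of Suggestion 2, assuming a hypothetical Product type (vgo defines no such type): a typed constant lets the compiler reject a misspelling that a string such as "binray" would let through.

```go
package main

import "fmt"

// Product is a hypothetical type for Suggestion 2's vgo.Product.
type Product int

const (
	Binary Product = iota
	Library
)

// String makes fmt print the product kind by name.
func (p Product) String() string {
	switch p {
	case Binary:
		return "binary"
	case Library:
		return "library"
	}
	return fmt.Sprintf("Product(%d)", int(p))
}

// In a hypothetical mod.go, the declaration would be a single line.
var product = Library

func main() {
	fmt.Println(product) // library
}
```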
EDIT: Added comparison to suggestions above.
@huguesb
rsc states in his blog post that vgo is meant to be a streamlining/simplification of the general-purpose dep tool...
I'm curious: why does "hierarchical" imply "complex"?
Stepping back, I probably misphrased that last requirement. I was looking for an intersection of the familiar and the extensible, and doing so with YAML and JSON on my mind. "Hierarchical" isn't really the goal here, and I'm happy to scratch it off the list.
I'm surprised to see the reaction about it being complex, though, and I'm wondering if I'm missing something. When I look at example go.mod files like this one...
module "rsc.io/hello"
require (
"golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
"rsc.io/quote" v1.5.2
)
...personally, I see a āhierarchicalā data structure. By that, I mean a list of key-value pairs, where values can be primitives, lists, or other lists of key-value pairs. Changing nothing but formatting and punctuation, it becomes:
module: rsc.io/hello
require:
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2
…And it's even 4 characters shorter! (That's a joke, if it's not obvious. :grin:)
When I jumped in here, I was thinking about extending an existing git repo dependency analyzer written in Node.js to recognize vgo modules. (Well, that, and how I missed the pretty colors my editor makes highlighting files...) Then I realized how much I didn't want to create and maintain a custom parser, and how much easier it would be with a "standard" data format.
By all means, put this question on the back burner. There are waaay more important things to figure out with vgo, and I like what Iām seeing so far! :+1:
Even if YAML is considered to be too complex or confusing, I would still prefer it (or JSON, or TOML or whatever other standard declarative format) over bespoke format, and support the _subset_ of it that we are happy with.
In other words, if go.mod is valid YAML/TOML/JSON (not necessarily supporting all features of these formats), it would be immediately familiar to both users and any platform that you want to use for parsing.
@ecowden's example above makes it immediately clear to me which format I would prefer.
Another concern with go.mod is that it doesn't even look declarative or standardized; it looks like imperative code. Is there any reason for that? Do we actually want to make it extensible and support imperative constructs there, e.g. functions?
@nilebox go.mod doesn't look more "imperative" or "declarative" than an nginx configuration file, for example.
Maybe put the go.mod as a comment on go file. For example:
/*
+require "golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
+require "rsc.io/quote" v1.5.2
*/
package main // import "rsc.io/hello"
Or
package main // import "rsc.io/hello"
import (
"golang.org/x/text" // require v0.0.0-20180208041248-4e4a3210bb54
"rsc.io/quote" // require v1.5.2
)
Too bad my OS (Ubuntu) thinks the go.mod file is an audio file. This means I can't just double-click and edit the file; I have to go through the hassle of telling my OS that *.mod files should open in an editor.
You can, the file associations are fully user modifiable. However, using any well-established extension for the vgo module file is a rather unfortunate choice.
I think we should continue to use the very simple go.mod format, after the further simplification of making quotes optional (#24641). Once the dust settles, we should also publish a package like x/vgo/vendor/cmd/go/internal/modfile so that other tools can parse and edit mod files too.
As I wrote originally, I do understand the appeal of a standard file format, but I am still unable to find one that worked well for this task. My main concern is ease of editing, for both people and programs.
The files have to be easy for people to edit. For example, the hacked-up blog post system I built stores a JSON blob at the top of each file, above the post text, because it was very easy to implement that. But I am sick of needing to leave out the comma after the last key-value pair, because it makes adding a new key-value mean editing the previous one too. This is exactly why we allow trailing commas in Go literals. Those annoyances add up.
The files also have to be easy for programs to edit, without mangling them. Think about all the benefit we've gotten from gofmt and tools being able to collaborate with people to work on Go source files. People and programs working together on go.mod will be similarly beneficial. In fact this is a key part of the design. If you read through the Tour of Versioned Go you'll see repeated alternation between the developer editing go.mod and vgo itself editing go.mod. That has to run very smoothly.
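To illustrate what "easy for programs to edit" can mean for a line-oriented format, here is a minimal, hypothetical sketch (setRequire is not part of vgo) that bumps one requirement's version by splicing a single line, so comments and layout written by a person survive untouched. A real tool would more likely parse into a syntax tree and reprint; this only shows why one-directive-per-line formats make such edits cheap.

```go
package main

import (
	"fmt"
	"strings"
)

// setRequire changes the version on the require line for path and leaves
// every other byte (indentation, spacing, comments) untouched. It handles
// both `require path version` and entries inside a `require ( ... )` block,
// with or without quotes around the path, and reports whether path was found.
func setRequire(src, path, version string) (string, bool) {
	lines := strings.Split(src, "\n")
	inBlock := false
	for i, line := range lines {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		var entry []string // [path, version, ...] for a require entry
		switch {
		case inBlock && fields[0] == ")":
			inBlock = false
			continue
		case inBlock:
			entry = fields
		case fields[0] == "require" && len(fields) >= 2 && fields[1] == "(":
			inBlock = true
			continue
		case fields[0] == "require":
			entry = fields[1:]
		default:
			continue // module, replace, comments, ...
		}
		if len(entry) < 2 || strings.Trim(entry[0], `"`) != path {
			continue
		}
		// Splice the new version in place of the old one, right after the path.
		start := strings.Index(line, entry[0]) + len(entry[0])
		rest := line[start:]
		j := strings.Index(rest, entry[1])
		lines[i] = line[:start] + rest[:j] + version + rest[j+len(entry[1]):]
		return strings.Join(lines, "\n"), true
	}
	return src, false
}

func main() {
	src := `// My hello, world.

module "rsc.io/hello"

require (
	"golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
	"rsc.io/quote" v1.5.2 // pinned during the tour
)
`
	out, ok := setRequire(src, "rsc.io/quote", "v1.5.3")
	fmt.Print(out)
	fmt.Println(ok)
}
```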
All the "generalized key-value pair" formats become awkward when there's more than a single key-value pair to express. It's true that we could use a YAML-like notation:
module: rsc.io/hello
require:
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2
but that nice one-line-at-a-time breaks when we get to replace "rsc.io/quote" v1.5.2 => "../quote". Perhaps the best encoding would be:
replace:
- rsc.io/quote: v1.5.2
  with: ../quote
But then what does replace "rsc.io/quote" v1.5.2 => "github.com/you/quote" v0.0.0-myfork encode as? Maybe this?
replace:
- rsc.io/quote: v1.5.2
  with: github.com/you/quote
  at: v0.0.0-myfork
The awkwardness here is not much, but it's still quite annoying: three lines instead of one, with corresponding reduced readability and ability to use line-based tools like grep, sort, diff.
The fundamental problem is that not everything a developer needs to say is best expressed as key-value pairs. We don't use shells that require us to write:
cmd:
- prog: echo
- arg1: hello
- arg2: world
Yet somehow many developers accept this in config files. Why? Because, as Rob said, existing formats "are well understood and have publicly available parsers." At least, we think that's true. The more I look at these formats the less convinced I become. And even assuming it's true, that benefit has to outweigh the disadvantages imposed by the format itself.
JSON is too picky (for example, about commas) and has no support for comments. It's out.
XML is equally picky about closing tags and is too noisy in general. It's out.
TOML and YAML are at least easier for people to edit, but they both have the general key-value problem.
Additionally, TOML requires quotes around both module paths as keys (because they have slashes) and all values ("rsc.io/quote" = "v1.5.2"). Experience with go.mod suggests we want to move in the opposite direction, toward no quotes. (See #24641.)
Both TOML and YAML also turn out to be more complex than they first appear, a detail that's very important if you need not just a parser but a mechanical editor that can parse, edit, and reprint the file. TOML's complexity starts to show once you move away from key-value pairs: you have to learn the distinction between [x] and [[x]] and then start thinking about regular key-value pair lines versus inline tables. Of course, that's nothing compared to YAML. Here's an illuminating exercise: flip through http://yaml.org/spec/1.2/spec.pdf and try to find out what syntactic restrictions are placed on unquoted keys and values in key-value pairs. I'm still not completely sure. YAML embeds JSON as a subset but they didn't stop there. As far as I can tell from the document, instead of writing:
module: rsc.io/hello
require:
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2
it appears to be equally valid to write:
%YAML 1.2
---
!!map {
? !!str "module" : !!str "rsc.io/hello"
? !!str "require"
: !!seq [
!!map { ? !!str "golang.org/x/text" : !!str "v0.0.0-20180208041248-4e4a3210bb54" },
!!map { ? !!str "rsc.io/quote" : !!str "v1.5.2" },
],
}
and it also appears the two forms can be blended arbitrarily. Something as simple as
module: !!str rsc.io/hello
appears to be valid YAML yet mean something different from what our "subset" parser would understand. There would be constant pressure to give up the insistence on using a subset of YAML, and yet it becomes more difficult to write a good mechanical editor (parse+edit+reprint) the more complexity is introduced.
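One way to hold the line on a "subset" is to reject, rather than reinterpret, anything full YAML would read differently. A minimal sketch of such a strict reader for single "key: value" lines, assuming YAML 1.2's rule that a plain (unquoted) scalar may not begin with an indicator character; parseKeyValue and plainOK are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// indicators are the YAML 1.2 indicator characters that may never begin a
// plain scalar. "-", "?" and ":" are handled separately in plainOK because
// YAML allows them at the start when a non-space follows (as in "-1").
const indicators = ",[]{}#&*!|>'\"%@`"

// plainOK reports whether s is a plain scalar this subset accepts unchanged.
func plainOK(s string) bool {
	if s == "" || strings.ContainsRune(indicators, rune(s[0])) {
		return false
	}
	if strings.ContainsRune("-?:", rune(s[0])) && (len(s) == 1 || s[1] == ' ' || s[1] == '\t') {
		return false
	}
	// " #" starts a comment and ": " starts a nested mapping in full YAML.
	return !strings.Contains(s, " #") && !strings.Contains(s, ": ")
}

// parseKeyValue reads one "key: value" line of the strict subset. Tags such
// as "!!str", anchors, aliases, flow collections, quoting and trailing
// comments are all rejected instead of being guessed at.
func parseKeyValue(line string) (key, value string, err error) {
	k, v, ok := strings.Cut(line, ": ")
	if !ok {
		return "", "", fmt.Errorf("%q: expected \"key: value\"", line)
	}
	k, v = strings.TrimSpace(k), strings.TrimSpace(v)
	if !plainOK(k) || !plainOK(v) {
		return "", "", fmt.Errorf("%q: outside the supported YAML subset", line)
	}
	return k, v, nil
}

func main() {
	for _, line := range []string{"module: rsc.io/hello", "module: !!str rsc.io/hello"} {
		k, v, err := parseKeyValue(line)
		fmt.Printf("%q %q %v\n", k, v, err)
	}
}
```

Every rejection rule added this way is another place where the subset and real YAML can disagree, which is the pressure described above.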
If we had to pick some existing format, I'd pick TOML, but even that seems wrong:
module = "rsc.io/hello"
[require]
"golang.org/x/text" = "v0.0.0-20180208041248-4e4a3210bb54"
"rsc.io/quote" = "v1.5.2"
[[replace]]
"rsc.io/quote" = "v1.5.2"
with = "github.com/you/quote"
at = "v0.0.0-myfork"
The [[ ]] are necessary here because [require] is a single table (of key-value pairs each of which stands alone) while [[replace]] is an array of tables, in which each table is one replacement, with three keys: the path being replaced and the special keys "with" and "at". If you wanted to reserve any possible future expansion you'd have to use [[require]] too, making it:
[[require]]
"golang.org/x/text" = "v0.0.0-20180208041248-4e4a3210bb54"
[[require]]
"rsc.io/quote" = "v1.5.2"
All in all, it doesn't seem like these file formats are actually helping advance our goal of making the file easy for people and programs to edit. We'd probably have to write a custom parser+reprinter anyway, so the only real benefit would be syntax highlighting in editors. I think that benefit is easily outweighed by the awkwardness of shoehorning our semantics into these files in the first place. If your configuration is a few basic key-value pairs, they make a lot of sense. Ours is not just key-value pairs, so those files don't make sense.
P.S. I wondered for a long time why it was that "dep ensure -add" did not modify existing constraints in Gopkg.toml. The answer is that Dep can't reliably modify hand-written TOML, preserving comments and the like. Dep sometimes appends to Gopkg.toml but otherwise imposes the rule that Gopkg.toml is owned by people and Gopkg.lock is owned by programs. This seems to be an artifact of the available libraries as much as it is a design choice.
Based on (1) discussion with Rob, (2) no one replying to my last comment, and (3) the emoji counters on that comment, I'm going to close this issue and keep the bespoke syntax in go.mod (subject to further refinement like dropping quotes).