This is a Go2 Proposal
Struct tags are one of Go's great hacks, but they are being exploited more and more to embed DSLs into Go that cannot be checked by the compiler.
Struct tags create implicit constructor-style functionality. Go does not require constructors for structs but we could use interfaces to mandate this functionality in specific scenarios (serialization/deserialization, etc).
The alternative is end-runs around Go's simplicity like the validator library, which injects a private DSL that has no compatibility guarantees into the running state of Go programs.
Also since struct tags effectively create the equivalent of eval
in Go code, they also represent an attack surface.
Obviously we can't remove struct tags from Go 1.x, so I'll throw this into the Go 2 bin for consideration later.
Ah, and you retitled it at the same time as my label change.
It would help to suggest some alternatives to struct tags for the things they are currently used for, such as the uses in encoding/json.
The functionality exists already with interfaces if we choose to use them more broadly. Removing struct tags does not require other language changes to Go in order to meet the functionality.
Any type that can be serialized or deserialized (not just json, but also yaml, toml, csv, etc) could be required to satisfy an interface available in the standard library. For example:
type Serializable interface {
Construct...
}
Hopefully the intent is clear. The standard library has a general lack of useful interfaces and this is but one example.
When it becomes very common to inject runtime DSL functionality into a program through a string, it seems fairly obvious that this is alerting us to a deficiency in the language design. We need to figure out how to bring this back into the core language.
@bradclawsie I don't follow how using an interface is a replacement for the specific functionality that struct tags provide for JSON, XML, etc. (specifying field names, omitempty
, ...)
@cespare Everything about serialization and deserialization can be moved to real Go functions, including any decisions about fields to omit. Struct tags are just an out-of-language shortcut that permits ad-hoc DSLs to be inserted into Go code.
currently, struct tags are of great help when you want to quickly do things like (un)marshal stuff, among other things. while there are negatives as you listed, relying on interface implementations would introduce quite a bit of strain to developers. imho, a better solution would be to replace them with proper annotations that the compiler can actually check (and maybe such annotations can be expanded to more things than merely struct tags).
E.g.:
type foo struct {
Bar string `json:",omitempty"`
}
would become:
type foo struct {
Bar string @json.OmitEmpty
}
At the very least, the compiler can at least see whether an annotation exists or not, and it can come with better documentation as well.
@urandom I really like your suggestion. The usability of the current approach is retained, but because the annotations you propose would actually be Go syntax, the compiler can inspect them (they are no longer ad-hoc DSL vectors with an attack surface).
There is also the potential for developers to be able to implement user-defined annotation support.
There would have to be a broader proposal to integrate annotations in general in to the language.
Thanks!
@bradclawsie I feel like what you're suggesting is that reflection should be removed from the language, since you want to implement reflection stuff through interfaces. Am I correct?
@faiface I am not asking for reflection to be removed from Go. I am proposing removing struct tags (whose contents are opaque and "not Go") and moving their functionality into the language itself.
@bradclawsie Ok, I see that you liked the idea of "tags with syntax". I'm not against that too. Getting back to your original proposal, I still can't see how you'd possibly replace struct tags with interfaces.
@faiface Tags create implicit functionality when serializing or deserializing. There's no reason why you couldn't just require the user to encode these in Go. You would enforce that with an interface. Go has this half-done in certain cases with Marshal
and Unmarshal
but doesn't formalize the requirement with an interface.
That said, @urandom has a better suggestion...it provides a similar functionality while preserving the conciseness of tags.
@bradclawsie The thing about moving this to interfaces is that it would remove majority of the convenience of reflection. Reflection is so cool in Go, because users don't have to do almost anything to use functionality which uses reflection. All I have to do to encode a struct is to create it (and maybe type a few tags). With this, I'd have to implement a few methods, well, a lot more code, a lot less convenient.
@faiface I agree my suggestion is less convenient, but Go already is focused on explicitness over cleverness everywhere else in the language. A more convenient alternative is @urandom 's annotations suggestion.
@bradclawsie Well, we have reflection in Go. I think it's one of it's greatest strengths. Removing tags altogether would make reflection a lot weaker. However I agree, that making them more integrated to the language makes sense.
@urandom 's suggestion is interesting but is incomplete. We must not require the compiler to actually know what @json.OmitEmpty
means. That does not scale, and would mean that new packages would not be able to take advantage of this technique. So json.OmitEmpty
must be defined in some way in the encoding/json package, so that when that package is imported the annotation is permitted in the importing package. How does that work?
Also since struct tags effectively create the equivalent of eval in Go code, they also represent an attack surface.
Can we see a practical example of how a struct tag increases attack surface?
@ianlancetaylor ; @urandom may have a better idea, but presumably there would have to be a new mechanism for user defined attributes. Otherwise it could be done with interfaces, but that carries with it the cost of boilerplate.
@as struct tag functionality is invoked automatically (on serialization/deserializing) and is hidden behind an opaque interface (the DSL of the tag text, which can be anything). The Go compiler will make no attempt to protect the developer.
To begin to address the potential for mayhem, you would need to specify the language of tag strings at a minimum.
Yes, but is there any basis to expect vulnerabilities in struct tags? They are not input fields from clients of the process, only the developer has control over their contents. Am I missing something?
We should expect and defend against malicious code constantly. The expectation is no different than that for regular Go libraries, but the ability to defend is absent: the format for struct tags is ad hoc
We should consider formalizing the definition of security before changing the language to achieve it. The proposal does not mention tangible security threats. Security is frequently a trade off for efficiency, simplicity, and ease-of-use. I am simply asking how an adversary can exploit struct tags to achieve something considered a security breach by some common criteria.
We're getting off track. I mentioned attack surface as an implication of tags, but the primary thrust of this proposal is to change the Go language so that the functionality embedded inside tags can be made part of the language itself. I would prefer we focus on that.
It is not my intention to hijack the proposal, however requirements​ preceeed design. What are the other advantages of removing tags if not security?
The main advantage is to make Go expressive enough to meet the needs of users within the specified language. Tags now provide a "back door" to expressing desired functionality outside of the specified language. In short, Go should not need to embed ad-hoc DSLs.
You seem to have a strong hate for DSLs, but that sounds more like a religious battle, I don't see compelling problems for programmers in your proposal.
An actual problem with struct tags is that they're parsed at runtime, so typos/bugs can go undetected or are harder to diagnose. I think a possible proposal for Go 2 would be to have a way for struct tags to be parsed at compile time. For instance:
pkg:key=value,key2=value2,key3,key4
(I personally don't feel like a new syntax like @pkg...
is necessary, but I wouldn't oppose it).json
to define the list of possible supported keys. I'm not sure what the syntax would be, for instance const OmitEmtpy structtag = iota // Documentation here
map[string][structtag]string
where the outer map is indexed by the package name, and the inner map by the tag name. This map can be made available for reflection (just like the string is in Go 1).pkg
doesn't map to a real package imported in the file scope.I don't have a particular problem with DSLs, but in the case of struct tags, they are unspecified. If Go really does need a macro/embedded DSL feature, it should be specified.
I don't have a particular problem with DSLs, but in the case of struct tags, they are unspecified. If Go really does need a macro/embedded DSL feature, it should be specified.
Meaning no one can ever use them for anything out of those "specs". Thanks, but no thanks.
@cznic if you don't think programming languages should have specs, why are you using Go at all? And yes, struct tags effectively hide programming languages
What make's you think a struct tag is a programming language?
The spec says struct tags are just strings, so they can contain ad-hoc instructions to be interpreted at runtime. There are no limits or size constraints. I could inject JS, Lua, or a language I make up myself.
Its one thing to provide this capability in a language...its another thing entirely to offer this as the only expedient way to "get things done" for serialization/deserialization. Go2 should try to fix this.
There are no limits or size constraints. I could inject JS, Lua, or a language I make up myself.
What does injecting mean? If it refers to typing a long string in a text editor, anyone can do that with their program because they have complete control over its execution as a developer. If you are referring to changing a tag's content at runtime, I don't think this is the appropriate use case for tags.
The things people are encoding in struct tag text are essential functions for the development of everyday Go code, so this functionality should be written in Go, not encoded in some private language.
Let me turn this around...why should something as common and simple as omitting empty JSON tags from serialization have to be done outside of Go as it is specified?
I understand (and somewhat agree with) the underlying motivation behind this proposal. But it isn't enough to just propose to get rid of struct tags without also proposing a way to replace their use cases.
The language doesn't know anything about JSON. So how do I tell the JSON library that field x of my struct shouldn't be encoded? How would you encode that in an interface, as you suggest?
Make the type of x "int32ButNotJSONEncoded", where that type has a special method? That seems pretty unscaleable. And any uses (sets & gets) of that field outside the JSON library need a bunch of casts that they otherwise don't care about.
Or do I write a custom JSON encoding method for every struct that I want to JSON enocde, for any struct for which any field needs special treatment?
I just don't see a better alternative that solves the problems that struct tags solve. If this proposal is to move forward, you'll need to propose something concrete.
@randall77 I agree that the proposal is incomplete and I hope to amend it with more constructive suggestions soon. I wanted to test the waters first to see what kind of reception I would get.
Just as an example of how it could be solved, look at C# attributes (I know Java has something similar but I'm not that familiar with it). Attributes allow you to attach annotations to module, class, method, property or variable. After that you can query them at runtime and act accordingly. For example, the following class using XML serializer
public class Car
{
[XmlElement("StockNumber")]
public string StockNumber { get; set; }
[XmlElement("Make")]
public string Make { get; set; }
[XmlElement("Model")]
public string Model { get; set; }
}
will be serialized into this
<Car>
<StockNumber>1020</StockNumber>
<Make>Nissan</Make>
<Model>Sentra</Model>
</Car>
You don't actually have to specify those attributes in C# because serializer will do the same by default. But you can if you want to tweak its behavior. Attributes are used everywhere - serialization, ORM, logging, compiler tweaking etc.
Attributes themselves are regular classes inherited from System.Attribute class. You can define any attribute you want. For example
public class UserAttribute : Attribute
{
public UserAttribute(string requiredParam)
{
RequiredParam = requiredParam
}
public string RequiredParam { get; }
public bool OptionalParam { get; set; }
}
will translate into the following attribute syntax
[UserAttribute("required param value", OptionalParam = true)]
The advantages are clear:
The downside is you have to import modules that define the attribute you want to use.
That's just an example of how the problem is solved in other language. I understand that C# is very different language and don't propose to copy the same feature to Go. Personally, I don't see any advantage to Go tags compared to C# attributes or something along those lines. The fact that you have to import modules was never a problem for me. I didn't even think about that until this very moment. But maybe I'm missing something which would make that a no go.
Attributes are an entirely new level of complexity and should not be compared to struct tags in Go. Struct tags are a convenience measure for marshaling data into wire protocol and file formats. Each field is allowed multiple tags for this purpose, which is why tags are not permitted in the struct declaration or any other data type other than a field. They are not attributes and should not be abused as such. If you are considering attributes as a solution to a problem in Go, then you are going about the problem the wrong way.
@garethwarry, Go tags are already used for everything I mentioned apart from compiler tweaking - database schema, serialization, logging. It's not a convenience but an essential feature for many libraries. You can also specify multiple attributes for each entity.
And I'm not proposing anything. This is just an example of solving the same problem in different language. Yes, the problem is the same and saying Go tags are something different downplays it's value for Golang. Tags are, essentially, attributes without any specification and verification. And that's the problem the proposal aims to solve. At least, that's how I see it and very much like to see it solved in Go 2.0, whatever it might be.
@garethwarry There is nothing in the Go spec that limits tags to the common uses you describe...that just happens to be where most people see them.
@creker, It doesn't need verification... a tag is simply a key-value pair of strings. It's not some class with a bunch of enumerations and overloaded methods for validating values. If you need such functionality, you can easily program that into your application in a way that doesn't mask complexity. I don't think it's Go's responsibility to spec something like that, either. It's up to the programmer.
@bradclawsie, You miss the point I was making about complexity.
@garethwarry the spec says tags are strings. There is nothing in the spec that says tags contain key-value pairs:
https//golang.org/ref/spec#Tag
@bradclawsie ...
"By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax."
@garethwarry
a tag is simply a key-value pair of strings
Yes and that's exactly the problem. Any typo or simply wrong tag will be detected only at runtime when your code doesn't work. And look again at C# attributes. They resemble the same key-value structure but compiler can actually catch errors. Values can be of any type and that provides safety. Complexity here actually solves a problem.
overloaded methods for validating values
Where did you get that? C# probably allows that (never done or saw that myself) but, again, that was just an example.
By convention
Tags are just happens to be key-value pairs. The spec doesn't limit them to it. Even that is not specified. Not to mention everything else. The spec actually permits any string in tags any string is permitted as a tag
@creker, _Do your own error validation._ Adding language complexity and reducing flexibility just for the sake of daft programmers needing validation done for them implicitly is a horrible idea. We don't need cookie-cutter solutions from Microsoft for C# programmers who don't know how to error check without try-catches and type validation everywhere. We need a language that allows real programmers to be expressive and get work done. Key-value string pairs is exactly what the languages needs for convenience. Again, if your application requires error validation for tags, then implement it. Otherwise, adding such an unnecessary feature just breaks compatibility and adds complexity. This is a "NO" vote from me.
Yes and that's exactly the problem. Any typo or simply wrong tag will be detected only at runtime when your code doesn't work.
If that's the only problem, then we could more easily address it by extending​ cmd/vet or some other linter to warn when tags don't follow convention.
@mdempsky, golint already does this. It recommends exactly what I quoted above. Any literal string that doesn't follow the key-value pair format is rejected.
Golint is a style recommender for Go1 that has nothing to do with a discussion on changing the Go specification for Go2. You keep bringing up "convention"...the point of this proposal is to change the spec, not merely a recommendation. Go's simplicity starts with the basic fact that you can write a Go compiler by reading the spec.
@bradclawsie, practice is just as important as syntax. When you fail to consider how good practice will come from syntax, then you fail to design a functional language. Look at Duff's device, for example. It compiles, but it's probably the worst use of language I've ever seen from C. Golint recommends good practice from Go, and should not be blown off as off-topic (as you've blown off multiple very good points being made). It recommends how tags are used - which is key-value just as the reflect package also documents.
I use golint
and think it has some value, but it has nothing to do with this discussion. This is a Go2 proposal and has nothing to do with third-party tools for Go1.
@bradclawsie,Your bloody topic is about struct tags, and you're saying we shouldn't look at best practices and golang documented uses for struct tags and should instead look at C# as a comparison? Are you high?
My proposal has nothing to do with Go1 best practices. It is about changing the Go2 spec. Please keep the discussion civil.
@mdempsky
If that's the only problem, then we could more easily address it by extending​ cmd/vet or some other linter to warn when tags don't follow convention.
And how does it solve the problem when there's a typo in key or value? How does it solve plain errors in syntax that library expects? No tool can check that. A good way to start with solving that would be to make key-value pairs actually be the spec and not some convention in a library and allow values to be of any type, not just strings. That way at least some verification could be done at compile time and would make tags easier to work with as they wouldn't require parsing at runtime.
@garethwarry, calm down. The proposed tools has nothing to do with the topic and don't solve the problem it highlights.
@garethwarry
Do your own error validation. Adding language complexity and reducing flexibility just for the sake of daft programmers needing validation done for them implicitly is a horrible idea. We don't need cookie-cutter solutions from Microsoft for C# programmers who don't know how to error check without try-catches and type validation everywhere. We need a language that allows real programmers to be expressive and get work done. Key-value string pairs is exactly what the languages needs for convenience. Again, if your application requires error validation for tags, then implement it. Otherwise, adding such an unnecessary feature just breaks compatibility and adds complexity. This is a "NO" vote from me.
Again, calm down. You're not helping with that kind of unconstructive rant. You could have just downvoted instead of this. You're still missing the point and now just insulting other programmers. If compile-time validation is a cookie-cutter solution, then we should just ditch Go altogether and return to interpreted languages where everything is checked at runtime. There's your expressiveness. But no, for some reason we are talking about very strict statically typed language. And proposal is about extending that strictness to struct tags.
The C# annotation example is hideous and a good example of the cruft Go is trying to avoid. Take note of the plan9-like dialstrings in net instead of the monotonous protocol functions in other languages: sometimes safety isn't as important as simplicity. There doesn't seem to be a single practical example of how struct tags can cause faults in a program. How many bugs are encountered as a result of mistyped tag strings in stdlib or other GitHub packages? Without these answers I would deem this a dubious feature to add to the language unless a significant number of users are burned by the unchecked nature of struct tags.
@as
The C# example annotations look the same as the current struct tags, expect that they can be applied to a lot of things, and the compiler can prove that they exist. You might not like that they are surrounded by '[]', but that's just the syntax they've chosen.
@urandom, I don't get where that came from.
And I completely agree with @as.
@as My original point in this proposal wasn't to address a rash of errors people are complaining about, but to have the Go language as it is commonly used to be fully specified. I understand people may have a good experience with tags as they are commonly employed, but I have yet to hear a compelling argument as to why they should be unspecified. I believe that most points brought up here concede the point that the functionality is useful...the issue is that the compiler doesn't care what is in these strings.
Note also that this is a Go2 proposal so I don't think moving people out of their comfort zone should be a concern...if Go2 ever becomes a reality (which I doubt), you should expect many more jarring changes than this.
In any case I believe the next step is concrete proposals as others have pointed out. I will work on that but I'm not in any rush...Go2 is not coming any time soon.
To summarize a few things that I've read:
As a conclusion: Struct tags can be abused, but mostly they are just convenient. ( as every language feature could be abused one way or another). Struct tags may not be pretty, but it seems to be not an easy task to make them prettier without losing their capabilities.
Whenever you parse dynamic content, you won't have compiler guarantees, no matter how explicit your code is. So does it worth it?
@mier85 Agree with most of your points but the value of the language spec is not to catch and reduce typos. It is to assist tool writers. Even if people tend not to mis-spell json strings, tags are still opaque to the compiler. Indeed by the current spec, a compiler could simply ignore tags entirely and still satisfy the Go1 compatibility guarantee, which by extension means that much of the behavior of serialization/deserialization is also not specified...
I know that was hyperbole, but I don't think the compiler could ignore tags entirely: that would break programs that use reflect.StructField.Tag
, which is explicitly referenced in the language spec.
@ianlancetaylor Right, I think what this is boiling down to is the adequacy of "By convention" in https://golang.org/pkg/reflect/#StructTag
So getting back to my original claim, a part of Go this important should be specified...and if not specified, removed and have the functionality provided by features in Go that are specified.
Frankly, requiring every type that you want to serialize into JSON to define an interface to do the serialization is a non-starter. That would require a ridiculous amount of duplicate code compared to how it works today. Go is, above all else, a pragmatic language. We are not going to remove struct tags unless we have something else that works better. I have not yet seen that better thing in this issue.
Agree, the answer is not to remove struct tags but to specify them.
I'd like to be able to strip/sanitize tags with a method rather than removing tags from the language. Simply because it's problematic to read from an api that requires tags for type conversion, and those tags are then carried over when writing the data to another api where the types are real.
@swizzley I don't understand what that would look like. Struct tags are a feature of a type defined at compile time, and there isn't any way to dynamically change a type.
This proposal is incomplete. Struct tags are useful and meaningful and pervasive in existing code. While their ad hoc nature is perhaps unfortunate, we can't get rid of them without some clear idea of what we are going to replace them with. Closing as incomplete.
Most helpful comment
currently, struct tags are of great help when you want to quickly do things like (un)marshal stuff, among other things. while there are negatives as you listed, relying on interface implementations would introduce quite a bit of strain to developers. imho, a better solution would be to replace them with proper annotations that the compiler can actually check (and maybe such annotations can be expanded to more things than merely struct tags).
E.g.:
would become:
At the very least, the compiler can at least see whether an annotation exists or not, and it can come with better documentation as well.