Go: proposal: embed: remove support for embedding directives on local variables

Created on 16 Dec 2020  ·  7Comments  ·  Source: golang/go

@mdempsky points out in https://github.com/golang/go/issues/41191#issuecomment-742478978 that the late addition of string and []byte to the embedding proposal has some unfortunate side effects that we might want to make sure we are happy with.

There are three details of string and []byte embedding that are at least surprising. This issue is one of them. See #43217 for the other two.

The original design had only embedding in globals and only embed.FS. When I circulated the draft privately to a few people who had written embedding tooling, one person asked about local variables, and it seemed easy to add, so I did. (This was fine because the embed.FS was immutable, so it was really just a question of variable scope, not semantics.)

When we had the initial public discussions before making a formal proposal, many people asked for string and []byte. Those too seemed easy to add, so I did. But the two different additions interact poorly, because they created the intersection "function-local, mutable embedded data".

It seems like there are three options for how that would work:

(a) the []byte data is shared by all invocations of the function.
(b) the []byte data is freshly allocated (copied) on each call to the function.
(c) the []byte data is magically unwritable, causing a panic if written

Right now the behavior is (a), but I did not do that intentionally. It is difficult to explain to users, because it differs from the way every other local variable behaves. It seems like a clear bug.

The clearest alternative is (b): when the declaration statement in a function is executed, it is initialized with a fresh copy of the data instead of the actual data. That is less surprising than (a) but potentially very expensive. That's at least not a bug like (a) but certainly a performance surprise that would be good to avoid.

I listed (c) for completeness (mmap the data read-only) but it's not really on the table: we've considered the idea of that kind of data in the past in other, more compelling contexts and decided against it. This case is not nearly special enough to warrant breaking the data model by introducing "unwritable slices" into the core of the language.

That leads me to “(d) none of the above,” which I think is the right answer, not just for Go 1.16 but generally.

The string and []byte functionality seems important ergonomically. There are plenty of times when you just want to embed a single file as data, and having to go through the embed.FS machinery to get just a single string or []byte is a lot more work.

On the other hand, the function-local variable functionality seems much less important ergonomically. It's always easy to move the two lines (//go:embed and var declaration) up above the function.

So if these two can't coexist, the choice seems clear: string and []byte support is doing real work, while function-local variables are not. The underlying data is essentially a global anyway. Writing all embedded data as globals makes it very clear for []byte variables that there's only one instance of the data (and that there's no implicit copying either).

So that's what I suggest: remove support for embedding in local variables. It's easy to put the embedded variables just above the function that needs them, and then it's very clear that they are globals, there are no aliasing surprises, and so on.

What do people think? (Thumbs up / thumbs down is fine.)

Thank you!

Proposal

Most helpful comment

The other option would be to remove support for []byte, leaving only support for string.

That makes clear that it's an immutable value. And it's very easy to create the []byte if needed (just one line).

And it preserves the ability to use function-local embeds, which I think is a great property to preserve, given that these values are all immutable for the lifetime of the process.

All 7 comments

I'm torn.

I made a demo repo to play around with embedding. Here's a function to toggle between using an os.DirFS and an embed.FS at runtime:

func getFileSystem(useOS bool) http.FileSystem {
    var fsys fs.FS
    if useOS {
        log.Print("using live mode")
        fsys = os.DirFS("static")
    } else {
        log.Print("using embed mode")
        var (
            //go:embed static
            files embed.FS
            err   error
        )
        fsys, err = fs.Sub(files, "static")
        if err != nil {
            panic(err)
        }
    }
    return http.FS(fsys)
}

Having to move files out of the function would be less clear, and it would only be done because []byte requires it—string (embed.String?) and embed.FS are fine as is. So part of me thinks the restriction should only be on embed.Bytes. But it's odd to have a restriction on only one of the three types. To make it uniform you either need to have the bytes autocopy (weird) or apply the restriction to top level on all three. Then I think, well, embedding is a compile time concern, so it really makes the most sense at package level, even though it's inconvenient to not have it in a function.

In the end, I could be convinced otherwise, but a restriction is easier to loosen by go.mod directive in Go 1.16+ than it would be to add a restriction later, so I gave the 👍 .

One more thought. It's nice to be able to include version.txt in a file. But version.txt might have a trailing newline. So my version.go example in the demo repo is:

var (
    Version string = strings.TrimSpace(version)
    //go:embed version.txt
    version string
)

What would be really nice would be if you could write:

//go:embed version.txt
var Version string = strings.TrimSpace(_)

But it's not clear how something like that could actually work syntactically. An alternative that would work is:

var Version = func() string {
    //go:embed version.txt
    var version embed.String
    return strings.TrimSpace(version)
}()

Again, it would be a shame to lose the convenience of the function version just because []byte doesn't work, but OTOH, this version is the same number of lines and actually a bit less clear than the first version.

@carlmjohnson, thanks for the examples. The rewrites would of course be:

+ //go:embed static
+ var staticFiles embed.FS
+ 
  func getFileSystem(useOS bool) http.FileSystem {
    var fsys fs.FS
    if useOS {
        log.Print("using live mode")
        fsys = os.DirFS("static")
    } else {
        log.Print("using embed mode")
-       var (
-           //go:embed static
-           files embed.FS
-           err   error
-       )
-       fsys, err = fs.Sub(files, "static")
+       var err error
+       fsys, err = fs.Sub(staticFiles, "static")
        if err != nil {
            panic(err)
        }
    }
    return http.FS(fsys)
  }

- var Version = func() string {
-     //go:embed version.txt
-     var version embed.String
-     return strings.TrimSpace(version)
- }()
- 
+ //go:embed version.txt
+ var versionTxt string
+ var Version = strings.TrimSpace(versionTxt)

In both cases, the code does not seem noticeably worse to me, and in fact I think I like being able to see more clearly where the embedding uses are. "Clear is better than clever" and all that.

The other option would be to remove support for []byte, leaving only support for string.

That makes clear that it's an immutable value. And it's very easy to create the []byte if needed (just one line).

And it preserves the ability to use function-local embeds, which I think is a great property to preserve, given that these values are all immutable for the lifetime of the process.

At the risk of cluttering this issue with a lot of waffling, I am tentatively moving my :+1: to @ugorji's proposal to remove embed.Bytes.

Here's another bit of demo code:

var S = func() (s struct {
    Number   float64
    Weather  string
    Alphabet []string
}) {
    //go:embed value.gob
    var b []byte
    dec := gob.NewDecoder(bytes.NewReader(b))
    if err := dec.Decode(&S); err != nil {
        panic(err)
    }
    return
}()

The demo was supposed to illustrate that S is some struct that is difficult to construct, so you generate it somehow and write it out to a gob to read in on initialization.

When I wrote my demo code, I wondered if I should use []byte or string to construct my io.Reader. I decided on []byte for the sort of lame reason that the gob is binary data and while Go allows storing binary data in a string, it isn't allowed in other languages, so I may as well use a []byte. Taking a step back, the fact that embedding allowed either string or []byte forced me to make a basically pointless decision. There really aren't any good cases for using the mutability of embed.Bytes, so it basically just exists to save a few conversions, which the compiler will probably optimize anyway. That plus the fact that string can be in the RO data makes me think there's basically no reason to ever use embed.Bytes.

Here is what I should have done:

var S = func() (s struct {
    Number   float64
    Weather  string
    Alphabet []string
}) {
    //go:embed value.gob
    var es embed.String
    dec := gob.NewDecoder(strings.NewReader(es))
    if err := dec.Decode(&s); err != nil {
        panic(err)
    }
    return
}()

And here's what it looks like without function locals (IMO, this is less clear):

//go:embed value.gob
var b embed.Bytes

var S = func() (s struct {
    Number   float64
    Weather  string
    Alphabet []string
}) {
    dec := gob.NewDecoder(bytes.NewReader(b))
    if err := dec.Decode(&s); err != nil {
        panic(err)
    }
    return
}()

here really aren't any good cases for using the mutability of embed.Bytes, so it basically just exists to save a few conversions, which the compiler will probably optimize anyway.

I don't think the compiler can optimize them away, in the most common cases - like writing them to an io.Writer (like http.ResponseWriter). For that, the compiler would have to know the concrete type, to prove that the []byte is not retained in the Write call.

So, I absolutely think that using []byte is advantageous in most situations (the only use-case I have which would benefit from string is template.Parse). I also agree that it shouldn't be usable in a local var (for the reasons mentioned). And I feel that making anything but embed.Bytes available in local vars seems a bit inconsistent. But I'd be fine doing it that way.

That plus the fact that string can be in the RO data

FTR, I don't think there really is any advantage to this, in and off itself. It would only be an advantage for []byte, to prevent accidental mutation. But for string, it makes no real difference whether the memory is RO or not.

This seems like a tricky case. It's nice to be able to embed local variables to avoid polluting the package's namespace, but local []byte is confusing/error-prone. You can kind of solve this by disallowing embedded []byte, so you get local variables without the mutability issue, but then you lose the clarity of saying "this data is a slice of bytes explicitly". That is, it would be somewhat confusing to readers if I embedded an executable binary as a string, since its not meant to be read by humans.

So it seems like what we're doing is deciding if:

  1. avoiding package namespace pollution
    or
  2. clarifying that embedded data is explicitly a []byte and not a string

is more compelling. Personally my main use cases for //go:embed are embedding executables in my go binary (weird I know) and embedding templates. I think embedding templates is going to be far and away more common than embedding []byte, but I think []byte is going to be fairly common itself, and I don't care too much about being able to embed local variables, so my preference is that we disallow local variables, as in the original comment.

Was this page helpful?
0 / 5 - 0 ratings