Go programs currently rely on the system to provide timezone information in the tzdata database, or rely on GOROOT
being accessible at run time in order to read $GOROOT/lib/timezoneinfo.zip
. This is not always true, particularly on Windows (#21881) but also in some other environments (#38013).
Therefore, it is desirable for Go program to have some mechanism to directly embed tzdata information into the program itself, so that they don't have to rely on system timezone information that may be absent.
I make the following assumptions, which I think are reasonable:
Given those assumptions, I propose adding a new package timezone/tzdata
. Importing this package (as import _ "time/tzdata"
) will cause timezone information to be embedded in the program. Also, building with the build tag timetzdata
will force the package to be imported, and therefore will cause timezone information to be embedded in the program. The embedded timezone information will be used if and when no system information is available for the timezone.
There is a sample implementation of this approach at https://golang.org/cl/224588.
Change https://golang.org/cl/224588 mentions this issue: time/tzdata: new package
I think this idea of supporting it as a compiler flag and as a package is great.
However, I'm wondering about whether the operating system tzdata should always be preferred if present as it may make an application unpredictable in cases where only using embedded data is highly preferred.
Would it make sense to make the time/tzdata work like certificate pools and root CAs work? Any app can embed its own set of root CAs by creating a cert pool and adding that cert pool to the http package's default transport. The Go stdlib could still have the tzdata package, and still hook up an embedded tzdata, but I think the way it gets hooked up in the code-imports-a-package version could be less magical than an anonymous import. We could require code explicitly setting something like time.DefaultLoadTzinfo = tzdata.LoadTzinfo
variable which would allow a developer to control exactly when or how the tzdata gets used and it communicates what's happening to a reader much clearer.
It would support both cases:
DefaultLoadTzinfo
with a function that tries calling the default DefaultLoadTzinfo
and if it it comes back with nothing, then calls tzdata's)DefaultLoadTzinfo
with tzdata's function)_Some of this comment was originally posted at https://github.com/golang/go/issues/21881#issuecomment-602387702._
Given those assumptions, I propose adding a new package
timezone/tzdata
. Importing this package (asimport _ "time/tzdata"
) will cause timezone information to be embedded in the program. Also, building with the build tagtimetzdata
will force the package to be imported, and therefore will cause timezone information to be embedded in the program.
I would support the latter (the build tag), but I'm not sure about offering the former (the _
imported package). I can foresee some time helper package that wants to "make sure" there's timezone information including this, and it ending up in people's dependency trees with no way to exclude it other than to pester the library author or fork.
The _
package is common to load dependencies like database drivers, but I think the binary size impact of those packages is lower than the 800K blob in comparison.
I really like the proposal. A small caveat about that last assumption,
the program should always prefer data from the system if available, as it is likely to be more up to date than that included in the program
Like @leighmcculloch 's comment, I would prefer to give priority to the embedded data always if it is available, so that I don't have deal with platform specific heisenbugs due to data differences or if the user has messed up their system timezone files.
But either way, this is a 1000% improvement on having to hack the Go runtime after every release.
I wonder if we should wait for more progress on https://github.com/golang/go/issues/35950, given that one could think of tzdata as an opt-in asset file. Either way, I agree that a solution to this issue would be very useful.
The kinds of problems that people describe with using system timezone data first are problems that people would see today, where we only use system timezone data. Have people actually seen, in practice, problems across different systems other than missing timezone information?
My concern with using embedded timezone data first is that it is guaranteed to be out of date within six months. While most programs are rebuilt within that time frame, there are certainly programs that are built and deployed once. Those programs may embed timezone information because they run in restricted environments, but they should still use updated timezone information as it becomes available. I have seen, in practice, problems with out of date timezone information, though it was with out of date timezone information on the system rather than in the program.
I don't particularly want to give programs a choice here. That leads to a more complicated API with benefits for few.
@zikaeroh I think we can address that issue with documentation.
It is difficult to make predictions. Especially about the future. ~ Danish proverb
I've reconsidered. I think it is fine as proposed.
My concern with using embedded timezone data first is that it is guaranteed to be out of date within six months.
That's a great point. 馃憦 I think this approach is good then for solving this problem.
Reflecting on when I would use this feature, I struggle to see myself using it. In most cases I make sure tzdata is installed and my operating system stays up-to-date as a part of general system health in the same way that root CAs need to be present. The very rare case I needed consistent tzdata led me to 4d63.com/tz but I no longer have that problem. The issue referenced in the PR description of how we get tzdata in Windows will be addressed some other way (#21881). I realize relying on the operating system is not the workflow everyone uses given #38013 and so for those use cases this seems good.
@ianlancetaylor
Over the years, I never needed time zone information in my program. I believe majority of Go users are in the same situation as me.
If this proposal is accepted, I am concerned that all my programs will become 800K larger. Same for many other users. I think some package developers will just add import _ "time/tzdata"
to one of their source files, and all executables that use that package will silently become 800K larger.
Go executables are large as they are - #6853 - and a lot of developers spend their time making executables smaller. Every 1 K counts. And here we are just wasting 800K of their efforts on something that users won't even use.
- adding the timezone information into a program should be optional, as it increases the size of the program by approximately 800K, and most programs do not need timezone information other than for the local system
How do you propose I will control if 800K is added to my executable or not?
The only reasonable solution I see here is to restrict import _ "time/tzdata"
to main
package only.
Thank you.
Alex
PS: Some runt. If this issue is to please users complaining about #21881, then someone should just spent some time and implement https://github.com/golang/go/issues/21881#issuecomment-554520916 or https://github.com/golang/go/issues/21881#issuecomment-342347609
The only reasonable solution I see here is to restrict
import _ "time/tzdata"
tomain
package only.
This could be an argument to make it a build tag only and leave out the anonymous import.
someone should just spent some time and implement https://github.com/golang/go/issues/21881#issuecomment-554520916
Maybe we could wait for https://github.com/golang/go/issues/21881#issuecomment-554520916 to be implemented, and then evaluate if the remaining use cases remaining warrant it?
How about support very light timezone data that doesn't contain the whole history of changes?
Something like: https://github.com/embeddedgo/x/tree/master/time/tz
@embeddedgo, if you don't know the rules for different years, you can't print a time from that year. Like if you didn't have the US history you wouldn't know to apply different rules for 1995 vs 2015.
Overall it sounds like people are in favor of this, and it would be a small change. Sure would be nice to have embedded file support. :-)
Adding to proposal minutes but this seems headed for likely accept.
Is tzdata the right name? I see that timezone.xml in Windows is only 400Kb (and it's a XML file, so it's got overhead). I think that we may want to switch the bundled database format in future, and "tzdata" is a specific database format. Maybe something like "time/zonedb" is both format-agnostic and also more clear to people not fully into timezone issues?
Adding to proposal minutes but this seems headed for likely accept.
@rsc what about my comment https://github.com/golang/go/issues/38017#issuecomment-603042727 ?
Thank you.
Alex
If one of my transitive dependencies imports time/tzdata
does it also mean that now my binary will become 800K larger?
If so, that is too much weight to be pulled in especially when done implicitly without my knowing.
I am especially interested in this topic because additional 800 KB can destroy my efforts to use Go in the niche of embedded systems.
The mainline 32-bit MCUs have at most 1 MB of Flash. Of this, I now have 400 KB for user code,
quite enough for many applications.
@alexbrainman @komuw @embeddedgo As I said above (https://github.com/golang/go/issues/38017#issuecomment-602767175), I think the issue of importing this package outside of the main package can be addressed with documentation. There is sample documentation in https://golang.org/cl/224588.
@rasky I don't have an opinion on whether the package should be called tzdata or zonedb. Perhaps people who have a preference could use emoji voting on @rasky's comment above (https://github.com/golang/go/issues/38017#issuecomment-604297573), thumbs up for zonedb, thumbs down for tzdata.
Is tzdata the right name? I see that timezone.xml in Windows is only 400Kb (and it's a XML file, so it's got overhead). I think that we may want to switch the bundled database format in future, and "tzdata" is a specific database format.
I think the name tzdata
is appropriate. Right now the package bundles that format, and it seems most clear for the package to indicate with its name what data it contains.
I did a Google search for tzdata and turned up the time zone database.
I searched for zonedb and turned up an unrelated DNS database.
time/tzdata seems like the right answer.
Except for the name, people seem to be in agreement about adding this
(again, with docs that make clear that it should only be imported in a package main).
This seems like a likely accept.
No change in consensus, so accepted.
Change https://golang.org/cl/228100 mentions this issue: time/tzdata: for test, permit extra elements in zone
Change https://golang.org/cl/228101 mentions this issue: time/tzdata: new package
Awesome! Would importing this package also solve https://github.com/golang/go/issues/24844?
@xtonyjiang No, it won't. The CL prefers the information on disk if it is available, as that information is typically more correct for the current system than the embedded information. The embedded information is a fallback to use when nothing else is available.
Hey Ian!
I have something to say on this topic that may be interesting to share!
- adding the timezone information into a program should be optional, as it increases the size of the program by approximately 800K, and most programs do not need timezone information other than for the local system
The reason why the time zone information on UNIX is so big is because zic flattens the information that is provided in the official tzdata files. The tzdata files define the time zones in a layered approach, but each file under /usr/share/zoneinfo is made to be independent, causing the layering to be lost. There is no reuse. This is intentional, because programs generally only load a single time zone file from disk.
At some point in the past I wanted to achieve the same thing (self-contained binaries that included all the time zones without blowing up the size), and discovered that it's possible to come up with solutions that are a lot more efficient. It is possible to come up with algorithms that don't require the data to be flattened, while still having very decent time complexity. I think that in the end I managed to store both the entire dataset and the algorithm in ~110 KB. Here's what I did:
zdump -v
to generate long lists of unit tests. The time zones used in these tests are the ones that I discovered were problematic in some sense (e.g., countries that moved across the international date line, British Double Summer Time).Unfortunately, hardly anyone uses this code. Maybe it was a waste of time for me to write it in the first place? Or maybe not, because now it may serve as an inspiration for someone to do something similar for Go?
@EdSchouten Thanks, nice idea.
Just started to read through the current version of the go 1.15 release notes and this one seems strange to me. Since tzdata becomes stale, why not have this data as a go module that can update independently of the standard library?
To me a golang.org/x/tzdata
go module that updates whenever the upstream tzdb changes feels a lot more fitting solution to the problem.
I guess it's way too late to change any of this now but I would like to understand the reasoning for doing it this way, the proposal does not go into this aspect in any detail so that is why I am asking.
@thomasf First, the point of time/tzdata is to modify the behavior of the standard library time package, so it requires a tight coupling to the standard library. It's hard to do that reliably and maintainably in an external module.
Second, one of the assumptions mentioned in the original proposal statement is: "it should be possible to embed timezone information when building an arbitrary program." The intent here is that a user of a Go program, not the original author, can take an existing program and deploy it to an environment without a timezone database and still have it work correctly. That is done by building the program with -tags timetzdata
. Any approach requiring an external module would inevitably be more complex and harder to use.
Thanks for the comment.
Maybe it would make sense to add a generic flag to go build
to request linking in additional libraries by import path? Could also be useful to link binaries with custom database drivers, etc.
Most helpful comment
Hey Ian!
I have something to say on this topic that may be interesting to share!
The reason why the time zone information on UNIX is so big is because zic flattens the information that is provided in the official tzdata files. The tzdata files define the time zones in a layered approach, but each file under /usr/share/zoneinfo is made to be independent, causing the layering to be lost. There is no reuse. This is intentional, because programs generally only load a single time zone file from disk.
At some point in the past I wanted to achieve the same thing (self-contained binaries that included all the time zones without blowing up the size), and discovered that it's possible to come up with solutions that are a lot more efficient. It is possible to come up with algorithms that don't require the data to be flattened, while still having very decent time complexity. I think that in the end I managed to store both the entire dataset and the algorithm in ~110 KB. Here's what I did:
zdump -v
to generate long lists of unit tests. The time zones used in these tests are the ones that I discovered were problematic in some sense (e.g., countries that moved across the international date line, British Double Summer Time).Unfortunately, hardly anyone uses this code. Maybe it was a waste of time for me to write it in the first place? Or maybe not, because now it may serve as an inspiration for someone to do something similar for Go?