I think it would open up some powerful possibilities if packages in a "Project.toml" were only identified by their UUID. So in e.g.
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
"JSON" would just become an arbitrary alias, resp. the local name used for "682c06a0-de6a-54ab-a142-c8b1cf79cde6" within the package/project with that "Project.toml".
This would enable us to rename packages at will without breaking code referring to those packages under their old names.
For example, let's assume (I'm not involved in either package, and this is purely hypothetical) that the maintainers of "JSON" and "JSON3" agreed that "JSON3" (I guess it's faster?) should become the new default JSON package. They could then rename "JSON" to "JSONOld" and rename "JSON3" to "JSON", but keep the UUIDs. Any other package that still refers to "JSONOld" as
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
would just continue to work, while packages that want to use the newer JSON package (formerly "JSON3") could use
JSON = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
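To make the aliasing idea concrete, here is a minimal Python sketch. It is only an illustration of the proposal, not how Pkg behaves: each project's `[deps]` table is modelled as a plain dictionary, and the names `old_project`, `new_project` and `resolve` are hypothetical.

```python
# Hypothetical model of the proposal: a project's [deps] table maps a
# purely local alias to a UUID, and only the UUID identifies the package.
JSON_UUID = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"   # registered as "JSON" today
JSON3_UUID = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"  # registered as "JSON3" today

# Two projects that both say "JSON" locally but mean different packages.
old_project = {"deps": {"JSON": JSON_UUID}}
new_project = {"deps": {"JSON": JSON3_UUID}}


def resolve(project, local_name):
    """Return the UUID that this project means by `local_name`."""
    return project["deps"][local_name]
```

Under this model a rename in the registry changes nothing for either project, because neither lookup ever consults the registered name.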
This would also enable us to tidy up the package namespace a bit (as long as the package maintainers agree) - I guess we have quite a few packages that, for historical reasons, have names that are either too general or don't fit the package very well anymore (e.g. after it has evolved and broadened or narrowed its scope).
This would also resolve name-clashes between packages from different registries (e.g. between general and a private registry).
In principle, fully UUID-based package resolution would also allow for having multiple packages registered with the same name within the same registry - IMHO this should definitely be avoided though, at least in general, as it would create lots of confusion. A name should be "freed" before another package can claim it.
CC @StefanKarpinski (we discussed this in Baltimore, but I forgot to write it up as an issue.)
I'm not able to judge if this would require major changes or if it could be implemented fairly easily.
All of this already happens, so trust me that we are already taking UUIDs very seriously. :) The only reason that it is not possible to rename a package right now is that code loading does not look up the UUID of the package whose name changed in the project file of the packages that use the old name. But this could be added.
All of this already happens, so trust me that we are already taking UUIDs very seriously. :)
I wasn't really implying we didn't - I just couldn't resist the allusion to a certain other issue. :-) But you're right, of course - I changed the title of this issue accordingly. ;-)
What I meant is that we currently can't use
[deps]
Foo = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
instead of
[deps]
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
so currently both the name and the UUID matter (maybe not the technically correct way of putting it, I fear). I guess that's what you mean with "because code loading does not look up the UUID of the package whose name changed in the project file of the packages that use the old name"?
But this could be added.
If this could be added without too much trouble, I think it would be worth it (and yes, I know, a PR might be welcome - but I wouldn't quite know where to begin ;-) ).
As you might imagine I have thought more than a bit about this. I'm currently working through what needs to change to support renames. However, I don't think code loading needs to change so much as we need to use the internal name of a package at a given commit (i.e. what the Project.toml file at that commit says the name is) as the name when installing it. Then as long as the code using a package agrees with the internal name of the package at the commit that's being used, everything will work fine. A bunch of changes to Pkg are required to support that, so that we install packages at a location based on the internal name of the package at a version instead of the registered name, but it's not too bad.
If you want to be able to use a package by an arbitrary name, that's a very different situation. It would require completely changing the path scheme for installed packages. Currently the installed path of a package is something like $depot/packages/$name/$slug where name is the name of the package (currently the registered name, but in what I propose above it would be the internal name of that version, i.e. matching the name in $depot/packages/$name/$slug/Project.toml). If the name when you're using a package is completely arbitrary, then how do you find the package? You can't determine the name part. What you're saying could be done if we did pure content addressing of packages and stored installed packages at a path like $depot/packages/$hash where hash is the full git-tree-sha1 of the package source. But then the installed package directory becomes very unfriendly to users: it gives no indication of the package's name and all versions of all packages are installed in a single directory.
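The two path schemes contrasted above can be put side by side in a short Python sketch. The function names and the example values in the comments are hypothetical; only the two path patterns come from the comment itself.

```python
from pathlib import PurePosixPath


def named_path(depot, name, slug):
    """Current-style layout: $depot/packages/$name/$slug."""
    return PurePosixPath(depot) / "packages" / name / slug


def content_addressed_path(depot, tree_hash):
    """Pure content addressing: $depot/packages/$hash."""
    return PurePosixPath(depot) / "packages" / tree_hash


# Hypothetical values, for illustration only:
#   named_path("/home/user/.julia", "JSON", "abcde")
#   content_addressed_path("/home/user/.julia", "0123456789abcdef0123456789abcdef01234567")
```

The first function cannot be called without knowing `name`, which is exactly the lookup problem described; the second needs only the hash, but its result says nothing about which package it holds.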
This would also resolve name-clashes between packages from different registries (e.g. between general and a private registry).
I think this part was nonsense, though; two packages with the same name but separate UUIDs can already be handled, right?
two packages with the same name but separate UUIDs can already be handled, right?
Yes, that works fine already.
We could potentially keep a file mapping (uuid, git-tree-sha1) pairs to names and then look up package locations that way, but it complicates (and changes) the process of loading code, which I'm generally very reluctant to do. Code loading would ignore the arbitrary name used for a package inside the project and manifest files and look up the (uuid, git-tree-sha1) pair to get the actual name of a version of a package and then find it based on that. Not the worst complication ever, but still.
As you might imagine I have thought more than a bit about this.
Oh, sure - sorry, I didn't mean to badger, I just thought maybe I should write this "package names as aliases" idea down.
If you want to be able to use a package by an arbitrary name, that's a very different situation.
Yes, that was my (maybe somewhat naive) hope. Currently, when I make a new package, I worry a lot about the name, because it's a bit hard to change later. And sometimes, packages just develop in a direction not foreseen in the beginning. But of course I didn't think about
It would require completely changing the path scheme for installed packages.
the above. Darn. :-)
I guess $depot/packages/$uuid/... or $depot/packages/$name-$uuid/... wouldn't solve it either, right?
Anything with the name in it has the same problem. Using just the UUID would work but $depot/packages/$uuid/$hash creates a very long path name which can be a problem for some systems/tools. That's why we use the five character slug instead of something that long.
However, I don't think code loading needs to change so much as we need to use the internal name of a package at a given commit (i.e. what the Project.toml file at that commit says the name is) as the name when installing it. Then as long as the code using a package agrees with the internal name of the package at the commit that's being used, everything will work fine.
This would mean that a package using the renamed package would be stuck with using an old version until it changes to use the new name, right? Hm, if we had a way to notify users of the package that it has been renamed - do we? That would solve it, I guess.
Using just the UUID would work but $depot/packages/$uuid/$hash creates a very long path name which can be a problem for some systems/tools. That's why we use the five character slug instead of something that long.
Again, probably naive - we can't turn the UUID into a slug instead of the name, right? It certainly would be a bit less, ah, unique?
The slug is based on the uuid and tree hash. The name in the path is important though, because sqrt(62^5) ≈ 30267, so when there are about 30k slugs in the same place there's roughly a 50% chance of a collision. When the slugs are in a directory of versions of packages with the same name, that's a pretty good situation; if they're all in the same top-level packages directory, that's not so good.
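The collision estimate can be checked with the standard birthday approximation. The Python sketch below assumes slugs behave like uniformly random draws from 62^5 values, which is an assumption made for this illustration; the function names are hypothetical. Under that assumption the probability at 30,267 slugs comes out at about 39%, and 50% is reached near 35,600, so the figure quoted above is the right order of magnitude.

```python
import math

# Five characters drawn from a 62-symbol alphabet.
SLUG_SPACE = 62 ** 5


def collision_probability(n, space=SLUG_SPACE):
    """Birthday approximation: chance that at least two of n random slugs collide."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


def slugs_for_probability(p, space=SLUG_SPACE):
    """Approximate number of slugs at which the collision chance reaches p."""
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))
```

Within a single `$name` directory the number of slugs is the number of installed versions of one package, so `n` stays small and the probability is negligible.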
I think the most viable approach is this:
This file would have a format something like this:
682c06a0-de6a-54ab-a142-c8b1cf79cde6 = ["JSON"]
0f8b85d8-7281-11e9-16c2-39a750bddbf1 = ["JSON", "JSON3"]
This last line assumes that JSON3 has been renamed to JSON at some point, so that there are earlier versions of the same package with the internal name JSON3 and later versions with the internal name JSON. This file would live at $depot/packages/Names.toml for each depot.
The process of code loading would go like this: look in $depot/packages/Names.toml for uuid and, for each name found, look for $depot/packages/$name/$slug.
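A rough Python sketch of this lookup, as I read the steps above, might look as follows. The `NAMES` dictionary stands in for a parsed Names.toml, and `candidate_paths`, `locate` and the injected `exists` callable are hypothetical helpers so the sketch runs without a real depot.

```python
from pathlib import PurePosixPath

# Stand-in for a parsed $depot/packages/Names.toml: UUID -> internal names.
NAMES = {
    "682c06a0-de6a-54ab-a142-c8b1cf79cde6": ["JSON"],
    "0f8b85d8-7281-11e9-16c2-39a750bddbf1": ["JSON", "JSON3"],
}


def candidate_paths(depot, uuid, slug, names=NAMES):
    """One candidate install path per internal name recorded for the UUID."""
    return [
        PurePosixPath(depot) / "packages" / name / slug
        for name in names.get(uuid, [])
    ]


def locate(depot, uuid, slug, exists, names=NAMES):
    """Return the first candidate path for which `exists(path)` is true, else None."""
    for path in candidate_paths(depot, uuid, slug, names):
        if exists(path):
            return path
    return None
```

The local name used in the project file never enters the lookup; only the UUID and the slug do.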
All that said, I have to wonder if it isn't better to just require the name by which one uses a version of a package to match its internal name.
In any case, making sure that package versions are installed at a path that's based on their internal name is something that we should do regardless, so I'm going to keep working on that.
Can't we pull a git and just break up the UUID/hash into parts? This would avoid having to look up the name at all and avoid any problems with overly long directory names.
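For reference, git stores loose objects in a directory named after the first two hex characters of the object hash, with the remaining characters as the file name. A minimal Python sketch of applying the same idea to both the UUID and the tree hash could look like this; the function name, the prefix length and the exact layout are assumptions made for illustration.

```python
def sharded_path(depot, uuid, tree_hash, prefix_len=2):
    """Split UUID and tree hash git-style into short prefix and remainder."""
    uuid_hex = uuid.replace("-", "")
    return "/".join([
        depot,
        "packages",
        uuid_hex[:prefix_len],
        uuid_hex[prefix_len:],
        tree_hash[:prefix_len],
        tree_hash[prefix_len:],
    ])
```

Splitting keeps any one directory from holding too many entries, but note that the full path is no shorter than before and still carries no package name.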
We could, but I still think that $uuid_slug/$hash_slug is a bit of a user-unfriendly path scheme. Since a package always has an internal name we might as well use it to make paths friendlier.
I would prefer a simpler mapping over a friendlier path. (Why does the path have to be user-friendly in the first place?)
Because source paths show up in stack traces all the time and not being able to tell what package the code is in is kind of a big usability issue.
Argh, yes, didn't consider that.
Looks like this is indeed quite a bit more tricky than I had hoped - I thought the difficulty might be closer to the compiler level or so - but from what I understand, that's actually not really a problem at all. Instead it's "mundane" path names :-) But Stefan's arguments are very hard to fault, of course. Darn, I had hoped it would be some easy change - I guess that was too naive, because in that case it probably would have been done already.
In addition to problems with stack traces, etc., changing package paths would also kinda break package-management compatibility between Julia 1.x versions, I guess - which would be really inconvenient; it's really nice to be able to quickly switch versions with a common package repo.
Because source paths show up in stack traces all the time and not being able to tell what package the code is in is kind of a big usability issue.
Would it be crazy to just insert package name when printing the stack traces, instead of a full path, something like this?
Stacktrace:
[1] f(...) at $PACKAGE_NAME src/file.jl:51
...
The files in Base are already not printed using the full path.
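As a rough illustration of that printing idea, the Python sketch below rewrites a source location using a map from install directory to package name. Where such a map would come from is left open, and `short_location` and `roots` are hypothetical names, not part of Julia's stack trace machinery.

```python
from pathlib import PurePosixPath


def short_location(path, line, roots):
    """Format `path:line` as `<package name> <relative path>:<line>`.

    `roots` maps install directories to package names. Paths outside every
    known install directory are printed unchanged.
    """
    source = PurePosixPath(path)
    for root, name in roots.items():
        try:
            relative = source.relative_to(root)
        except ValueError:
            continue  # this file is not under that install directory
        return f"{name} {relative}:{line}"
    return f"{path}:{line}"
```

With a hash-only install directory mapped to a name, a frame would still read like `JSON src/file.jl:51`, which is the format suggested above.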
The direction of this issue sounds great as it would allow (but not force) package authors to use a Go-like migration path; i.e., differentiate the namespace when bumping the major version.
Yes, that's a great idea, but I still think that having the package name in the path is a good idea.
I agree - I do find myself looking into the packages directory manually from time to time, human readable path names are very nice to have. Also, breaking the current scheme would probably not be possible within the Julia v1.x track, right? And even if it could be considered, breaking the current option of Julia versions sharing packages would be kinda inconvenient.
So I guess Stefan's suggestion of keeping a separate map between package names and UUIDs would currently be the only practical way to have package names that are truly local to the using Project.toml? I certainly do understand the reluctance of complicating the package loading process with such an additional mapping, though. And even with such a map, I guess we would have cases where we have two different packages installed in the same directory, only with different (not human readable) slugs? Not very transparent, I have to admit.
Darn, this would be so nice to have - but I can't think of a really clean and elegant solution either (though I'm hardly an expert here).
Also, breaking the current scheme would probably not be possible within the Julia v1.x track, right?
It depends. Is it acceptable to make everyone reinstantiate their manifests? If so then it's fine.
I'm actually pretty ok with going ahead with my suggestion here. It's a pretty simple scheme and doesn't complicate code loading all that much. I'm going to give it a try. You can automatically generate the Names.toml file from an existing set of installed packages, which is a nice property.
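A rough Python sketch of that regeneration step is shown below. It assumes every installed version sits at `$depot/packages/$name/$slug` and has a `Project.toml` with top-level `name` and `uuid` keys. The helper names are hypothetical, and the line-based reader is a deliberate simplification rather than a real TOML parser.

```python
import re
from pathlib import Path

# Matches top-level lines such as: name = "JSON"   or   uuid = "682c..."
_KEY = re.compile(r'^(name|uuid)\s*=\s*"([^"]*)"\s*$')


def read_name_and_uuid(project_file):
    """Read the top-level `name` and `uuid` keys (simplified, not full TOML)."""
    found = {}
    for line in Path(project_file).read_text().splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # the first table header ends the top-level keys
        match = _KEY.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found.get("name"), found.get("uuid")


def build_names(depot):
    """Map each UUID to the internal names used by its installed versions."""
    names = {}
    for project_file in sorted(Path(depot).glob("packages/*/*/Project.toml")):
        name, uuid = read_name_and_uuid(project_file)
        if name and uuid:
            entries = names.setdefault(uuid, [])
            if name not in entries:
                entries.append(name)
    return names
```

Installed versions without a `Project.toml` are skipped by this sketch and would need separate handling.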
Yay, thanks!
Just throwing this out there, since it wasn't mentioned on this thread: would symlinks be an acceptable solution for a fully UUID-based package resolution scheme, while preserving human-readable package paths? So there could be, e.g.
$depot/packages/$name/ -> $depot/packages-uuid/$uuid/ or something like that. This is similar to the strategy used by GoboLinux, and to some extent by Homebrew.
That wouldn't work on Windows, unfortunately.
I wanted to record a possible approach that came up in a discussion just now with @staticfloat: we could use the "local name" of packages in the project file but the "canonical name" in the manifest file. That means that the manifest stanza for a package would always use its canonical name. To find top-level X, you would look up X in the project file, find its UUID, then scan through the manifest file for that UUID, which gives the canonical name, and then look up the package at $depot/packages/$canonical_name/$slug. If you're looking up a non-top-level X then you start in the manifest, find the stanza for the current package, look in its deps, which would be keyed by local names and would only be permitted to use the name list form if (a) all deps use their canonical names and (b) all those canonical names are unique in the manifest. Thus we can look up the local name in the deps entry and be assured that either the name is canonical and unique in the manifest, or the deps is a map to UUIDs and we can then proceed with looking up the dependency by UUID, which gives us its canonical name again, allowing us to find the code. This approach doesn't require keeping an alias map, which is considerably cleaner than what I'd proposed before.
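The top-level part of this lookup can be sketched in Python as follows. The data layout is a simplification assumed for illustration: the manifest is a dictionary keyed by canonical name, and it stores a ready-made `slug` rather than the tree hash from which the slug would be derived. `find_top_level` and the example slug are hypothetical.

```python
JSON3_UUID = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"

# Project file: local name -> UUID. "Foo" is an arbitrary local alias.
project = {"deps": {"Foo": JSON3_UUID}}

# Manifest: stanzas keyed by canonical name (simplified; "abcde" is made up).
manifest = {"JSON3": {"uuid": JSON3_UUID, "slug": "abcde"}}


def find_top_level(local_name, project, manifest, depot):
    """Resolve local name -> UUID -> canonical name -> install path."""
    uuid = project["deps"][local_name]
    for canonical_name, stanza in manifest.items():
        if stanza["uuid"] == uuid:
            return f"{depot}/packages/{canonical_name}/{stanza['slug']}"
    return None  # the UUID is not present in the manifest
```

Here `find_top_level("Foo", project, manifest, depot)` lands in the `JSON3` directory even though the project never mentions that name.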
Thus we can look up the local name in the deps entry and be assured that either the name is canonical
I don't see how this is possible if the non top-level dependency uses a non canonical name for what it itself loads. Unless you go to that package's project file it feels impossible to know what UUID that resolves to. I think that is what I mentioned here https://github.com/JuliaLang/julia/issues/33047#issuecomment-524448242.
Yes, we'd have to look at the project file for each bit of code we're loading, which we don't currently have to do. But I think that's ok to do. Can't think of a reason not to—we're loading code from that location anyway.
But then you can no longer figure out the full dependency graph from just the Project / Manifest without downloading all packages in it. It's a bit similar to why the deps and version entries are stored in the Manifest, no? We could theoretically just read them from the package Project file when loading.