Applications and packages frequently depend on (small) binary files. For example, XLSX.jl uses template Excel files. Other applications may include a small dataset, a logo image, etc. These can conveniently be maintained in the package repository. However, currently the only easy way to use them is to refer to them using @__DIR__, as XLSX.jl does. The issue is that this breaks relocatability.
Would it be possible to expand the artifact mechanism to refer to binary files as an artifact, thus eliminating the need to use @__DIR__? Please note that the location of the file is not of interest in this case, only the content of the file. Besides eliminating the usage of @__DIR__, a further benefit is that PackageCompiler will automatically include the file into the executable.
Hosting files like these on a web server is not very attractive IMO, because (i) it requires a package developer to find a hosting server just for that small Excel template file, and (ii) the file cannot be versioned along with the code in the repo.
Perhaps the entry in the artifact file could be something like this:
[blank_xlsx]
git-tree-sha1 = "43563e7631a7eafae1f9f8d9d332e3de44ad7239"
[[blank_xlsx.get]]
relpath = "data/blank.xlsx"
sha256 = "e65d2f13f2085f2c279830e863292312a72930fee5ba3c792b14c33ce5c5cc58"
I am not at all an expert in artifact files, so this can probably be improved! I hope that including this is easier than supporting libraries, which need to be built separately for different target platforms.
I think I recently addressed a similar question here: https://discourse.julialang.org/t/a-couple-of-questions-about-artifacts/33367/4 so I'm linking to that, as much of what I have to say is the same.
First off, a few design decisions that you may not be aware of and that will help inform some of the discussion:
Artifacts are containers, not files. This simplifies talking about moving them around; I don't have to specify, for instance, the local name of the object when downloading it from a remote resource; I get the thing, then unpacking it specifies everything I need to know about how to lay it out on disk. You can think of Pkg.Artifacts as a way to serialize, transfer, and unserialize a tree reliably.
Artifacts are content-addressed, and as such are explicitly decoupled from their location on disk. Artifacts can and will be loaded from any depot currently on the depot path; we don't care about where we're getting them from, as wherever we find it, we are guaranteed that the bits on-disk are correct. Because of this, it's totally valid for some future version of Julia to completely change how it searches for artifacts and where they're stored. It's not like packages where you can specify a "dev" directory and the code will be loaded from a path you give to Julia; we want Artifacts to be a little more of an abstraction than that, because it's a fairly restricted set of functionality.
Artifacts are built for write-once, read-many patterns. This lines up well with what you're describing above, but I thought I would mention it anyway; nothing in the tree of files that the artifact contains can change after it has been created. The contents are "frozen" and anything that you might want to change would have to be placed into an entirely different artifact that has no relationship to the first.
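To make the content-addressing point concrete, here is a minimal sketch (not taken from the Pkg docs). It assumes a writable depot, and the `make_artifact` helper and the `data.txt` file name are made up for illustration. Two artifacts built from identical content get the same hash, and the on-disk location is derived from that hash rather than chosen by the caller:

```julia
using Pkg.Artifacts: create_artifact, artifact_exists, artifact_path

# Hypothetical helper: build a one-file artifact holding `content`
# and return its content hash (a git-tree-sha1).
function make_artifact(content::AbstractString)
    return create_artifact() do artifact_dir
        write(joinpath(artifact_dir, "data.txt"), content)
    end
end

h1 = make_artifact("hello")
h2 = make_artifact("hello")    # same tree, so the same hash is expected
h3 = make_artifact("goodbye")  # different tree, so a different hash

println(h1 == h2)
println(h1 == h3)
# The path is looked up from the hash; the caller never picks it.
println(artifact_path(h1))
```

Note that this writes into the `artifacts` directory of the first depot on the depot path, so the artifacts stay around until they are removed or garbage collected.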
Alright, now that the philosophizing is out of the way, let's address the current proposal. I see three separate questions in your post:
Yes, artifacts are portable in things like PackageCompilerX because they perform their path lookup at runtime, whereas macros like @__DIR__() will bake in their path at compile-time. There's nothing magical about Artifacts here, you can generate a portable path to a file in your package by doing a similar lookup that Artifacts do:
using Pkg
# This will be re-initialized at __init__() time.
pkg_dir = @__DIR__
function __init__()
    # Get the current manifest, look up our own package by UUID
    env = Pkg.Types.Context().env
    pkg_uuid = Pkg.Types.UUID("12aac903-9f7c-5d81-afc2-d9565ea332ae")
    entry = env.manifest[pkg_uuid]
    # Convert the PkgEntry into a PackageSpec
    spec = Pkg.Types.PackageSpec(
        name=entry.name,
        uuid=pkg_uuid,
        version=entry.version,
        path=entry.path,
        repo=entry.repo,
        tree_hash=entry.tree_hash,
    )
    # Ask `Pkg` to find our source path
    global pkg_dir = Pkg.Operations.source_path(spec)
end
Let me know if that doesn't work (I haven't tested it with things like PackageCompiler), but I believe it should, since the current environment manifest should always point to the proper path.
How can we reduce friction of developers publishing artifacts?
This is a very good point. Ideally, we would have a Pkg.publish(artifact_hash) that allows you to effortlessly upload the artifact to some service, whether it be a GitHub release, or JuliaTeam, or your own S3 account or whatever. There are plenty of free/cheap, stable hosting platforms (GitHub releases is my favorite) but as it stands, there are a few steps you have to jump through:
- Pkg.Artifacts.archive_artifact() to create a tarball
- Pkg.Artifacts.bind_artifact!() with the appropriate download info to write out the entry into an Artifacts.toml file
I expect us to improve this eventually, but probably not in the near future (although we're always happy to review and merge pull requests!)
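As a rough sketch of those manual steps, assuming the upload itself is done by hand: the URL, the artifact name `blank`, and the placeholder file below are all hypothetical, and the `Artifacts.toml` is written to a temporary directory rather than to a real package.

```julia
using Pkg.Artifacts: create_artifact, archive_artifact, bind_artifact!, artifact_hash

# Create a small artifact to publish (placeholder content).
tree_hash = create_artifact() do dir
    write(joinpath(dir, "blank.txt"), "placeholder")
end

# Step 1: pack the artifact into a tarball; the return value is the
# SHA256 of the tarball, which goes into the download info.
workdir = mktempdir()
tarball = joinpath(workdir, "blank.tar.gz")
tarball_sha256 = archive_artifact(tree_hash, tarball)

# Step 2 (manual): upload `tarball` somewhere. This URL is a made-up example.
url = "https://example.com/releases/blank.tar.gz"

# Step 3: write the binding, including where to download the tarball from.
artifacts_toml = joinpath(workdir, "Artifacts.toml")
bind_artifact!(artifacts_toml, "blank", tree_hash;
               download_info = [(url, tarball_sha256)],
               force = true)

print(read(artifacts_toml, String))
```

The printed file should contain a `git-tree-sha1` entry plus a download section with the URL and SHA256, which is what a `Pkg.publish`-style helper would have to produce automatically.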
How can we build artifacts of files generated by local packages?
Packages can certainly generate artifacts, and PackageCompiler should bundle them alongside the application when it creates one. That is to say, if your package has an artifact bound within its Artifacts.toml file, it will get bundled along with the application. Therefore, it's quite easy for a package to generate some data upon first load, and if that artifact exists, it will get bundled along with the application via PackageCompiler. Example:
using Pkg.Artifacts
function create_artifacts()
    # This is the path to the Artifacts.toml we will manipulate
    artifact_toml = joinpath(@__DIR__, "Artifacts.toml")
    # Query the `Artifacts.toml` file for the hash bound to the name "iris"
    # (returns `nothing` if no such binding exists)
    iris_hash = artifact_hash("iris", artifact_toml)
    if isnothing(iris_hash) || !artifact_exists(iris_hash)
        # create_artifact() returns the content-hash of the artifact directory once we're finished creating it
        iris_hash = create_artifact() do artifact_dir
            # We create the artifact by simply downloading a few files into the new artifact directory
            iris_url_base = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris"
            download("$(iris_url_base)/iris.data", joinpath(artifact_dir, "iris.csv"))
            download("$(iris_url_base)/bezdekIris.data", joinpath(artifact_dir, "bezdekIris.csv"))
            download("$(iris_url_base)/iris.names", joinpath(artifact_dir, "iris.names"))
        end
        # Now bind that hash within our `Artifacts.toml`. `force = true` means that if it already exists,
        # just overwrite with the new content-hash. Unless the source files change, we do not expect
        # the content hash to change, so this should not cause unnecessary version control churn.
        bind_artifact!(artifact_toml, "iris", iris_hash; force=true)
    end
end
function __init__()
    create_artifacts()
end
You could of course just do a cp() instead of a download() when creating the artifact, however this approach clearly makes more sense if you're somehow _generating_ the artifacts rather than just copying them, because if you already have the files on disk, I think it makes more sense to just use the files directly, as suggested at the beginning of this enormous post. :)
you can generate a portable path to a file in your package by doing a similar lookup that Artifacts do:
I don't think that will work. After running PackageCompiler there are no source files, no project files, no artifact files, etc. There is only the sysimage and the unpacked artifacts. The reason Julia can find the artifact files is because:
- [...] __init__).
- [...] JULIA_DEPOT_PATH before initializing the Julia runtime.
- [...] JULIA_DEPOT_PATH.
For this to work with PackageCompiler we need to establish:
An example implementation could be:
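- All files in Package/assets are bundled when compiling.
- They get put into a separate assets folder along bin and artifacts
- PackageCompiler sets a flag telling the package if it is running in compiled mode. If it is you resolve the asset path to joinpath(Sys.BINDIR, "..", "assets") otherwise to joinpath(pkgdir(@__MODULE__), "assets").
The last part doesn't feel super clean.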
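A minimal sketch of the lookup described in that example implementation. PackageCompiler does not provide such a flag as far as this thread establishes, so `COMPILED_MODE`, `assets_dir`, and `asset_path` are hypothetical names, and the `bindir` keyword exists only so the compiled branch can be exercised without a real app:

```julia
# Hypothetical flag that the build tool would flip to `true` for a compiled app.
const COMPILED_MODE = Ref(false)

# Resolve the assets directory at call time, never at compile time.
function assets_dir(mod::Module;
                    compiled::Bool = COMPILED_MODE[],
                    bindir::AbstractString = Sys.BINDIR)
    if compiled
        # Compiled app: assets sit next to `bin` and `artifacts`.
        return normpath(joinpath(bindir, "..", "assets"))
    end
    # Regular session: assets live inside the package source tree.
    root = pkgdir(mod)
    root === nothing && error("module $mod does not belong to a package")
    return joinpath(root, "assets")
end

# Full path to a single asset file.
asset_path(mod::Module, name::AbstractString; kwargs...) =
    joinpath(assets_dir(mod; kwargs...), name)

# Simulate a compiled app laid out as app/bin, app/assets, app/artifacts.
println(asset_path(Main, "logo.png"; compiled = true, bindir = joinpath("app", "bin")))
```

A package would call `asset_path(@__MODULE__, "logo.png")` inside a function body, so the path is computed when the function runs rather than stored in a precompiled constant.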
Ah, I didn't realize that Package Compiler doesn't ship the source code at all. Indeed, then the only two options are to create artifacts or to modify Package Compiler as you suggest.
Thanks @staticfloat and @KristofferC for your detailed answers! I wonder what would be the best solution. If artifacts are not the answer, perhaps we should move the issue to PackageCompiler?
Relocatable apps can be a major benefit of Julia compared to Python/Matlab. Therefore, I hope that this can be addressed in such a way that most Julia code (and perhaps _all_ packages) stops using @__DIR__, e.g. by making it harder to find (not exporting it from Base) and putting a big warning in the docs. What do you think? If you agree I can open an issue for Julia in general to this effect, but perhaps there should be a solid replacement in place first.
I feel like the simpler piece here is to have an easy way to load data relative to source code. Yes, that doesn't address the PackageCompiler part, but perhaps it could if we made PackageCompiler aware of the mechanism so that it could convert that into an artifact during compilation?
Also, I wonder how breaking it would be if @__DIR__ were expanded at run-time...
I feel like the simpler piece here is to have an easy way to load data relative to source code.
What's hard about that right now?
How do you do it?
Just use @__DIR__? That's what everyone does now and it seems to work fine, the problem is when you don't have the file so "relative to file" makes no sense.
@__DIR__ doesn't work when you're relocating files even when the files are there (e.g. if you just copy a whole depot from one computer to another) because the file path is ingrained at compile-time.
But won't that invalidate the precompile file anyway? Maybe not?
It doesn't work fine because it's not relocatable.
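To illustrate why the path is "ingrained", here is a small sketch. It only shows the macro expansion and one run-time alternative; `baked_dir` and `runtime_dir` are illustrative names, and this makes no claim about how precompile caches are invalidated:

```julia
# `@__DIR__` is expanded when the code is lowered, so what ends up in the
# compiled function is a plain string literal, not a lookup.
expanded = @macroexpand @__DIR__
println(typeof(expanded))

# The directory of this source file is fixed when this definition is compiled.
baked_dir() = @__DIR__

# The package directory is looked up every time the function is called.
# `pkgdir` returns `nothing` for modules that do not belong to a package.
runtime_dir(mod::Module) = pkgdir(mod)
```

In a precompiled package, `baked_dir()` keeps returning the directory that existed on the machine where precompilation happened, while `runtime_dir(@__MODULE__)` is resolved against the current load path.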
@KristofferC Thanks for your ideas about a possible implementation!
An example implementation could be:
- All files in Package/assets are bundled when compiling.
- They get put into a separate assets folder along bin and artifacts
- PackageCompiler sets a flag telling the package if it is running in compiled mode. If it is you resolve the asset path to joinpath(Sys.BINDIR, "..", "assets") otherwise to joinpath(pkgdir(@__MODULE__), "assets").
The last part doesn't feel super clean.
I think it would be better if the application does not have to figure out whether or not it is running in compiled mode, but rather that the final solution always "just works". That would be the attraction of using the artifact mechanism for it...
Hi all - I've made a small package - https://github.com/nielsls/Assets - that does what @KristofferC suggested.
It allows you to retrieve an asset using asset"my_image.gif". As Kristoffer suggested:
- [...] package/assets
- [...] assets folder along bin and artifacts (currently a manual step...).
"""
Returns true if called from a standard Julia session.
False if called from an app created using PackageCompiler.
"""
function is_standard_julia_session()
    julia_folder = splitpath(Sys.BINDIR)[end-1]
    return match(r"Julia.\d+\.\d+\.\d+", julia_folder) !== nothing
end
The last part is tested on Windows only and should probably be improved - but it does "just work".
Note: There's no dependence on Artifacts or content hashes here. A file is identified purely by its filename (with all the pros and cons that follow).
Let me know what you think - thx
Cool, thanks! I like the usage of the string macro to get asset"my_image.gif". Perhaps there is a possibility to leverage the code offered by @staticfloat to turn a local file into an artifact. The benefit of that would be that PackageCompiler will automatically ship them!
Because I think there is still a good case for the functionality proposed in this issue, I will leave it open for now - perhaps it is an inspiration to the devs. Unfortunately, I currently don't have the opportunity to contribute a solution myself.
Very cool!