Julia: Switch to a better lookup strategy for compile-time preferences in stacked environments before releasing 1.6?

Created on 29 Sep 2020  ·  23Comments  ·  Source: JuliaLang/julia

tl;dr: I think we can improve the lookup strategies for compile-time preferences than the current implementation #37595. In particular, can we make it independent of the content of Manifest.toml files?

Continuing the discussion in https://github.com/JuliaLang/julia/pull/37595#discussion_r488976715, I think we need to explore different strategies of compile-time preference lookup for stacked environments before 1.6 is out and the spec is frozen.

(@staticfloat I'm opening the issue here since it's about code loading and I think resolving this is a blocker for 1.6. But let me know if you want to move this discussion to Preferences.jl)

cc @fredrikekre @KristofferC

What is the motivation of compile-time preference?

Before discussing how to lookup preferences, I think it would be better to have a shared vision of the use-cases of compile-time preference.

I imagine that a common example would be for choosing some kind of default "backend" such as CPU vs GPU https://github.com/JuliaLang/Pkg.jl/pull/977. IIUC @timholy's ComputationalResources.jl achieves a similar effect with run-time @eval. FFTW's deps/build.jl uses a text file ~/.julia/prefs/FFTW to switch the provider of the external library. This can be migrated to the compile-time preferences system. It's also useful for toggling debugging support (in a semi-ad-hoc way). For example, ForwardDiff uses the constant NANSAFE_MODE_ENABLED for adding debugging instructions.

I think another important use-case is for handling machine-specific configuration such as system libraries and hardware properties. For example, previous discussions of package options (https://github.com/JuliaLang/Pkg.jl/issues/458 and https://github.com/JuliaLang/Juleps/issues/38) mentioned that configuring libpython for PyCall as an important use-case. In general, it is useful to be able to use Julia with external libraries with various sources. For example, libpython may come from JLL, OS's package manager, custom build, conda, etc. Such setting is inevitably machine-specific. Thus, recording such information in Project.toml that is meant to be shared is a bad idea. At the same time, it is crucial to have per-project per-machine preferences in a self-contained file for reproducibility.

Are they good motivations? Can we agree that it's ideal to have (1) pre-project machine-agnostic preferences _and_ (2) per-project per-machine preferences? If so, I think it's necessary to change the current lookup strategy.

Strategies

There are various ways to lookup preferences of stacked environments (i.e., Base.load_path()). To start the conversation, I discuss following threee strategies:

Strategy 1: First package hit in Manifest.toml files (current implementation as of #37595)

The current strategy for finding the preference for a package is to walk through load_path() one by one, find a manifest (environment) that includes the package, and look at the corresponding project file.

Strategy 2: First preference hit in Project.toml files

Search Project.toml files in load_path() and find the first Project.toml file with the preference of the target package.

Strategy 3: First package hit in Project.toml files

Search Project.toml files in load_path() and find the first Project.toml file with the target package.

Example

To illustrate the difference between these strategies, consider the following environment stack (i.e., Base.load_path() == [X, Y, Z])

  • Project X: Project.toml has package A which has package B as a dependency (i.e., B is in Manifest.toml but not in Project.toml). Package.toml has no compile-preferences table.
  • Project Y: Project.toml has the compile-preferences table for B. However, Project.toml's deps table does not contain B.
  • Project Z: Project.toml has the compile-preferences table for B. Project.toml includes B in deps; i.e., the user ran pkg> add B while activating Z.

Strategy 1 finds the preferences for B in X (i.e., empty). Strategy 2 finds the preferences for B in Y. Strategy 3 finds the preferences for B in Z.

To summarize:

| Project | deps | compile-preferences | Manifest.toml | found by |
| --- | --- | --- | --- | --- |
| X | [A, ...] | empty | has B as an indirect dependency | Strategy 1 |
| Y | [...] | has B's preferences | has B as an indirect dependency | Strategy 2 |
| Z | [B] | has B's preferences | has B | Strategy 3 |

Analysis

As I discussed in https://github.com/JuliaLang/julia/pull/37595#discussion_r490708451, I think Strategy 1 (First package hit in manifests) is not desirable because the fact that package A depends on B is (usually) an implementation detail. Package A's author may silently drop B from the dependency when bumping v1.1 to v1.2. Then, after Pkg.update, Strategy 1 would pick up project Y as the source of preferences. OTOH, with Strategy 2 and 3, it's more explicit for the user to control which environment changes the preference of a given package. I don't think it is ideal to rely on the state of Manifest.toml since it is a large opaque file to the users and it is often not checked in to the version control system.

Strategy 3 has an advantage over Strategy 2 that the compatibility of the recorded preferences can be imposed via the compat entry. For example, the package can add the compat bound for the given preference support. The only disadvantage for Strategy 3 compared to Strategy 2 I can think of is that the user may end up having "stale" package in Project.toml that they added just for configuring a transitive dependency.

Alternative: shallow-merge all preference tables?

It's also conceivable to aggressively combine preference tables for a given package using merge(dicts...). That is to say, given

[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
a = 1
b = 2

and

[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
a = 10
c = 30

we'd have merge(Dict("a" => 10, "c" => 30), Dict("a" => 1, "b" => 2)) (i.e., Dict("a" => 1, "b" => 2, "c" => 30)).

Since this is "shallow-merge", each package can opt-out this behavior and use Strategy 2/3 by creating sub-table explicitly:

[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences] # note `.preferences` suffix
a = 1
b = 2

and

[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences]
a = 10
c = 30

As long as the specification is clearly documented, the package authors can use the appropriate behavior.

Opinion

I think Strategy 3 or the shallow-merge variant of Strategy 3 is better.

Appendix: Current implementation

The entry point for the precompilation cache manager is get_preferences_hash

https://github.com/JuliaLang/julia/blob/6596f95226e59bc0bc34bda3ffd389f08fca7e24/base/loading.jl#L325-L348

https://github.com/JuliaLang/julia/blob/6596f95226e59bc0bc34bda3ffd389f08fca7e24/base/loading.jl#L1458-L1484

packages

Most helpful comment

Stefan and I had a mega-discussion about this, and I think we've come up with a somewhat revamped architecture. The pain points addressed are:

  • The need for a worry about compile-time vs. non-compile time preferences (e.g. "what if I change things from being a compile-time to a non-compile-time preference?")
  • The unwieldy linking of preferences with Project.toml (e.g. "what if I don't want to share these preferences with the world?")
  • The transitive issue (e.g. "What if Flux needs to set something in NNlib and NNlib isn't in the main Project?")

Compile-Time Detection

Instead of having to explicitly declare preferences as compile-time sensitive, we can instead flex our compiler muscles do this automatically. We will change the load_preferences() API from passing back a full dict to instead taking in a key and returning that value (with an optional default):

module Frobulator

function get_backend()
    # Use macro to automatically get calling Module
    # second argument is default value
    return @load_preference("backend", "CPU")
end

backend = get_backend()
if backend == "GPU"
    # do GPU stuff
else
    if backend != "CPU"
        @error("no such backend '$(backend)', defaulting to CPU!")
    end
    # do CPU stuff
end
end # module

The benefit of this setup is that we can now define load_preference() as follows:

currently_compiling() = ccall(:jl_generating_output, Cint, ()) != 0

function load_preferences(uuid::UUID, key::String; default = nothing, toml_cache::TOMLCache = TOMLCache())
    # Re-use definition in `base/loading.jl` so as to not repeat code.
    prefs_dict = Base.get_preferences(uuid, toml_cache)

    # If we're currently compiling, 
    if currently_compiling()
        push!(Base.COMPILETIME_PREFERENCES[uuid], key)
    end
    return get(prefs_dict, key, default)
end

The COMPILETIME_PREFERENCES is a Dict{UUID,Set{String}} that tracks accesses of all top-level preference keys during .ji generation, and at the end of the compilation run, it encodes not only the hash of all preferences, but the list of keys to include in the preferences. This allows us to easily invalidate while only paying attention to the things that we, by definition, need to.

Preferences.toml and LocalPreferences.toml

Since TOML parsing is fast and cheap these days, let's not store things in Preferences.toml at all, and instead just create a separate file. This separate file has a few advantages:

  • We don't have to muck up Project.toml in a way that we would possibly regret later
  • We can split things such that the decision to "export" your preferences to someone else can be solved at the .gitignore level.

Similar to Manifest.toml and (Julia)Artifacts.toml, we'll add a pair of twins to the TOML-gang, named (Julia)Preferences.toml and Local(Julia)Preferences.toml. These will be loaded (and merged) when you load your preferences, and can be targeted in save_preference(). Because we're merging, we'll need a "white-out" sentinel value, and to support future private dependencies, rather than storing things by UUID we'll store them by name (this also increases readability).

Concretely, you might have the following TOML files:

# package_root/LocalPreferences.toml

# Top-level Flux dependency
[Flux]
backend = "GPU"

# TBD syntax for setting a preference for a private dependency (e.g. Flux's private copy of NNlib)
[Flux-NNlib]
backend = "GPU"

# Setting another top-level dependency
[LoopVectorization]
LLVM_passes = ["pass3"]
__clear_keys__ = ["force_march"]

    [LoopVectorization.nested.baz]
    qux = "spoon"
# package_root/Preferences.toml
[Flux]
enable_experimental = false
# global_environment/LocalPreferences.toml
[LoopVectorization]
force_march = "avx2"
LLVM_passes = ["pass1", "pass2"]

    [LoopVectorization.nested]
    foo = "bar"

When loading preferences, we walk from most distant load_path() element to the closest, and within each location, we load (Julia)Preferences.toml first, then Local(Julia)Preferences.toml second, recursively merging as necessary. This results in the following resultant preferences:

# Flux gets merged together
load_preference(Flux, "backend") == "GPU"
load_preference(Flux, "enable_experimental") == false

# Pending future 'private dependencies' work, this will allow those private
# dependencies to have their own set of preferences.
load_preference(Flux.NNlib, "backend") == "GPU"

# Show that nothing is loaded for an actual top-level `NNlib` package.
load_preference(NNlib, "backend") == nothing

# LoopVectorization gets nicely merged, with the internal `__clear_keys__` sentinel having effect
load_preference(LoopVectorization, "force_march") == nothing
load_preference(LoopVectorization, "LLVM_passes") == ["pass3"]
load_preference(LoopVectorization, "nested") == Dict("foo" => "bar", "baz => Dict("qux" => "spoon"))

That's a lot to take in, so let's stop for a quick breather and absorb it. The reasoning behind having two TOML files is that one may provide shared settings for a group project, while the LocalPreferences.toml is intended to be .gitignore'd and used only on the local machine, etc...

Preferences are Explicit User Choices, Not Communication Channels

To circle back to the question that this issue was originally opened to address; we just won't allow setting preferences for things that don't exist in Project.toml. If something doesn't exist, we'll just auto-add it to the extras target and get the mapping into the Project.toml for you. This both allows us to resolve names to UUIDs in the project, as well as have unambiguous stopping behavior for the loading search. As private dependencies are still but a twinkle in Stefan's eye, we're punting on a few of the details surrounding them, but the basic idea would be that if Flux attempts to set a preference in NNlib, today this would cause NNlib to get listed as an extra in the top-level Project.toml, whereas if it were a private dependency of Flux it would instead not get added and the mapping would go under the key Flux-NNlib rather than just NNlib.

The reason for the title of this section is that I have come around more to the idea that Preferences should, as much as possible, be something that the user could write out by hand with no API help. Flicking a switch in Flux should not require Flux to set a bunch of weird options inside NNlib. We don't want to encourage this channel of communication because it has the potential to turn into a mess where we're esssentially subverting the resolver and building incompatible flavors of packages

If we view preferences as altering the package itself, we are essentially designing a method of building multiple different "flavors" of the same package. Arpack[BLAS=MKL] is a slightly different package than Arpack[BLAS=OpenBLAS], and two different packages that depend on Arpack may themselves require different options. This leads us to our own "diamond of death" scenario:

    Top-level project
         / \
        /   \
       /     \
      /       \
  |-----|  |-----|
  | Foo |  | Bar |
  |-----|  |-----|
      \      /
(MKL)  \    / (OpenBLAS)
        \  /
         \/
     |--------|
     | Arpack |
     |--------|

Nobody wants this, and a true solution requires a global mediator, such as the Pkg resolver, to be able to determine a valid, cohesive structure. So will avoid this by strongly encouraging package authors to not break this compartmentalization guideline, but still allowing it if need be. An example is Arpack and ArpackMKL; it's a mess if we have to keep them separate, but there are real reasons they aren't 100% compatible with upstream packages. One solution is to not let packages set transitive dependencies at all, and simply tell the user to do it, but that's kind of annoying and will cause copy-paste fatigue (e.g. "now paste this into your REPL: Preferences.set_preference(Arpack, "BLAS", "MKL")). On the other hand, we definitely don't want to be in a situation where we set Foo to be on "fast" mode (Which causes it to tell Arpack to use MKL, then we tell Bar to use "Opensource-only", which causes it to tell Arpack to use OpenBLAS. Race conditions for the fail.

Instead, we propose a compromise; we suggest to package authors to not use this as a channel of communication at all, but if you do, use set_preference(Arpack, "BLAS", blaslib; force=false). This will throw an error if a non-matching key is already set for Arpack, and it will force the user to get involved.

Coda

There's a lot here, so feel free to start discussing bits and pieces of it, perhaps even opening issues for specific thoughts/questions on the repo proper. In terms of code loading changes, I'll need to coerce code loading to save/load the list of compile time preferences, but that's the only change that needs to go into Base julia.

All 23 comments

Just reading through this and thinking about it a little bit I agree that strategy 3 seems best. You shouldn't really use the content of the manifest for anything, except determining what package to load.

One thing I'm not convinced about this; let's imagine that I have a package Foo that depends on Bar, and Foo.setup() calls Bar.set_backend("GPU"). (An example of where this might occur is that maybe you have a giant deep learning library like Flux that has a bunch of subordinate packages like NNlib, and you want Flux to configure everything to use a certain kind of backend).

In the current design (strategy 1), because Bar appears in the current environment's manifest, when we look for where to write our preferences out to, we'll select the current environment, write them out there, and load them from there.

In strategy 2, I'm not sure where we would write them out to, (perhaps the global environment?), and then we should continue to read them back out from there.

In strategy 3, I'm also not sure where we would write them out to, and assuming the package Bar is not explicitly added anywhere, we may never find the preferences at all, so we'd get into a situation where we set preferences and then when we read them back, we don't get anything.

It's because of this situation (transitive dependencies) that I think we need to pay attention to the Manifest.

Deciding which Project.toml to be modified is a difficult problem on its own (see below). But I think always using the activated project is a good default. Also, since Preferences.jl (that takes care of the API/UI manipulating the preferences) is not a stdlib, the decision we make right now in Preferences.jl has less impact.

In Strategy 2, I think adding the preference for Bar in the activated project is very straightforward. As the code loader does not care about deps or Manifest.toml, julia would pick it up "always" (but see below).

In Strategy 3, I think we can just emit a helpful error message that the user has to run Pkg.add("Bar") on the activated project if Bar is not in its deps. Or Preferences.jl could provide an option auto_add = true or something to make Pkg.add automatic.


One difficult situation (for any strategies including Strategy 1) is when you have something like LOAD_PATH == [SOME_ABSPATH, "@", "@v#.#", "@stdlib"]. Should Preferences.jl add the preference to "@"? Or SOME_ABSPATH? I think both options are reasonable. But, if we use the activated project (as currently Preferences.jl mentions it in README), I think we should warn the user when the activated project is not LOAD_PATH[1].

Personally, I think it will be a bit of a "paradigm" shift if packages themselves start putting a bunch of options into the project file for packages that are not even in the Project file. Is that how the feature is indended to be used. Personally, I thought that options would always come directly from a user (same as how package entries in [deps] come directly from a user doing a Pkg.add).

Personally, I thought that options would always come directly from a user (same as how package entries in [deps] come directly from a user doing a Pkg.add).

Even if we do our best to adhere to this, (which I'm not sure all packages will; for instance, I would not be surprised if packages such as FFTW decide to store their build configuration here, auto-choosing a good default for the user) there will always be the case where configuring one project will end up configuring others, because choosing "GPU enabled" for Flux should automatically turn on "GPU enabled" for NNlib, for instance.

In Strategy 3, I think we can just emit a helpful error message that the user has to run Pkg.add("Bar") on the activated project if Bar is not in its deps. Or Preferences.jl could provide an option auto_add = true or something to make Pkg.add automatic.

I'm kind of split; on the one hand, yes, this would work. But on the other, it's kind of just papering over the fact that code loading works based off of Manifests, not Projects. In other words, when I type using Foo, that is governed by which manifest I'm using, not which Project. I am still concerned that with strategy 3 someone will start julia with a project that has a newer version of some transitive project installed, and that newer version will break because when it looks for its preferences, it finds something in the global environment for much older version of itself.

Stefan and I had a mega-discussion about this, and I think we've come up with a somewhat revamped architecture. The pain points addressed are:

  • The need for a worry about compile-time vs. non-compile time preferences (e.g. "what if I change things from being a compile-time to a non-compile-time preference?")
  • The unwieldy linking of preferences with Project.toml (e.g. "what if I don't want to share these preferences with the world?")
  • The transitive issue (e.g. "What if Flux needs to set something in NNlib and NNlib isn't in the main Project?")

Compile-Time Detection

Instead of having to explicitly declare preferences as compile-time sensitive, we can instead flex our compiler muscles do this automatically. We will change the load_preferences() API from passing back a full dict to instead taking in a key and returning that value (with an optional default):

module Frobulator

function get_backend()
    # Use macro to automatically get calling Module
    # second argument is default value
    return @load_preference("backend", "CPU")
end

backend = get_backend()
if backend == "GPU"
    # do GPU stuff
else
    if backend != "CPU"
        @error("no such backend '$(backend)', defaulting to CPU!")
    end
    # do CPU stuff
end
end # module

The benefit of this setup is that we can now define load_preference() as follows:

currently_compiling() = ccall(:jl_generating_output, Cint, ()) != 0

function load_preferences(uuid::UUID, key::String; default = nothing, toml_cache::TOMLCache = TOMLCache())
    # Re-use definition in `base/loading.jl` so as to not repeat code.
    prefs_dict = Base.get_preferences(uuid, toml_cache)

    # If we're currently compiling, 
    if currently_compiling()
        push!(Base.COMPILETIME_PREFERENCES[uuid], key)
    end
    return get(prefs_dict, key, default)
end

The COMPILETIME_PREFERENCES is a Dict{UUID,Set{String}} that tracks accesses of all top-level preference keys during .ji generation, and at the end of the compilation run, it encodes not only the hash of all preferences, but the list of keys to include in the preferences. This allows us to easily invalidate while only paying attention to the things that we, by definition, need to.

Preferences.toml and LocalPreferences.toml

Since TOML parsing is fast and cheap these days, let's not store things in Preferences.toml at all, and instead just create a separate file. This separate file has a few advantages:

  • We don't have to muck up Project.toml in a way that we would possibly regret later
  • We can split things such that the decision to "export" your preferences to someone else can be solved at the .gitignore level.

Similar to Manifest.toml and (Julia)Artifacts.toml, we'll add a pair of twins to the TOML-gang, named (Julia)Preferences.toml and Local(Julia)Preferences.toml. These will be loaded (and merged) when you load your preferences, and can be targeted in save_preference(). Because we're merging, we'll need a "white-out" sentinel value, and to support future private dependencies, rather than storing things by UUID we'll store them by name (this also increases readability).

Concretely, you might have the following TOML files:

# package_root/LocalPreferences.toml

# Top-level Flux dependency
[Flux]
backend = "GPU"

# TBD syntax for setting a preference for a private dependency (e.g. Flux's private copy of NNlib)
[Flux-NNlib]
backend = "GPU"

# Setting another top-level dependency
[LoopVectorization]
LLVM_passes = ["pass3"]
__clear_keys__ = ["force_march"]

    [LoopVectorization.nested.baz]
    qux = "spoon"
# package_root/Preferences.toml
[Flux]
enable_experimental = false
# global_environment/LocalPreferences.toml
[LoopVectorization]
force_march = "avx2"
LLVM_passes = ["pass1", "pass2"]

    [LoopVectorization.nested]
    foo = "bar"

When loading preferences, we walk from most distant load_path() element to the closest, and within each location, we load (Julia)Preferences.toml first, then Local(Julia)Preferences.toml second, recursively merging as necessary. This results in the following resultant preferences:

# Flux gets merged together
load_preference(Flux, "backend") == "GPU"
load_preference(Flux, "enable_experimental") == false

# Pending future 'private dependencies' work, this will allow those private
# dependencies to have their own set of preferences.
load_preference(Flux.NNlib, "backend") == "GPU"

# Show that nothing is loaded for an actual top-level `NNlib` package.
load_preference(NNlib, "backend") == nothing

# LoopVectorization gets nicely merged, with the internal `__clear_keys__` sentinel having effect
load_preference(LoopVectorization, "force_march") == nothing
load_preference(LoopVectorization, "LLVM_passes") == ["pass3"]
load_preference(LoopVectorization, "nested") == Dict("foo" => "bar", "baz => Dict("qux" => "spoon"))

That's a lot to take in, so let's stop for a quick breather and absorb it. The reasoning behind having two TOML files is that one may provide shared settings for a group project, while the LocalPreferences.toml is intended to be .gitignore'd and used only on the local machine, etc...

Preferences are Explicit User Choices, Not Communication Channels

To circle back to the question that this issue was originally opened to address; we just won't allow setting preferences for things that don't exist in Project.toml. If something doesn't exist, we'll just auto-add it to the extras target and get the mapping into the Project.toml for you. This both allows us to resolve names to UUIDs in the project, as well as have unambiguous stopping behavior for the loading search. As private dependencies are still but a twinkle in Stefan's eye, we're punting on a few of the details surrounding them, but the basic idea would be that if Flux attempts to set a preference in NNlib, today this would cause NNlib to get listed as an extra in the top-level Project.toml, whereas if it were a private dependency of Flux it would instead not get added and the mapping would go under the key Flux-NNlib rather than just NNlib.

The reason for the title of this section is that I have come around more to the idea that Preferences should, as much as possible, be something that the user could write out by hand with no API help. Flicking a switch in Flux should not require Flux to set a bunch of weird options inside NNlib. We don't want to encourage this channel of communication because it has the potential to turn into a mess where we're esssentially subverting the resolver and building incompatible flavors of packages

If we view preferences as altering the package itself, we are essentially designing a method of building multiple different "flavors" of the same package. Arpack[BLAS=MKL] is a slightly different package than Arpack[BLAS=OpenBLAS], and two different packages that depend on Arpack may themselves require different options. This leads us to our own "diamond of death" scenario:

    Top-level project
         / \
        /   \
       /     \
      /       \
  |-----|  |-----|
  | Foo |  | Bar |
  |-----|  |-----|
      \      /
(MKL)  \    / (OpenBLAS)
        \  /
         \/
     |--------|
     | Arpack |
     |--------|

Nobody wants this, and a true solution requires a global mediator, such as the Pkg resolver, to be able to determine a valid, cohesive structure. So will avoid this by strongly encouraging package authors to not break this compartmentalization guideline, but still allowing it if need be. An example is Arpack and ArpackMKL; it's a mess if we have to keep them separate, but there are real reasons they aren't 100% compatible with upstream packages. One solution is to not let packages set transitive dependencies at all, and simply tell the user to do it, but that's kind of annoying and will cause copy-paste fatigue (e.g. "now paste this into your REPL: Preferences.set_preference(Arpack, "BLAS", "MKL")). On the other hand, we definitely don't want to be in a situation where we set Foo to be on "fast" mode (Which causes it to tell Arpack to use MKL, then we tell Bar to use "Opensource-only", which causes it to tell Arpack to use OpenBLAS. Race conditions for the fail.

Instead, we propose a compromise; we suggest to package authors to not use this as a channel of communication at all, but if you do, use set_preference(Arpack, "BLAS", blaslib; force=false). This will throw an error if a non-matching key is already set for Arpack, and it will force the user to get involved.

Coda

There's a lot here, so feel free to start discussing bits and pieces of it, perhaps even opening issues for specific thoughts/questions on the repo proper. In terms of code loading changes, I'll need to coerce code loading to save/load the list of compile time preferences, but that's the only change that needs to go into Base julia.

Well laid out, @staticfloat! I apologize for not having gotten into this with you earlier and given the feedback before you'd already merged one version of it. The major advantage of this iteration of the design is that neither the consumer of preferences, nor the dictator of preferences needs to be worried about whether they are compile-time or run-time preferences. If the consumer of a preference accesses it during precompilation, then it will automatically invalidate .ji files. Elegant!

Another benefit is readability/writability of the preferences. Even if people don't usually produce these files by hand, they will read them and having headers with names like [Flux] and [LoopVectorization] makes it a lot easier to understand what's going on than if headers look like [46d6ce02-1188-4fba-af91-f93edbf1bcf5].

Preferences are Explicit User Choices, Not Communication Channels

Yes!

To circle back to the question that this issue was originally opened to address; we just won't allow setting preferences for things that don't exist in Project.toml.

Yes!

The reason for the title of this section is that I have come around more to the idea that Preferences should, as much as possible, be something that the user could write out by hand with no API help.

Yes!

All of these things read very nicely to me and are very much in line of the high level view I had of how preferences would work.

@staticfloat Thanks for the detailed analysis and superb suggestion! I agree with all the aspects and I'm glad that such a drastic change is within the scope.

Preferences.toml and LocalPreferences.toml

In principle, we _could_ merge Preferences.toml to Project.toml, right? While I don't have a particularly strong opinion on this, some people don't like cluttering top-level directory with various config/dotfiles. Including the shareable preferences in Project.toml let us keep Julia project repositories "cleaner" (in terms of the number of top-level files).

recursively merging as necessary

I wonder recursively merging everything could be too aggressive. Why not shallow merge (i.e., load_preference(LoopVectorization, "nested") == Dict("baz" => Dict("qux" => "spoon")))? It's easy to emulate recursive merge using long key with prefixes like nested_baz_qux. Or maybe recurse only when, for example, the first letter of the key is uppercased (A nice property with this is that it's natural to map nested dictionaries to submodules).

I totally agree recursive merge would be useful for many cases but it could be harmful if there is no way to suppress it. If we use recursive merge always unconditionally, we can't have a dictionary with some mutually exclusive patterns. For example, suppose we have

[Foo.parallel]
backend = "sequential"
[Foo.parallel]
backend = "threads"
ncpu = 4

Merging them yields

[Foo.parallel]
backend = "sequential"
ncpu = 4

which may be invalid. "Ignore invalid options" is somewhat a valid strategy but I think it's reasonable to support validation of the preferences (for a better experience of end-users).

Another example is

[WebFramework.logging]
target = "all"
# level = "INFO"  # default
[WebFramework.logging]
target = ["MyPkg"]
level = "DEBUG"

Merging them yields

[WebFramework.logging]
target = "all"
level = "DEBUG"

Let's say this means to enable debug-level logging for everything. It may not be what the user wants.

Notice the __clear_keys__ example above; that clears any inherited settings for the listed keys.

I mean that the package author cannot declare that some sub-dictionaries have mutually exclusive patterns.

Edit: I guess it can be done at save-time.

Yeah, I think any kind of verification should be done within the package.

I'm not married to the idea of recursive merging though. Shallow merging would be much, much simpler to implement. It's also much more complicated for the programmer to do things like load their preferences then save them back; how do you know what pieces were inherited from a higher load_path() location and what was explicitly stated in the current Preferences.toml?

Well, I'm pro-always-recursive-merge now :)

how do you know what pieces were inherited from a higher load_path() location and what was explicitly stated in the current Preferences.toml

I think the behavior of packages should be, in principle, agnostic to the source of preference information. But I think it's reasonable to later add some API for querying the source of information so that you can create a better error/warning message for invalid input (e.g., pointing out file(s) where the invalid parameter is set).

So imagine the following:

prefs = @load_preferences()
prefs["foo"] = prefs["foo"] + 1
@save_preferences(prefs)

If @load_preferences() does recursive merging, then when we @save_preferences(), we will encode all the inherited preferences as well. If we do shallow merging, we can limit the API to:

pref = @load_preferences("foo")
@save_preference("foo", pref + 1)

By limiting all accesses to be scoped to a top-level key, we ensure that we only ever get a consistent view.

In principle, we _could_ merge Preferences.toml to Project.toml, right? While I don't have a particularly strong opinion on this, some people don't like cluttering top-level directory with various config/dotfiles. Including the shareable preferences in Project.toml let us keep Julia project repositories "cleaner" (in terms of the number of top-level files).

This is a good point. We could use [preferences] section in Project.toml and Preferences.toml.

Notice the __clear_keys__ example above; that clears any inherited settings for the listed keys.

I would call this __clear__ and all the value to either be true in which case it clears all inherited keys or an array of strings which are key names to clear (i.e. reset to default).

Or maybe recurse only when, for example, the first letter of the key is uppercased (A nice property with this is that it's natural to map nested dictionaries to submodules).

Clever idea. Maybe too clever, but very clever...

Shallow merging would be much, much simpler to implement.

This would be really hard in a language without recursion, but fortunately we have recursion.

@staticfloat Why not reflect the load API with key to the save API and add (say) save_preferences(module, key1 => value1, key2 => value2, ...)? Or maybe set_preferences is a better name for this. I'm imagining something like

save_preferences(  # or set_preferences
    LoopVectorization,
    "LLVM_passes" => ["pass1", "pass2"],
    "nested.baz.qux" => "spoon",
)

would change the preference from

# @/Preferences.toml (State 1)
[LoopVectorization]
force_march = "avx2"
LLVM_passes = ["pass3"]

    [LoopVectorization.nested]
    foo = "bar"

to

# @/Preferences.toml (State 2)
[LoopVectorization]
force_march = "avx2"                     # untouched
LLVM_passes = ["pass1", "pass2"]

    [LoopVectorization.nested]
    foo = "bar"                          # untouched

    [LoopVectorization.nested.baz]
    qux = "spoon"

while

save_preferences(
    LoopVectorization,
    Dict(
        "LLVM_passes" => ["pass1", "pass2"],
        "nested" =>  Dict("baz" => Dict("qux" => "spoon")),
    ),
)

would change @/Preferences.toml at "State 1" to

# @/Preferences.toml (State 3)
[LoopVectorization]
LLVM_passes = ["pass1", "pass2"]

    [LoopVectorization.nested.baz]
    qux = "spoon"

Hmmm, interesting. So we'd basically avoid a save() API that takes in a Dict entirely, and instead provide a more flexible setting API.

I like it! I'm going to completely drop the modify_preferences!() API as well. I'm going to implement this, and we can kick its tires a bit.

We could use [preferences] section in Project.toml and Preferences.toml.

Thinking a bit more about this, my only concern would be if people decide to save preferences in Preferences.toml instead of a [preferences] section in Project.toml and someone wants to add some local preferences. In that case, there's nowhere to put them except further up the load path. With LocalPreferences.toml it would be perverse to commit this file so that's not an issue. We could tell people that if they want to set shared preferences they should always just use Project.toml. Does this local/shared preferences business need some user-level API? Maybe this is all a bit too complex.

Just for clarity, when I suggested merging Preferences.toml to Project.toml (i.e., [preferences] section in Project.toml), I still meant to keep LocalPreferences.toml. Though I guess the asymmetry of the file name and schema (i.e., preferences.LoopVectorization.LLVM_passes in Project.toml becomes LoopVectorization.LLVM_passes in LocalPreferences.toml) is a bit ugly.

Does this local/shared preferences business need some user-level API? Maybe this is all a bit too complex.

So if we are to use LocalPreferences.toml, I guess we need a way for the end-users to specify the destination (Preferences.toml/Project.toml vs LocalPreferences.toml). A "low-tech" but reliable approach could be to ask the package author to pass around keyword arguments:

# in LoopVectorization
set_passes(passes; kwargs...) =
    save_preferences!(
        @__MODULE__,
        "LLVM_passes" => validate_passes(passes);
        kwargs...
    )

so that the end-user can do LoopVectorization.set_passes(["pass1"]; islocal=true).

Another approach could be to use context variables (i.e., "dynamically scoped" variables) #35833 to control the destination. The end-users can then call

julia> Preferences.with_local() do
           LoopVectorization.set_passes(["pass1"])
       end

(maybe with a better syntax sugar). I guess there are gazillion of other ways.

Okay so I just needed to get something that worked so that we could play around with it so I made some unilateral design decisions. The Julia branch is here and the package is here.

I decided to go with:

  • Storage in Project.toml and LocalPreferences.toml; it seemed too confusing to me to have _three_ locations for preference lookup within a single load path node, so I decided to restrict it to just two, while keeping the clearly-named LocalPreferences.toml around.

  • Deep merging. Even though it was more work to implement and will doubtless cause much confusion amongst package developers (especially with __clear__ entries) it's still pretty neat, so I just rolled with it.

  • No . indexing syntax. There's no real reason here, except that it's not an absolutely-required feature, so if one of you wants to implement it, that's fine by me.

  • The decision of where to store the preferences can be completely manual, (e.g. providing a direct path to a .toml file), or automatic (e.g. if the kwarg export_prefs is set to true, find the first Project.toml that contains this dependency. If none exists, use the active project and add it to extras. If export_prefs is false, do the same, but store it into LocalPreferences.toml next do that Project.toml).

The best documentation right now is the docstring for set_preferences!(). It's got a decent number of tests, but not quite everything is tested, so there still might be some bugs.

I'm on vacation now, so I won't be pushing this forward for the next 10 days or so. Feel free to mess around with it and come up with better ergonomics than I have. :)

Is this addressed by #38044?

Yes.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

wilburtownsend picture wilburtownsend  ·  3Comments

manor picture manor  ·  3Comments

m-j-w picture m-j-w  ·  3Comments

i-apellaniz picture i-apellaniz  ·  3Comments

StefanKarpinski picture StefanKarpinski  ·  3Comments