This is a proposal for adding first-class support in Pkg and the code loading system for conditional dependencies.
Describing a conditional dependency is easiest with an example. A typical case is a plotting package that wants to support plotting DataFrames (by adding a method plot(::DataFrame)) without requiring users to install DataFrames in order to use the plotting package. The plotting package wants to run a bit of extra code (the part that defines the method) when the conditional dependency DataFrames is somehow "available" to the user. The extra code that the package executes when the conditional dependency is active is called "glue code".
The way people implement conditional dependencies right now is by using Requires.jl. It works by registering a callback with the package loading code in Base. When the conditional dependency is loaded (detected by e.g. comparing UUIDs), the callback is executed: its code is evaluated into the module, which provides the functionality for the conditional dependency.
As an example usage:
using Requires
function __init__()
    @require DataFrames="a93c6f00-e57d-5684-b7b6-d8193f3e46c0" plot(df::DataFrames.DataFrame) = ---
end
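The callback registration described above can be modeled with a few lines of plain Julia. This is only a toy sketch of the idea, not how Requires.jl or Base are actually implemented: `CALLBACKS`, `register!`, and `on_load` are hypothetical names, and the "loader" is simulated by calling `on_load` by hand.

```julia
using UUIDs  # standard library; provides the UUID type

# Toy stand-in for the loader's callback table: package UUID => callbacks.
const CALLBACKS = Dict{UUID,Vector{Function}}()

# Register `f` to run when the package identified by `id` is loaded.
function register!(f::Function, id::UUID)
    push!(get!(() -> Function[], CALLBACKS, id), f)
    return nothing
end

# Called by the (simulated) loader after a package finishes loading.
function on_load(id::UUID)
    for callback in get(CALLBACKS, id, Function[])
        callback()
    end
    return nothing
end

# Usage: the "glue" here just records that it ran.
loaded_glue = String[]
dataframes = UUID("a93c6f00-e57d-5684-b7b6-d8193f3e46c0")  # DataFrames in the General registry
register!(dataframes) do
    push!(loaded_glue, "plot(::DataFrame) defined")
end
on_load(dataframes)
```

Because the callback body only runs when `on_load` fires at runtime, nothing it defines can be captured ahead of time, which is the precompilation problem discussed next.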
There are a few reasons why the current strategy using Requires.jl to deal with this is unsatisfactory.
A common pattern is include-ing some file when the conditional dependency is available. Requires.jl runs inside __init__, which means the code evaluated by the include call does not end up in the precompile file.
One declares a conditional dependency by adding an entry to the Project.toml file as:
[conditional-deps]
DataFrames = "$UUID_DATAFRAMES"
[compat]
DataFrames
An alternative possibility is to just put DataFrames inside [deps] and then have a list of names that are conditional.
[deps]
SomeOtherDep = "..."
DataFrames = "$UUID_DATAFRAMES"
[conditional-deps] = ["DataFrames", ]
Precompilation works at module granularity, so we want a module containing the glue code for each conditional dependency. The glue code would be stored (based on a documented convention) in a file inside the package, e.g. src/DataFramesGlue.jl inside Plots, where the exact name of the file is yet to be decided.
An example of a glue file for Plots conditionally depending on DataFrames is:
module DataFramesGlue
using Plots, DataFrames
Plots.plot(df::DataFrame) = ...
end
When DataFrames gets loaded, we check all packages that declare a conditional dependency on it. If the loaded version of DataFrames is compatible with the compat entry of a package that has DataFrames as a conditional dependency, we load the glue code, which will act like a normal package and precompile. We need to teach code loading about glue packages so it knows how to map the names inside the glue module to the UUIDs in the "main package".
Because we are not trying to resolve a set of versions compatible with the conditional dependency, we avoid cases where we would in general need to re-resolve arbitrarily many times, with the potential for cycles.
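The load-time check described above (load glue only if the already-loaded version satisfies the compat entry, with no re-resolution) could be sketched roughly as follows. This is a toy model under stated assumptions: `CompatRange`, `CONDITIONAL_DEPS`, and `glue_to_load` are hypothetical names, the version bounds are made up, and a compat entry is reduced to a single half-open interval rather than Pkg's real compat syntax.

```julia
# Toy model, not Pkg's resolver: a compat entry is reduced to a half-open
# version interval [lo, hi).
struct CompatRange
    lo::VersionNumber
    hi::VersionNumber
end

is_compatible(r::CompatRange, v::VersionNumber) = r.lo <= v < r.hi

# Hypothetical table: parent package => (conditional dep name => compat range).
# The bounds below are invented for illustration.
const CONDITIONAL_DEPS = Dict(
    "Plots" => Dict("DataFrames" => CompatRange(v"0.19.0", v"0.21.0")),
)

# Names of parent packages whose glue should be loaded after `dep` has been
# loaded at version `v`. Incompatible versions simply skip the glue.
function glue_to_load(dep::AbstractString, v::VersionNumber)
    parents = String[]
    for (parent, conds) in CONDITIONAL_DEPS
        r = get(conds, dep, nothing)
        if r !== nothing && is_compatible(r, v)
            push!(parents, parent)
        end
    end
    return sort!(parents)
end
```

Note that the function only inspects the version that is already loaded; it never asks for a different one, which is what keeps the scheme free of repeated resolution.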
Thank you very much for writing this up.
I have a use case in LogDensityProblems.jl which I am wondering about. Specifically, the glue code for both ForwardDiff and ReverseDiff relies on DiffResults to extract gradients.
Currently this is handled by code that looks like
function __init__()
@require DiffResults="163ba53b-c6d8-5494-b064-1a9d43ac40c5" include("DiffResults_helpers.jl")
@require ForwardDiff="f6369f11-7733-5829-9624-2563aa707210" include("AD_ForwardDiff.jl")
@require ReverseDiff="37e2e3b7-166d-5795-8a7a-e32c996b4267" include("AD_ReverseDiff.jl")
end
So, for the purposes of Requires.@require, DiffResults is considered available: if the user is using ForwardDiff, then ForwardDiff has loaded DiffResults, which triggers the shared glue code.
Would this continue to work? For the mechanism you propose, I imagine I could just provide deps information for DiffResults.
Generally, how is it handled when glue code needs other modules which themselves would not trigger glue code of their own? Can we still specify eg compat bounds for them?
If I understand your example correctly, you would just have to declare a conditional dep on DiffResults (and ForwardDiff + ReverseDiff).
Thanks. So, if I do that, then eg it would be triggered by ForwardDiff loading DiffResults, and the latter would not have to be explicitly loaded by the user? That's the way it works now with Requires.
Yes.
What if I want to define a glue module to be loaded when _both_ CuArrays and OrdinaryDiffEq are imported? That is to say, can there be something equivalent to the following?
function __init__()
@require CuArrays="..." begin
@require OrdinaryDiffEq="..." include("glue.jl")
end
end
I guess a possible API would be to add (say) an [on-import] section to Project.toml that bundles conditional-deps explicitly:
[conditional-deps]
CuArrays = "..."
OrdinaryDiffEq = "..."
[compat]
CuArrays = "..."
OrdinaryDiffEq = "..."
[on-import]
foo = ["CuArrays", "OrdinaryDiffEq"]
which tells the loader to include src/on-import/foo.jl when CuArrays and OrdinaryDiffEq are loaded.
This sounds fantastic. Is this a feature that would be available in a Julia 1.x release, e.g. Julia 1.4 or Julia 1.5? Or would it have to wait until Julia 2.0?
One declares a conditional dependency by adding an entry to the Project.toml file as:
[conditional-deps]
DataFrames = "$UUID_DATAFRAMES"
[compat]
DataFrames
An alternative possibility is to just put DataFrames inside [deps] and then have a list of names that are conditional.
[deps]
SomeOtherDep = "..."
DataFrames = "$UUID_DATAFRAMES"
[conditional-deps] = ["DataFrames", ]
I like the first one more. Listing the conditional dependencies under [deps] might get a little confusing.
This sounds fantastic. Is this a feature that would be available in a Julia 1.x release, e.g. Julia 1.4 or Julia 1.5? Or would it have to wait until Julia 2.0?
Some Julia 1.x.
What if I want to define a glue module to be loaded when both CuArrays and OrdinaryDiffEq are imported?
Yeah, I thought about this a little bit too. A first implementation of this might not support this but we should probably make sure that adding it will not be awkward.
Triaging to discuss what to do about multiple conditional dependencies (which kind of starts to sound like "features").
Multiple conditional dependencies (2, 3, or even more than 3 packages required for the glue code to be loaded) are definitely a use case for me!
Looking forward to seeing what triage thinks!
Regarding multiple deps, maybe we could borrow something similar from Rust's Cargo (features); it was included in this proposal: #977
AFAIU, the difference between features and conditional dependencies is that a feature is something that someone opts into from the current active Project, while a conditional dependency is automatically "activated" whenever its requirements are satisfied.
I'm going to make an alternate proposal here. First: I think we should not call these conditional dependencies. They're _NOT_ dependencies: they are packages that glue other packages together and are loaded automatically when the set of packages that they glue together is loaded. They depend on the packages that they glue together, not the other way around! This is crucial. So instead, I propose that we call them "glue packages". Here's how we could specify them in a package P's Project.toml file:
name = "P"
uuid = "<uuid>"
[deps]
# P's dependencies here
# glue with a single dependency
[glue]
A = "<uuid>" # source at `glue/A.jl`
B = "<uuid>" # source at `glue/B.jl`
# glue with multiple dependencies
[glue.CD] # source at `glue/CD.jl`
C = "<uuid>"
D = "<uuid>"
There are a few ways that we can go with the glue/{A,B,CD}.jl files. One way is to treat them like normal packages that have to define a module of the appropriate name. This is a little weird, though, because the file A.jl glues together P and A, so it shouldn't define a module named A; it should define a module named A_P or something like that. Also, the module's name _doesn't matter at all_: no one ever loads it by name. The only reason it needs to exist is so that we can save it in .ji files. So maybe these should be more implicit, as if this is done for you:
module P_A
import P, A
include("glue/A.jl")
end
Then inside of the file glue/A.jl all you have to do is define all the functionality needed to glue P and A together. Similarly for a multi-dependency glue file like CD.jl it would be implicitly loaded like this:
module P_CD
import P, C, D
include("glue/CD.jl")
end
Now, the actual loading would work like this: when all of the packages P, C and D have been loaded—however that happens—then the glue package P_CD is also loaded. As I mentioned before, since it is a package that depends on P, C and D it gets its own .ji file which can be reloaded whenever another Julia process using the same versions of these three modules runs.
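The "load the glue once all of its packages are loaded, however that happens" rule can be illustrated with a small simulation. This is only a sketch under assumptions: `GLUE_TRIGGERS`, `ready_glue`, and `simulate` are hypothetical names, packages are represented by plain strings, and no real code loading takes place.

```julia
# Hypothetical table: glue module name => packages that must all be loaded.
const GLUE_TRIGGERS = Dict(
    "P_A"  => Set(["P", "A"]),
    "P_CD" => Set(["P", "C", "D"]),
)

# Glue modules whose trigger set is fully contained in `loaded`,
# excluding glue that is already active.
function ready_glue(loaded::Set{String}, active::Set{String})
    ready = String[name for (name, triggers) in GLUE_TRIGGERS
                   if issubset(triggers, loaded) && !(name in active)]
    return sort!(ready)
end

# Simulate packages being loaded one at a time, in any order, and record
# which glue modules become loadable after each step.
function simulate(load_order::Vector{String})
    loaded, active = Set{String}(), Set{String}()
    events = Pair{String,Vector{String}}[]
    for pkg in load_order
        push!(loaded, pkg)
        fresh = ready_glue(loaded, active)
        union!(active, fresh)
        push!(events, pkg => fresh)
    end
    return events
end

events = simulate(["C", "P", "D", "A"])
```

With the load order above, nothing happens after `C` and `P`; `P_CD` becomes loadable only once `D` arrives, and `P_A` once `A` arrives. The order in which the trigger packages are loaded does not matter.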
They're _NOT_ dependencies—they are packages that glue other packages together and are loaded automatically when the set of packages that they glue together are loaded.
I agree. I used the on-import section in my earlier comment to emphasize that it is more like a "hook." So "import hook" may be an alternative terminology (not that "glue package" is bad).
I think one benefit of formalizing it as hooks is that code loading can respond to other "conditions", such as the feature flags @Roger-luo is suggesting. The Project.toml file could look something like:
[extras]
CuArrays = "..."
Zygote = "..."
[hook.GPUImpl]
import = ["CuArrays", "Zygote"]
feature = ["GPU"]
which indicates that hook/GPUImpl.jl will be (precompiled and) loaded when CuArrays and Zygote are imported and feature flag GPU is specified.
I'm not suggesting adding feature flag support right now. I just thought this format is more extensible.
I very intentionally don't want this mechanism to be too flexible. I don't want anything besides loading a glue package to happen when a set of packages is loaded. Of course, that's not very restrictive since loading a package can execute arbitrary code, but it does mean that there's a resulting module which can be precompiled and saved, and that not being the case is precisely what's so problematic with the current Requires system. Arbitrary import hooks are likely to have all the problems that Requires currently has.
I also very much do not want this to be a mechanism for changing the behavior of packages. The only liberty that a glue package should take is that it can define methods (and types, I guess) that depend on the types of the packages that it glues together. So, it would be considered type piracy for a normal package A that depends on B and C to define B.f(::C.T), but if it's a glue package gluing B and C together, then it's perfectly kosher to do that.
I don't know how we should handle package features like what Roger wants, but it must not be this, or we will completely screw up the ability of this feature to fix the current precompilation issues.
It was not my intention to suggest introducing any events for the hooks more dynamic than code loading events. If you think the term "hook" suggests features more dynamic than what is already possible with what you are suggesting, it probably is not the best term to use.
But (temporal) dynamism and flexibility are different, and feature flags can be implemented in a very static manner. For example, if MyPackage needs a set of feature flags, Pkg can create (say) ~/.julia/options/$manifest_slug/MyPackage.toml for each environment, where manifest_slug is the hash of the full path of Manifest.toml. This option file can then be tracked as a dependency of the .ji files of the glue modules (using include_dependency).
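As a rough illustration of the per-environment option file idea, the path could be derived along these lines. This is a sketch under assumptions, not an existing Pkg feature: `manifest_slug` and `options_file` are hypothetical helpers, and using Base's `hash` (whose values are not guaranteed to be stable across Julia versions) is a placeholder for whatever stable hash a real implementation would pick.

```julia
# Hypothetical helper: a hexadecimal slug derived from the absolute path of
# a Manifest.toml. Base.hash is used only for illustration.
manifest_slug(manifest_path::AbstractString) =
    string(hash(abspath(manifest_path)); base = 16, pad = 16)

# Hypothetical layout: <depot>/options/<slug>/<Package>.toml
function options_file(depot::AbstractString, manifest_path::AbstractString,
                      pkg::AbstractString)
    return joinpath(depot, "options", manifest_slug(manifest_path), pkg * ".toml")
end

path = options_file(joinpath(homedir(), ".julia"), "Manifest.toml", "MyPackage")
```

A glue module could then call `include_dependency(path)` at its top level so that a change to the option file invalidates its precompile cache, as suggested above.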
Actually, let me take back my earlier comment. Feature flags can be turned into consts of MyPackage and then checked inside the glue modules. This approach would waste precompile cache files (i.e., it creates no-op .ji files for glue that is not needed because certain flags are not set), but it's probably better to keep the glue module and feature flag concepts orthogonal.
We need to be able to give compat info. How about making the glue packages look a lot like a "mini package" but each glue package is under a glue header:
[[glue]]
[glue.deps]
A = "<uuid>" # source at `glue/A.jl`
# Adding compat and file
[[glue]]
file = "glue/B_glue"
[glue.deps]
B = "<uuid>"
[glue.compat]
B = "0.4"
# Multiple
[[glue]]
file = "my_glue_C_D.jl" # source at `glue/my_glue_C_D.jl`
[glue.deps]
C = "<uuid>"
D = "<uuid>"
[glue.compat]
C = "0.2"
D = "0.1"
It's pretty ugly with all the [glue.] though.
Just allow anything that appears in a glue stanza in the normal [compat] section?
To elaborate: I think it would be confusing to use clashing names across glue packages, so it's sane to require them to match. It also doesn't make sense for compat bounds to differ across glue packages, so we can just put glue bounds in [compat] under the name used in the [glue] stanzas.
How would testing "glued packages" look? Could there be something similar for a test/Project.toml and files like test/glue/A.jl etc.?
How is this proposal different from https://github.com/JuliaLang/Pkg.jl/issues/1251? It might be more scalable to have each set of glue be its own full-fledged package? I guess this is attempting to be a lighter-weight alternative? Should we have a naming convention at the package level? The main problem I'm trying to solve is that the main package plus its glue should be in a single repository. Those doing installations may also wish for explicit control over which glue packages/modules get installed?
To summarize what I'm currently proposing:
name = "P"
uuid = "<uuid>"
[deps]
# P's dependencies here
# glue with a single dependency
[glue]
A = "<uuid>" # source at `glue/A.jl`
B = "<uuid>" # source at `glue/B.jl`
# glue with multiple dependencies
[glue.CD] # source at `glue/CD.jl`
C = "<uuid>"
D = "<uuid>"
[compat]
# deps compat here, but also glue compat:
A = "1.2"
B = "~2.3"
C = "0.5.3"
D = "2"
This means glue names are in the same project namespace as [deps] and [extras] and can therefore be referenced and constrained via [compat]. The glue code goes into files like this:
glue/A.jl
glue/B.jl
glue/CD.jl
Those files DO NOT have to declare modules or imports, they are provided automatically, so the contents of the files are just like this for glue/CD.jl:
# module P_CD
# import P, C, D
P.f(c::C.type, d::D.type) = ...
# end
You do not write the module P_CD part or the import P, C, D—those are done implicitly. The actual name of the module may not be P_CD since it should not be referred to or imported directly.
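To make the summary concrete, here is a sketch of how a loader might read the proposed [glue] table and produce the implicit wrapper for each glue file. It is only an illustration of the proposal: `glue_triggers` and `wrapper_source` are hypothetical helpers, the `P_<key>` module name is just the placeholder used in this thread, and the `TOML` standard library requires Julia 1.6 or later.

```julia
using TOML  # standard library in Julia 1.6 and later

project = TOML.parse("""
name = "P"

[glue]
A = "<uuid>"
B = "<uuid>"

[glue.CD]
C = "<uuid>"
D = "<uuid>"
""")

# Map each glue key to the sorted list of packages it glues to the parent.
function glue_triggers(project::AbstractDict)
    triggers = Dict{String,Vector{String}}()
    for (key, value) in get(project, "glue", Dict{String,Any}())
        # A string value is a single-package glue; a table lists several packages.
        triggers[key] = value isa AbstractDict ? sort!(collect(String, keys(value))) : [key]
    end
    return triggers
end

# Source text of the implicit wrapper module for one glue file.
function wrapper_source(parent::AbstractString, key::AbstractString, pkgs::Vector{String})
    lines = [
        "module $(parent)_$(key)",
        "import " * join([parent; pkgs], ", "),
        "include(\"glue/$(key).jl\")",
        "end",
    ]
    return join(lines, "\n")
end

triggers = glue_triggers(project)
print(wrapper_source(project["name"], "CD", triggers["CD"]))
```

Generating the wrapper as text keeps the sketch easy to inspect; an actual implementation would more likely build the module directly rather than go through source strings.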
@StefanKarpinski: just a question about the syntax in your proposal: for multiple dependencies, should we understand
[glue.ABC...]
A = "<uuid>"
B = "<uuid>"
C = "<uuid>"
...
literally, i.e. the 26 capital letters [A-Z] are the valid placeholders? Wouldn't one run out of combinations quite rapidly?
Perhaps
[glue.arbitrary_key] # with source at glue/arbitrary_key.jl
SomePackage = "<uuid>"
SomeOtherPackage = "<uuid>"
for each glue bit would be a bit more general.
I agree that version bounds can go in [compat] together with the non-glue packages, as allowing for glue-specific combinations of bounds should not be necessary.
The letters are stand ins for actual package names. The name CD is an arbitrary key, but would presumably often be a concatenation of the names of the packages involved, though not necessarily.
@StefanKarpinski @KristofferC Thank you so much for working on conditional dependencies. I think it's a great idea, both for usability and for not polluting the namespace.
How would implementation dependencies of glue be handled? Let's say a user would like to use DataKnots to connect to a PostgreSQL data source (our glue code is currently in its own unregistered package, DataKnots4Postgres). This glue code has an additional dependency upon the PostgresCatalog package, which depends upon LibPQ. However, it's not clear a user should need to know about PostgresCatalog, since it is an implementation detail. To map this onto the proposal, DataKnots fits P and the glue code would be the equivalent of CD, where C is LibPQ and D is PostgresCatalog. Asking a user to know about LibPQ seems reasonable, given we want them to make a PostgreSQL connection first and then use it to build a DataKnot. However, PostgresCatalog is more of an implementation detail and doesn't currently show up in the documentation. Further, if we wish to change it or add additional dependencies later, it'd be nice not to have to break the package interface. I'd prefer that glue code could be imported based on a single trigger package, such as LibPQ, but then specify additional implementation-level dependencies, such as PostgresCatalog.
One last question -- would all the glue necessarily have to be in a directory called glue? Is it a subdirectory of src or parallel to src? Can this glue code module include other files in the glue directory? Could the referenced glue, e.g. CD, be a directory rather than a module? A bit more detail about this would be helpful, as some of our glue modules are quite involved.
That scenario feels a lot heavier weight than what this is intended for, which is providing method definitions that only make sense in the presence of two or more interacting packages. As soon as you get into glue packages having their own extra dependencies, that's starting to feel a lot like a separate package entirely. In your scenario, what are the trigger packages for loading the glue? DataKnots and what else?
For this use case, besides DataKnots, the trigger package would just be LibPQ. In addition to PostgresCatalog, there is also an implementation detail dependency upon Tables; however, DataKnots also depends upon Tables. We wouldn't want DataKnots to depend directly upon PostgresCatalog since that code depends upon LibPQ. We pulled out PostgresCatalog since it is non-trivial and others may find it independently usable; otherwise it could have just been a few included files.
Regardless, from the user perspective, it fits the use case: I want to use DataKnots on a PostgreSQL data source. Similar cases exist for MySQL, and dozens of other data sources, such as XML. These could all be separate packages; if they are in a single repository to manage code synchronization, we'd have DataKnots4X where X is Postgres, XML, and so on. Currently DataKnots4Postgres is only a single file, but it will likely get quite a bit larger as we implement SQL push-down. Even so, it seems this conditional dependency mechanism was relevant: the glue enables DataKnot(PgSQL.Connection("")) (see docs).
Perhaps I'm being a bit dense today, but I'm still not getting the dependency graph.
I'm assuming user code would directly depend upon DataKnots and LibPQ causing DataKnots4Postgres.jl glue to be loaded. Yet, DataKnots4Postgres depends upon PostgresCatalog for its implementation, a dependency which DataKnots does not have, since it would pull in LibPQ. All of these modules depend directly or indirectly upon Tables and others.
User Code
  -> DataKnots (direct)
  -> LibPQ (direct)
  .. DataKnots4Postgres.jl (not imported by the user; loaded as glue)
DataKnots4Postgres.jl
  -> DataKnots
  -> PostgresCatalog
  -> LibPQ
  -> Tables, etc.
PostgresCatalog
  -> LibPQ
DataKnots
  -> Tables, etc.
LibPQ
  -> Tables, etc.
This DataKnots4Postgres glue is not really independent: the DataKnots queries should just magically work on PostgreSQL data sources once the glue is loaded. The only additional documentation might be part of a cross-reference table that shows which operations have been implemented by each backend, and this documentation would live in the core DataKnots repository. As such, PostgresCatalog is an implementation dependency; it is not related to the user experience with the system. It was pulled out only because someone could use it independently of DataKnots.
Currently, DataKnots4Postgres is an independent package (with its own repository). If we could move this code into the DataKnots repository that would be a huge win, even if this conditional logic doesn't apply. However, if this glue mechanism isn't appropriate, even with one repository we'll probably have a dozen or so DataKnots4X driver packages to include the "C" dependencies, such as libpq for PostgreSQL. As another example, DataKnots4XML would have a lower level dependency on either libxml2 or expat shared libraries. There will be shared "SQL" code generation logic, but that will certainly live in the core DataKnots package since it could be reused across various SQL data sources and really doesn't need an independent existence.
@StefanKarpinski Any thoughts on glue packages? This seems to fit our use case better than adding a set of adapter packages to the main repository. Although perhaps it need not be automagical. For example, add DataKnots.Postgres could be required, where Postgres is a sub-package of DataKnots (this would install dependencies, e.g. PgLib, etc.). Then, perhaps there could be using DataKnots.Postgres?
It looks like this will require some changes to Pkg and some changes to the code loading in Base.
@KristofferC @StefanKarpinski Could you give some details on which aspects of Kristoffer's original proposal need to be implemented in Pkg, and which aspects need to be implemented in code loading in Base?
It might help to break the work down into smaller chunks, so maybe people can start work on implementing the functionality described in Kristoffer's original post.
Also, we should talk about this whenever we have the next Pkg call.
Will we be able to "hold the train doors" for this for 1.6? It would be really good to have first-class conditional dependencies in the next LTS.
@StefanKarpinski @KristofferC
So, I think I ran into a case for this just recently. I'd like Hyperscript.jl to work with HTTP.jl "out of the box" (https://github.com/yurivish/Hyperscript.jl/issues/22). But it took some time to figure out exactly how to do it. I think without conditional dependencies, we'd need to think about having some sort of Tables.jl-like interface for HTTP.jl (not that this would otherwise be a bad idea). Anyway, ideally Hyperscript.jl would include a function like the following, with the additional code activated only if HTTP.jl is loaded...
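import HTTP, Hyperscript
import HTTP: Response
# Build an HTML response from a Hyperscript node with an explicit status code.
Response(status::Int, node::Hyperscript.Node) = Response(status,
    [SubString("Content-Type") => SubString("text/html; charset=utf-8")];
    body=HTTP.bytes(string(node)))
# Default to a 200 OK status.
Response(node::Hyperscript.Node) = Response(200, node)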
I think my other cases (articulated above) are more "heavyweight" and will be well-served with traditional packages and mono-repository design (it'd be great if that were documented).
Any updates on this? Seems there is a lot of interest from the growing list of references to this issue.
It would _really_ simplify development of low-level common deps like ConstructionBase.jl that need to add methods to objects in other packages to work more widely.
Yeah, we should probably have a Pkg call to discuss and come up with a plan and execute it.