Currently when we move a pin we then rebuild packages on the graph which have direct dependencies on the newly pinned package. However this could produce breakage. For example if we were to pin a new version of ABI incompatible boost in our current workflow we'd move the boost pin and then run the rebuild the packages in the graph, but any new package (or package with got a rerender for some other reason) would get an incompatibility as they would get some packages using the new version of boost and some using the old since the migration is only partially done.
A potential new approach would be to rebuild the chunk of the graph first, producing a bunch of new builds on the new pin, then move the pin in the official pinning feedstock so new packages (and packages which depend on rebuilt packges) could take advantage of the new pin.
Is conda smart enough to know not to use the new pin since some of the other dependencies haven't been built with it yet?
could we have two pin files?
@CJ-Wright I don't think conda will generically know which pinned version was used and produce consistent environments.
One case I can think of that will work is if bumping a pin produces a bumped runtime dependency via a run_export. With the new runtime dependency at a higher version, conda will produce consistent environments.
As you stated above, the only way this will work is if we bump the global pinning after the rebuild is done, or at last close enough to done.
I bet there is an edge case I am missing here. :(
@mariusvniekerk We would need at least two pin files. Each time we bump a pin, we generate a migration from the current global pinning to the next one. If we have more than one such migration happening (because two different pins are bumped), then we would have to maintain multiple, possibly overlapping, migrations simultaneously.
I think a local solution might be best here.
If we have more than one such migration happening (because two different pins are bumped), then we would have to maintain multiple, possibly overlapping, migrations simultaneously.
Yes, this is most likely something we'll need to manage.
I like having conda-smithy handle the temp pin, since the correct thing happens without any additional work for the maintainer or CI. It could be possible to tell smithy to inspect the file, compare it to the global pinning file and then remove the pins which are in the global pinning file, leaving the rest.
We could even include in initial PR information for the bot (eg the version of the migration, which top level packages there are, etc.)
This won't work as well when we need to go play with the internal bits (eg change the platform information) but that's ok.
So conda smithy can only remove the pin once the migration is done and the global pinning is adjusted.
Managing this bit of global state seems annoying.
I wonder if we can simply leave the local pinnings file instead of cleaning it up. The next time we need to adjust them, we will come through with the bot anyways. In the mean time they can sit there and not mess with anything.
To some extent we already handle that global state, since we have the pinning feedstock and it is a dep of conda-smithy. I don't know if there is a way to compose pinning files. I wonder if there are things in the pinning file which a feedstock would care about but would not fall under a migration's purview.
I did some coding against this issue last night for my own edification.
A few things:
from conda_build.config import Config
from conda_build.variants import (
parse_config_file, OrderedDict, combine_specs)
config = Config()
specs = OrderedDict()
specs['global_pins'] = parse_config_file('global_pins.yaml', config)
files = glob.glob('migrations/*.yaml')
for f in files:
specs[f] = parse_config_file(f, config)
combined_spec = combine_specs(specs, log_output=config.verbose)
As stated in the docs, the ordering of the files matters because files later in the process will clobber the earlier ones. Files passed at the command line via conda build -m pinning.yaml have the highest priority. Right now conda-smithy takes the global pinning and renders it to platform and/or python specific pinning files. So if we want to override that, we will have to put the local pinnings last either at the command line or by combining them with the global ones during in code like above.
The conda_build_config.yaml file in the recipe dir has lower priority than any file passed at the command line.
Given the above, I have a revised suggestion for how to manage the migrations.
Changes to existing tools and/or new repos:
.local_pins.yaml) for extra pins. If such a file is found, then it gets added last when the feed stock is rerendered, overwriting any global pins. These updated pins will then appear in the normal CI configuration files generated by conda-smithy.conda-forge-pinning-migrations or somethingMigration procedure:
.local_pins.yaml file. (Any old migrations no longer active are removed at this time as well.) Then the feedstock is rerendered which adds the new local pins to the CI scripts. Finally, the bot makes a PR etc.I don't know how to accomplish 4 yet. The combination of selectors and yaml makes coding an update complicated. We could store the pinning migrations as pure diffs, but this gets fragile when there is more than one.
For step 2 above, we could simply have the existing bot pull the migrations repo each time it runs. No need to trigger it or make a new bot.
One option for 4 would be to refactor the global pinnings file into a set of files where each file has a name corresponding to the top level key and the file contains the top level key and the contents.
For example, this file
foo: 2
bar:
- 3 # [osx]
- 4 # [linux]
would become two files
foo.yaml:foo:2
bar.yaml:bar:
- 3 # [osx]
- 4 # [linux]
Then the bot could simply do a wholesale replacement of the file.
~This works for simple pinnings, but won't work for more complicated things.~ Edit: Conda build appears to handle pin_run_as_build sections correctly. IDK what it does with zip_keys.
Ack. We can use git to do step 4. We accept migrations as PRs onto the global pinnings file. This updated file is the one that is used by the bot. Then we simply reapply the diffs from the merged PR onto the feedstock pinning repo.
Well reordering commits might cause issues. The separate files might be more robust. Hrmmmmmm.
@beckermr :heart: this is awesome!
So my general view is that we have the global pinnings file and then smaller local ones and we compare the migrator local variant with global to see if things are greater/already compatible on a per value basis. If not we delete that pinning migrator file as it is now no longer relevant. This can be done as a variant preprocessing phase in conda-build-config.
The other part that is of importance is additive (non replacing migrations). For openssl we may want to continue building against 1.0 for a while. Thus we need a way to combine pins additively instead of just replacing them.
# global.yml
openssl:
- 1.0
# migrator
# combine_mode: add
openssl:
- 1.1
# result
openssl:
- 1.0
- 1.1
Given that this additive thing doesn't exist presently, this might be a nice library to use to merge variant files, (which would also be needed for finalizing migrations as @beckermr pointed out)
@mariusvniekerk Writing code by hand to support the comparisons above scared me off because of the selectors that conda-build allows and how they can depend on the local environment. For straight dictionaries, I don't see a problem with your proposal. I guess there is code to do this already?
FWIW, we don't actually need to delete old migrations ever. It is nice, but not required since conda-build will just replace with the same value. Further, if we are deleting old migrations under the assumption that higher package versions are more recent, this removes the ability to downgrade pins if needed.
If we replace whole sections of the files by splitting the global pinnings into a bunch of small files, then we can add pins easily like this
old:
openssl:
- 1.0
proposed:
openssl:
- 1.0
- 1.1
Then the final merge into the global pinnings just copies the contents of the proposed openssl pinnings to the openssl pinnings file in the final repo.
I am not trying to argue too strongly for this since being able to combine variant files with selectors is a nice thing to have, but replacing whole files is a simpler solution to maintain.
Does that mean we're going to have an explosion in the build matrix?
Not following @CJ-Wright. If someone adds more than one version, then yes, we would double the number of packages in the subgraph that depends on the extra version. Also, conda-smithy removes pins that don't apply to a given recipe so they don't propagate everywhere.
Thinking over this now, I think we use git to manage this. Here is how it would work. Instead of accepting a snippet of yaml to be used as the migration, we actually force people to make a change to the global pinnings file. This new global pinnings file then is then what is located in the local_pins.yaml file.
The proposed migrations repo would have a few rules associated with it to make our lives better as well.
These rules would keep things orderly and hopefully functional. It also has the advantage that we could in principle bump more than one pin or other big change all at once, potentially saving CI time.
This scheme also allows us to figure out if a node needs to be migrated in a robust way. Instead of having to code conditions by hand, we can ask conda-smity to rerender the feedstock with the new local pinnings. If the CI pinning files are different, we make a PR, otherwise we move on. We would want to cache information about which nodes need migration to avoid having to do this more than once.
This scheme also avoids us having to diff or combine pinning files by hand. We simply apply the local file after the global one and let conda-build do all of the work as it already does.
We may also want to block pinning migrations on each other. If a node needs more than one pinning migration, the earlier commit has to be finished first.
I'm not certain how this blocking would work.
I think the bot looks for a previous migration PR it knows it made on a node. If it finds one, then the two migrations are blocked and the other one has to finish first.
One way to do this would be to store the commit hash or migration version in a comment in the local pinnings file. If the bot finds the local pinnings version in the list of current migrations, it would know.
The discussion of this has moved to a formal CFEP, https://github.com/conda-forge/conda-forge-enhancement-proposals/pull/13, so I'm going to close this, please continue discussion there.
Most helpful comment
@CJ-Wright I don't think conda will generically know which pinned version was used and produce consistent environments.
One case I can think of that will work is if bumping a pin produces a bumped runtime dependency via a
run_export. With the new runtime dependency at a higher version, conda will produce consistent environments.As you stated above, the only way this will work is if we bump the global pinning after the rebuild is done, or at last close enough to done.
I bet there is an edge case I am missing here. :(
@mariusvniekerk We would need at least two pin files. Each time we bump a pin, we generate a migration from the current global pinning to the next one. If we have more than one such migration happening (because two different pins are bumped), then we would have to maintain multiple, possibly overlapping, migrations simultaneously.
I think a local solution might be best here.