After an automatic upgrade, if you stop the chain at height X and re-start it the following error is returned:
failed to load latest version: failed to load store: initial version set to X+1, but found earlier version 50
exit status 1
I discovered this while doing a test to make sure v0.40 does not have this bug.
To test it, I've made two branches of our project:
riccardo/no-module, which does not have one particular module (reports) installed riccardo/with-module which does have such module installedThe rest of the code is identical in the two branches, and is all based on v0.40.0-rc6.
To test the correct behavior of x/upgrade, what I did is the following:
no-module. 50. What happens is that if we stop at height X and try to restart, the following error is returned:
failed to load latest version: failed to load store: initial version set to X+1, but found earlier version 50
exit status 1
Following you can find the script that I'm using to easily setup everything properly:
#!/bin/bash
shopt -s expand_aliases
echo "===> Deleting previous data"
rm -r -f $HOME/.desmosd
echo "===> Setting up upgrade manager"
mkdir -p ~/.desmosd/cosmovisor/genesis/bin
cp ~/Desktop/Desmos/Genesis/desmosd ~/.desmosd/cosmovisor/genesis/bin
mkdir -p ~/.desmosd/cosmovisor/upgrades/test/bin
cp ~/Desktop/Desmos/Upgrade/desmosd ~/.desmosd/cosmovisor/upgrades/test/bin
echo "===> Initializing chain"
cosmovisor init testchain --chain-id testchain
echo "===> Editing genesis.json"
sed -i 's/stake/udaric/g' $HOME/.desmosd/config/genesis.json
sed -i 's/"voting_period": "172800s"/"voting_period": "120s"/g' \
~/.desmosd/config/genesis.json
echo "===> Creating genesis accounts"
cosmovisor add-genesis-account jack 100000000000000000000udaric
cosmovisor add-genesis-account desmos16f9wz7yg44pjfhxyn22kycs0qjy778ng877usl 10000000udaric
echo "===> Collecting genesis trasanctions"
cosmovisor gentx jack 1000000000udaric --amount 10000000udaric --chain-id testchain
cosmovisor collect-gentxs
echo "===> Starting chain"
cosmovisor start
To use the above script, you need to setup your ~/Desktop/Desmos folder accordingly:
mkdir -p ~/Desktop/Desmos/Genesis/
git checkout riccardo/no-module
make build
cp build/* ~/Desktop/Desmos/Genesis/
mkdir -p ~/Desktop/Desmos/Upgrade
git checkout riccardo/with-module
make build
cp build/* ~/Desktop/Desmos/Upgrade/
And this is the script I'm using to submit the upgrade proposal:
#!/bin/bash
shopt -s expand_aliases
echo "===> Submitting governance proposal"
cosmovisor tx gov submit-proposal software-upgrade "test" \
--title "test software upgrade proposal" \
--description "something about the proposal here" \
--deposit 100000000udaric \
--upgrade-height=50 \
--from jack \
--chain-id testchain --yes
echo "===> Waiting for transaction to be effective"
sleep 15
echo "===> Voting governance proposal"
cosmovisor tx gov vote 1 yes --from jack --yes --trace --chain-id testchain
These are the environmental variables I've setup for Cosmovisor:
# Setup Cosmovisor
DAEMON_NAME=desmosd
DAEMON_HOME=/home/riccardo/.desmosd
DAEMON_RESTART_AFTER_UPGRADE=true
I have not tried to test this without Cosmovisor.
Replicated from my comment in Discord:
Based on the setup Riccardo mentions, the cosmovisor processes successfully upgrades the chain. He then allows the new chain to mint a few blocks, and then he stops the chain and restarts
of course when he restarts, all the code in app initialization will also get re-executed
In his upgraded binary, he has the following code:
// Set the store loader for the upgrade to work properly
app.SetStoreLoader(func(ms sdk.CommitMultiStore) error {
return ms.LoadLatestVersionAndUpgrade(&storetypes.StoreUpgrades{
Added: []string{reportsTypes.ModuleName},
})
})
So this code gets executed everytime the app starts. So when Riccardo stops the chain and then restarts at height X, the storeloader will run and reach this line:
https://github.com/cosmos/cosmos-sdk/blob/adcecfa162d0f2ed14af17c192e718961ea67e60/store/rootmulti/store.go#L193
// If it has been added, set the initial version
if upgrades.IsAdded(key.Name()) {
storeParams.initialVersion = uint64(ver) + 1
}
So every time the app restarts with the new upgraded binary, we are resetting the initial version of the added store to the latest version+1
This makes sense why Riccardo was getting the error:
failed to load latest version: failed to load store: initial version set to X+1, but found earlier version 50
exit status 1
Since the app resets the initial version of the new store everytime it starts up, then when it loads the store, it realizes there is already an earlier version existing from the initial version.
The error Riccardo is seeing is all the way from the iavl LoadVersion code:
https://github.com/cosmos/iavl/blob/master/mutable_tree.go#L356
if firstVersion > 0 && firstVersion < int64(tree.ndb.opts.InitialVersion) {
return latestVersion, fmt.Errorf("initial version set to %v, but found earlier version %v",
tree.ndb.opts.InitialVersion, firstVersion)
}
The store migration code in rootmultistore definitely seems wrong to me. The upgrades are applied every single time the app starts, rather than just once. This code needs to be changed such that the upgrade code is not called on app initialization OR the code must check if the upgrade has already been applied before applying the upgrades. This will ensure the upgrade store migration changes only happen once during the upgrade, and not after subsequent restarts.
This should be addressed before stargate.
EDIT: It seems that this is not actually a bug since the LoadVersionAndUpgrade functions are intended to only be called inside the upgrade handler and thus are intended to only be executed once. This should be very clearly documented in the godocs, as it currently is not. Users who do not do this will face the issue that @RiccardoM did above. I think some documentation before tagging final release would be helpful but not strictly blocking a release.
I also think that someone on core sdk team familiar with cosmovisor should explicitly test the automatic upgrade feature with a live test chain one last time before shipping since this has been a common source of confusion, we should ensure that the process works and there is a clear workflow documented.
cc: @ethanfrey @zmanian
@AdityaSripal I tried doing what you suggested, by setting the StoreLoader inside the upgrade handler with the following piece of code:
// Register the upgrade handler for the relationships upgrade
app.upgradeKeeper.SetUpgradeHandler("test", func(ctx sdk.Context, plan upgradetypes.Plan) {
// Set the store loader for the upgrade to work properly
app.SetStoreLoader(func(ms sdk.CommitMultiStore) error {
return ms.LoadLatestVersionAndUpgrade(&storetypes.StoreUpgrades{
Added: []string{reportsTypes.ModuleName},
})
})
})
Unluckily, this results in the following error when applying the upgrade:
applying upgrade "test" at height: 50
panic: SetStoreLoader() on sealed BaseApp
I also tried setting it using the UpgradeStoreLoader method:
// Register the upgrade handler for the relationships upgrade
app.upgradeKeeper.SetUpgradeHandler("test", func(ctx sdk.Context, plan upgradetypes.Plan) {
// Set the store loader for the upgrade to work properly
app.SetStoreLoader(upgradetypes.UpgradeStoreLoader(plan.Height, &storetypes.StoreUpgrades{
Added: []string{reportsTypes.ModuleName},
}))
})
but the same error is returned as well.
If I would have to take a guess, I would say that this is caused by the fact that once the node starts, the first thing it does is calling this code:
if loadLatest {
if err := app.LoadLatestVersion(); err != nil {
tmos.Exit(err.Error())
}
}
This causes LoadLatestVersion to call app.init(), which calls app.Seal() and seals the application, making it impossible to change the StoreLoader once it has started.
Since the upgrade handler can only be called once the app has started properly (hence, it has been sealed), this results in the above error being raised.
In order to verify is the above idea was correct, I commented the following lines inside baseapp/options.go:
func (app *BaseApp) SetStoreLoader(loader StoreLoader) {
//if app.sealed {
// panic("SetStoreLoader() on sealed BaseApp")
//}
app.storeLoader = loader
}
This resulted in the update to proceed smoothly. I also tried stopping the node and re-starting it, and it worked without a single problem.
I have no idea if commenting out those checks might impact somewhere else in the code, but that seemed to solve this bug.
If it's the case that you cannot remove those checks, I think that an alternative might be to implement a new method that allows to set the store loader even if the app has already started.
CC @ethanfrey @zmanian
Nice research there @AdityaSripal and thank you for trying this out @RiccardoM with a very valid error.
I tested this in a 0.38 version back in fall 2019... I haven't worked on this code since then. I really wish @aaronc or the Regen crew that added the feature was being more proactive supporting or debugging it. I will look up the code sample from back in the day that tested the upgrade mechanism.
The original PR adding this functionality is here: https://github.com/cosmos/cosmos-sdk/pull/4724/files (Aug 2019)
There was a unit test ensuring this properly migrated the data (not using baseapp): https://github.com/cosmos/cosmos-sdk/pull/4724/files#diff-9ab8b6b1ae348e450f51d4a110e504d9aee67d848a997128a39f914a0acfa7f7R159-R252
The baseapp test did pass the new loader in the BaseApp constructor: https://github.com/cosmos/cosmos-sdk/pull/4724/files#diff-a40b307156ef669dfec87e99a660ab3fe06e89adbe6b5d8861386b4f69bc1ef0R182-R269
I now see we used to store a JSON file with the upgrade/rename info. This was essential, as we could not query the IAVL store to see if we are due for a given upgrade until after we have mounted the store. But we needed to know about the substore migrations (Rename, Delete, Add substores) before mounting the store...
The solution @aaronc and I came up with was to dump a JSON file with that upgrade info when we hit the upgrade height, and it would be run when restarting. One issue there, is it requires the old binary to know what changes would happen in the new binary. The other approach, which was based on the cosmosd/cosmovisor solution was to add this upgrade JSON file in the same folder as the new binaries, so it would be detected when the upgrade happened.
I have no been involved in this code in ~16 months, and not sure how this thinking or design has advanced over 2020 with all the stargate changes. After reviewing the original code, I do think that some external JSON file is needed to control that. (And the JSON file is deleted after execution, so only run once)
I would be very happy for comments by some of the people who have touched this in the last year (thank you to Aditya for your contribution). Here is who was working on the module: https://github.com/cosmos/cosmos-sdk/commits/master/x/upgrade
@RiccardoM what I described above is also documented here: https://github.com/cosmos/cosmos-sdk/blob/master/x/upgrade/spec/01_concepts.md#storeloader
I am not sure if these docs are fully up to date, and it is missing a full example, but should give some hints on how to move forward
So one small note - originally there was no need to use upgrade store loaders when new stores were added - just for renames and deletions so this was considered really an edge case. The case for "added" stores was added in #7415 - I will study that code a bit to see what other consequences there might be of this.
As for how to use upgrade store loaders, I would note that the documentation does give a pretty detailed example: https://pkg.go.dev/github.com/cosmos/[email protected]/x/upgrade#hdr-Performing_Upgrades. Maybe there could be some _easier_ place to find this documentation but the godoc for the upgrade module is not an illogical place either.
Can you try this again following the example in the docs @RiccardoM ?
So is the gist of the issue here that nobody will look at godocs and everything needs to be in the spec so it ends up on docs.cosmos.network? Seems like there are a few modules with detailed godocs, but mostly not. I guess maybe we need to really make it a clear convention to use specs and not godocs - I certainly wasn't clear on that when we were working on this.
I was going through the implementation and there is a check already for upgrade height, https://github.com/cosmos/cosmos-sdk/blob/v0.40.0-rc6/x/upgrade/types/storeloader.go#L13, was wondering how these stores are getting updated on restart everytime.
Great comment @aaronc
And no I hadn't looked at cosmos sdk go doc much, as it had not been a very good place to get information about modules. The READMEs, ADRs and digging in the source were all I really had.
I guess this changed over 2020. We can start a re-education campaign to "just read the godoc" :smile:
@RiccardoM I have tested with correct store-migrations logic and everything is working fine. I made a PR onto your branch: https://github.com/desmos-labs/desmos/pull/337, please test and feel free to close this issue.
Great comment @aaronc
And no I hadn't looked at cosmos sdk go doc much, as it had not been a very good place to get information about modules. The READMEs, ADRs and digging in the source were all I really had.
I guess this changed over 2020. We can start a re-education campaign to "just read the godoc" 😄
I'm not saying we should switch over to prioritizing godocs, although that's not a bad idea either. I'd just like to have a really clear convention as to which docs need to be prioritized... as it seems like writing thorough godocs for this wasn't that helpful...
It just needs to be consistent and communicated. The devs will follow where the info is, if they know where to look
@anilCSE Thank you for the PR, which indeed fixed the problem. Looks like #8276 is also closed by using that solution.