Neo4j: 3.3.1 causal cluster fails with misleading error on startup from 3.0.4 backup despite being a supported upgrade path

Created on 26 Jan 2018 · 4 Comments · Source: neo4j/neo4j

Sorry for the long title, feel free to rename :)

The upgrade docs for 3.3 state that 3.0.any -> 3.3.1 is a supported path and mention no caveats. Additionally, the 3.3 Upgrade FAQ says that upgrading from any 3.x version will be very straightforward, again without mentioning any caveats.

However, I was unable to upgrade from 3.0.4 to 3.3.1 using the following steps and received several error messages, none of which were indicative of the underlying problem.

These steps were executed within a docker environment, using the 3.0.4-enterprise tag and the 3.3.1-enterprise tag.

Steps to reproduce

  1. In a fresh 3.0.4 enterprise container, insert some data and take a supported, online backup using these instructions.
  2. Duplicate that backup into 3 directories.
  3. Start a causal cluster of 3 nodes (3.3.1), mounting the directories from step 2 into /data of each container.
  4. Wait for cluster members to acknowledge each other.
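
Step 2 can be scripted. The following is a minimal sketch, not part of the original report: the function name `duplicate_backup`, the `core<i>` directory names, and the `databases/graph.db` layout under each seed directory are assumptions chosen so that each seed directory can be mounted as /data; adjust them to your own setup.

```python
import shutil
from pathlib import Path


def duplicate_backup(backup_dir, target_root, copies=3):
    """Copy one backup directory into `copies` independent seed directories.

    Returns the list of seed directories, one per cluster member.
    """
    backup = Path(backup_dir)
    if not backup.is_dir():
        raise FileNotFoundError(f"backup directory not found: {backup}")

    seeds = []
    for i in range(1, copies + 1):
        # Assumed layout: <target_root>/core<i>/databases/graph.db
        dest = Path(target_root) / f"core{i}" / "databases" / "graph.db"
        # copytree creates missing parent directories and fails if dest exists,
        # which protects an existing store from being overwritten.
        shutil.copytree(backup, dest)
        seeds.append(dest.parents[1])  # the core<i> directory to mount as /data
    return seeds
```

Each returned directory is a full, separate copy, so no two cluster members share store files.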

Expected behavior
Neo4j upgrades database and runs as a causal cluster

Actual behavior
The database fails startup with the following error message:

...
Caused by: org.neo4j.kernel.impl.storemigration.StoreUpgrader$DatabaseNotCleanlyShutDownException: The database is not cleanly shutdown. The database needs recovery, in order to recover the database, please run the old version of the database on this store.
    at org.neo4j.kernel.impl.storemigration.UpgradableDatabase.checkUpgradeable(UpgradableDatabase.java:124)
    at org.neo4j.kernel.impl.storemigration.StoreUpgrader.migrateIfNeeded(StoreUpgrader.java:132)
    at org.neo4j.kernel.impl.storemigration.DatabaseMigrator.migrate(DatabaseMigrator.java:101)
    at org.neo4j.kernel.NeoStoreDataSource.upgradeStore(NeoStoreDataSource.java:573)
    at org.neo4j.kernel.NeoStoreDataSource.start(NeoStoreDataSource.java:435)
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)

Sometimes, and I wasn't able to isolate the conditions, the error message would instead be something like:

ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@f88bfbe' was successfully initialized, but failed to start. Please see the attached cause exception "Unable to find transaction 23289 in any of my logical logs: Couldn't find any log containing 23289". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@f88bfbe' was successfully initialized, but failed to start. Please see the attached cause exception "Unable to find transaction 23289 in any of my logical logs: Couldn't find any log containing 23289".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@f88bfbe' was successfully initialized, but failed to start. Please see the attached cause exception "Unable to find transaction 23289 in any of my logical logs: Couldn't find any log containing 23289".

I was able to start up the causal cluster as expected without the volume mount.

Fix/Workaround
I was able to successfully start the causal cluster if I first upgraded the 3.0.4 backup by starting a 3.3.1 node in single instance mode and letting it perform the upgrade, then using that upgraded DB as the seed for the causal cluster.
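
For illustration, the single-instance upgrade step might be started with a command like the one built below. This is a sketch under assumptions, not the exact command I ran: the helper name `upgrade_command`, the image name `neo4j:3.3.1-enterprise`, and the use of `NEO4J_dbms_mode=SINGLE` and `NEO4J_dbms_allow__upgrade=true` (the Docker image's environment-variable spelling of `dbms.mode` and `dbms.allow_upgrade`) are assumptions; your image may need further settings.

```python
def upgrade_command(seed_dir, image="neo4j:3.3.1-enterprise"):
    """Build a `docker run` argv that starts one standalone 3.3.1 node on a 3.0.4 backup.

    The node is expected to migrate the store found under <seed_dir> on startup.
    """
    return [
        "docker", "run", "--rm",
        "--volume", f"{seed_dir}:/data",           # backup copy mounted as /data
        "--env", "NEO4J_dbms_mode=SINGLE",         # assumed: single instance, not a cluster member
        "--env", "NEO4J_dbms_allow__upgrade=true", # assumed: permits the store upgrade
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(upgrade_command("/path/to/seed")))
```

Once the node has started and the upgrade has finished, stop it cleanly, then copy the upgraded store into the seed directories for the causal cluster.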

It seems like that step could be an acceptable upgrade path, but it should be well documented, and perhaps a more descriptive error message should be added in the next patch release.

team-cluster team-kernel

All 4 comments

This is intended behaviour and you are doing the correct "workaround". I guess "please run the old version of the database on this store" could be slightly clearer. Database here refers to the database software. I will bring the feedback back to the team though.

Thanks for your reply @martinfurmanski.

This may be intended behavior as you understand it, but that means there is a glaring omission in the documentation. The links in the initial post show that the documentation makes no mention of exceptions to the supported upgrade path, yet this is definitely an exception. Can you link me to any official documentation that discusses this workaround or the errors that may arise when upgrading in this manner? If none exists, I'm happy to file an issue for improving the documentation, but I don't believe this issue is resolved until some actionable changes are outlined.

Thanks!

@freethejazz Thank you for your feedback! I have added notes in the documentation that an upgrade must be performed as an action separate from, for example, seeding a cluster. The updated docs will be published within a week.

Awesome, thanks for updating the documentation and adding these details!
