Whenever a process is restarted it should be handled without failures. Also, throw an error for failing to handle the process.
[inf] 2019-04-02 06:28:00 | Verify->verifyBlock succeeded for block 18359457909549872079 at height 7903574.
[inf] 2019-04-02 06:28:03 | Releasing enqueued broadcasts
[inf] 2019-04-02 06:28:03 | Queue empty
[inf] 2019-04-02 06:28:08 | Releasing enqueued broadcasts
[inf] 2019-04-02 06:28:08 | Queue empty
[inf] 2019-04-02 06:28:09 | Cleanup controller...
[inf] 2019-04-02 06:28:09 | Cleaning chain...
[inf] 2019-04-02 06:28:09 | Disconnecting
[inf] 2019-04-02 06:28:09 | Cleaning HTTP API...
[inf] 2019-04-02 06:28:09 | Disconnecting
[inf] 2019-04-02 06:28:37 | Starting the app - testnet-7000 - Lisk Framework(0.1.0)
[inf] 2019-04-02 06:28:37 | Initializing controller
[inf] 2019-04-02 06:28:37 | Loading controller
[ERR] 2019-04-02 06:28:37 | An instance of application "testnet-7000" is already running. You have to change application name to run another instance.
[inf] 2019-04-02 06:28:54 | Starting the app - testnet-7000 - Lisk Framework(0.1.0)
[inf] 2019-04-02 06:28:54 | Initializing controller
[inf] 2019-04-02 06:28:54 | Loading controller
[ERR] 2019-04-02 06:28:54 | An instance of application "testnet-7000" is already running. You have to change application name to run another instance.
[inf] 2019-04-02 06:29:12 | Starting the app - testnet-7000 - Lisk Framework(0.1.0)
[inf] 2019-04-02 06:29:12 | Initializing controller
Process failed and it wasn't handled properly.
This happened on a upgraded delegate on testnet, cause of issue is known, as there were no logs
1.6.0, Docker
This feature is working fine as per expected the issue just with the file logging.
When we run the other instance of the app and see the console logs we clear and handled error response.
[inf] 2019-04-04 09:17:36 | Starting the app - devnet-4000 - Lisk Framework(0.1.0)
[inf] 2019-04-04 09:17:36 | Initializing controller
[inf] 2019-04-04 09:17:36 | Loading controller
[ERR] 2019-04-04 09:17:37 | An instance of application "devnet-4000" is already running. You have to change application name to run another instance.
[ERR] 2019-04-04 09:17:37 | App stopped with error - Duplicate app instance for "devnet-4000"
[inf] 2019-04-04 09:17:37 | Cleanup controller...
[ERR] 2019-04-04 09:17:37 | Caused error during cleanup - {}
But because each started instance of the app use the same log file, so messages from different instances mixed up.
[dbg] 2019-04-04 09:17:36 | Block already forged for the current slot
[inf] 2019-04-04 09:17:37 | Releasing enqueued broadcasts
[inf] 2019-04-04 09:17:37 | Queue empty
[ERR] 2019-04-04 09:17:37 | An instance of application "devnet-4000" is already running. You have to change application name to run another instance.
[dbg] 2019-04-04 09:17:37 | Broadhash consensus: 0 %
[dbg] 2019-04-04 09:17:37 | Transaction pool size: 0
[dbg] 2019-04-04 09:17:37 | Block already forged for the current slot
The only thing that need to be debugged is one missing error log.
https://github.com/LiskHQ/lisk/blob/f2fce700c164e915621071e7726e40eb3c460594/src/index.js#L45-L51
This error message is logged in console but missed in the file log we need to investigate it further.
The other workaround would be to add the pid tag in the log message to identify if log entries were done from different app instances. That will be a breaking change in logs, so to be done in major release.
Possible solution by adding a check to PID number which needs to be different when restarting:
async _validatePidFile() {
const pidPath = `${this.config.dirs.pids}/controller.pid`;
const pidExists = await fs.pathExists(pidPath);
if (pidExists) {
const pid = parseInt(await fs.readFile(pidPath));
const pidRunning = await isPidRunning(pid);
this.logger.info(`Old PID: ${pid}`);
this.logger.info(`Current PID: ${process.pid}`);
if (pidRunning && pid !== process.pid) {
this.logger.error(
`An instance of application "${
this.appLabel
}" is already running. You have to change application name to run another instance.`
);
throw new DuplicateAppInstanceError(this.appLabel, pidPath);
}
}
await fs.writeFile(pidPath, process.pid);
}
Most helpful comment
The other workaround would be to add the
pidtag in the log message to identify if log entries were done from different app instances. That will be a breaking change in logs, so to be done in major release.