Lighthouse: Investigate flock-based lock files

Created on 25 Oct 2020 · 4 Comments · Source: sigp/lighthouse

At the moment we use a hand-rolled implementation of lock files that requires the process to exit cleanly in order for the locks to get cleaned up. I suspect we could achieve better guarantees and better UX using file locking primitives provided by the OS. The most common and widely-supported syscall seems to be flock, which allows a process to lock a file exclusively, releasing it only once the file is closed (which happens regardless of how the process exits). This means we'd move from a paradigm of checking if lock files exist on start-up, to acquiring locks on files that may already exist, and we would no longer need to delete the lock files on shutdown.
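To make the flock behaviour described above concrete, here is a minimal sketch. It is written in Python only because its standard library exposes flock directly; it is not Lighthouse code, and the Rust crates under discussion would wrap the same syscall. The helper name `try_acquire_lock` is hypothetical, and the sketch assumes a Unix-like OS where the `fcntl` module is available.

```python
import fcntl
import os


def try_acquire_lock(path):
    """Try to take an exclusive flock on `path`, creating the file if needed.

    Returns the open file descriptor on success (the lock lasts until it is
    closed, including when the process dies), or None if the lock is held
    through another open file description.
    """
    # The file may already exist from a previous run; that is fine, because
    # the lock lives on the open file, not in the file's existence.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A caller would keep the returned descriptor open for the lifetime of the process and treat `None` as "another instance is already running". Nothing needs to be deleted on shutdown for the next start-up to succeed.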

There are a few Rust crates providing high-level cross-platform wrappers over syscalls like flock, which I think we should investigate.

All 4 comments

For the record, Teku deliberately designed our lock files to be compatible with Lighthouse's so the locking works across clients. We can handle the empty files Lighthouse creates, but when we create the lock file we write the process PID to it. On startup, if we find a lock file with a PID written to it, we consider the lock stale if that PID is no longer active. It's not perfect, as the PID could have been reused, and in Docker we can't get the current process's PID so we write an empty file, but it works out well.
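The PID-based staleness check described in this comment could look roughly like the sketch below. It is in Python for brevity and is not Teku's actual implementation; the names `pid_is_alive` and `lock_state` are hypothetical, and treating an empty or unparsable file as "held" is an assumption, since such a file carries no information that could prove staleness. It assumes a Unix-like OS.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists (Unix only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_state(path):
    """Classify a lock file as "free", "held" or "stale"."""
    try:
        with open(path) as f:
            content = f.read().strip()
    except FileNotFoundError:
        return "free"
    if not (content.isascii() and content.isdigit()) or int(content) == 0:
        # Empty file (Lighthouse-style, or written inside Docker) or unexpected
        # content: assumed held, because staleness cannot be shown.
        return "held"
    return "held" if pid_is_alive(int(content)) else "stale"
```

As the comment notes, a reused PID makes a stale lock look held, which is the weakness an OS-level lock avoids.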

Our first goal, however, was to use OS locking, which is a much more reliable approach. The downside is that from Java it's hard to know exactly which type of lock is going to be used and which should be used on different platforms.

Not sure how much interoperability really matters here but thought I'd mention it and will keep an eye on this issue for updates.

For the record, Teku deliberately designed our lock files to be compatible with Lighthouse's so the locking works across clients.

Wow, I didn't know this!

We might be able to remain mostly compatible: Lighthouse could delete its lock files in the best case, and use the OS lock in the same way Teku uses the PID.
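One way to read this proposal is sketched below, in Python for brevity and under stated assumptions. The OS lock decides whether a leftover file is stale, the file is deleted only on a clean exit, and the PID is written so that PID-based readers still see something meaningful. The name `held_lock` is hypothetical, writing the PID is an assumption rather than something Lighthouse is stated to do, and the sketch assumes a Unix-like OS with `fcntl`.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def held_lock(path):
    """Hold an exclusive flock on `path` for the duration of a `with` block.

    Raises BlockingIOError if the lock is already held elsewhere.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # A leftover file from a crashed run is not an obstacle: if this call
        # succeeds, the previous holder is gone and the file was stale.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        # Assumption: record our PID for clients that use PID-based checks.
        os.ftruncate(fd, 0)
        os.write(fd, str(os.getpid()).encode())
        yield fd
        # Reached only on a clean exit: tidy up while still holding the lock.
        os.unlink(path)
    finally:
        # Closing the descriptor releases the lock, on clean or unclean exit.
        os.close(fd)
```

If the block exits with an exception, the file is left behind but the lock is released, so the next start-up can still acquire it. Deleting a file that other processes may be opening at the same moment can race, so a real implementation would need to consider that case.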

If you do wind up moving to OS locking we'd certainly look into how we can make it work for Teku as well. That would solve our issues with Docker lock files being more brittle (because we can't detect the PID). But yes, I would recommend deleting the lock file on a clean shutdown regardless - keeps things tidy if nothing else.

Taking this
