Tendermint: e2e: make node clocks configurable

Created on 20 Jan 2021 · 1 Comment · Source: tendermint/tendermint

Problem Definition

When the e2e suite runs locally, the clocks of the nodes are closely in sync.

A snapshot of the commits gives us a glimpse of the level of disparity:

validator02    |     Signatures:
validator02    |       CommitSig{B2993382335A by 32DC06149F04 on 2 @ 2021-01-20T13:43:44.604544614Z}
validator02    |       CommitSig{0B891951D6DD by 983D127556DC on 2 @ 2021-01-20T13:43:44.333703285Z}
validator02    |       CommitSig{98122B7C0203 by 07EB27AE0D06 on 2 @ 2021-01-20T13:43:44.514243607Z}
validator02    |       CommitSig{4E09695D5585 by 75C02D9AC4DB on 2 @ 2021-01-20T13:43:44.330662417Z}
validator02    |       CommitSig{40BE8F79373D by 94004332DBB9 on 2 @ 2021-01-20T13:43:44.397453349Z}
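To put a number on the disparity in the snapshot above, the spread between the earliest and latest signature timestamps can be computed directly. The following is a minimal Python sketch; the regular expression and the helper name `to_nanos` are illustrative and not part of Tendermint's tooling. The timestamps carry nine fractional digits, so they are handled as integer nanoseconds rather than as `datetime` values.

```python
import re
from calendar import timegm
from time import strptime

# CommitSig lines copied from the snapshot above (log prefix removed).
LOG = """\
CommitSig{B2993382335A by 32DC06149F04 on 2 @ 2021-01-20T13:43:44.604544614Z}
CommitSig{0B891951D6DD by 983D127556DC on 2 @ 2021-01-20T13:43:44.333703285Z}
CommitSig{98122B7C0203 by 07EB27AE0D06 on 2 @ 2021-01-20T13:43:44.514243607Z}
CommitSig{4E09695D5585 by 75C02D9AC4DB on 2 @ 2021-01-20T13:43:44.330662417Z}
CommitSig{40BE8F79373D by 94004332DBB9 on 2 @ 2021-01-20T13:43:44.397453349Z}
"""

# Capture the whole-second part and the fractional part separately.
TS = re.compile(r"@ (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d{1,9})Z")


def to_nanos(seconds_part, frac):
    """Convert a UTC timestamp split at the decimal point to integer nanoseconds."""
    secs = timegm(strptime(seconds_part, "%Y-%m-%dT%H:%M:%S"))
    # Right-pad the fraction so that e.g. ".5" is read as 500000000 ns.
    return secs * 1_000_000_000 + int(frac.ljust(9, "0"))


stamps = [to_nanos(*m.groups()) for m in TS.finditer(LOG)]
spread_ns = max(stamps) - min(stamps)
print(f"spread: {spread_ns / 1e6:.1f} ms")
```

For the five signatures in the snapshot, the spread is about 274 ms.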

We may want to inject some variance into the clocks to test how well the network handles asynchrony.

Proposal

There are a few different angles from which we can approach this:

Start the containers with different clocks (global offset).
Introduce a small randomized delay to imitate latency across wider geographies (incremental offset). I don't know how feasible this is likely to be.

I'm not so familiar with this side of distributed systems, so perhaps someone with greater expertise might want to weigh in on possible solutions (maybe Jepsen testing is a solid enough tool for this).

For Admin Use

[ ] Not duplicate issue
[ ] Appropriate labels applied
[ ] Appropriate contributors tagged
[ ] Contributor assigned/self-assigned

proposal test

Source

cmwaters

Most helpful comment

The time differences are caused only by network latency: all Docker containers use the same clock, since they run under the same kernel instance. This can be overridden by injecting a fake time into the C library, for example via libfaketime (see https://brendonmatheson.com/2020/08/27/manipulating-time-inside-a-docker-container.html).
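As a rough illustration of the global-offset idea, the sketch below generates a per-node environment for libfaketime with a reproducible random clock offset. It rests on several assumptions: the function name `clock_skew_env` is hypothetical, the `LIBFAKETIME` path follows the Debian/Ubuntu package layout and may differ in other images, and the offset uses libfaketime's relative format (a leading `+` or `-` followed by a number of seconds). libfaketime works by preloading a shared library, so it only affects binaries that read the time through the dynamically linked C library; whether that holds for the node binary would need to be verified.

```python
import random

# Assumed install path of the libfaketime shared object (Debian/Ubuntu layout).
LIBFAKETIME = "/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1"


def clock_skew_env(nodes, max_skew_s, seed):
    """Return {node: env vars} with a random clock offset per node.

    Offsets are whole seconds in [-max_skew_s, +max_skew_s]. The seed
    makes the assignment reproducible across test runs.
    """
    rng = random.Random(seed)
    env = {}
    for name in nodes:
        offset = rng.randint(-max_skew_s, max_skew_s)
        env[name] = {
            "LD_PRELOAD": LIBFAKETIME,
            # Relative offsets must start with an explicit sign, e.g. "+3" or "-2".
            "FAKETIME": f"{offset:+d}",
        }
    return env


# Hypothetical node names, in the style of the snapshot above.
for name, node_env in clock_skew_env(["validator01", "validator02"], 5, seed=1).items():
    print(name, node_env["FAKETIME"])
```

The resulting variables could then be passed to each container's environment, for instance through the `environment` section of a Compose file.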

Network latency can be introduced with standard Linux tooling such as tc (see https://medium.com/@kazushi/simulate-high-latency-network-using-docker-containerand-tc-commands-a3e503ea4307). I wouldn't involve Jepsen, since that's really a whole test framework (i.e. it would replace the entire E2E suite) -- unless we feel that these scenarios should be covered by chaos testing instead of E2E testing, which is a somewhat different technique.
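A minimal sketch of the tc approach: the helper below builds a `docker exec` command that attaches a netem queueing discipline with a fixed delay and optional jitter to a container's network interface. The function name `netem_delay_cmd` is hypothetical, and the sketch assumes that the interface is called `eth0`, that the image ships `tc` (the iproute2 package), and that the container was started with the `NET_ADMIN` capability.

```python
def netem_delay_cmd(container, delay_ms, jitter_ms=0, dev="eth0"):
    """Build a command that adds outgoing latency inside a running container.

    Corresponds to: tc qdisc add dev <dev> root netem delay <delay>ms [<jitter>ms]
    """
    cmd = [
        "docker", "exec", container,
        "tc", "qdisc", "add", "dev", dev, "root",
        "netem", "delay", f"{delay_ms}ms",
    ]
    if jitter_ms:
        # A second value after the delay makes netem vary it by +/- jitter.
        cmd.append(f"{jitter_ms}ms")
    return cmd


# Container name taken from the log prefix in the issue; adjust as needed.
print(" ".join(netem_delay_cmd("validator02", 100, 20)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the containers are up. Using a different delay per node would approximate nodes spread across regions.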