Blockchains in general are at best eventually consistent. That is, a client reading from two services should expect that one service may be behind the other. This becomes problematic when implementing clients that, intentionally or not, connect to multiple services. Since a single IP address or URL may refer to a single service or to several, clients /must/ be implemented on the assumption that read-after-read consistency is not guaranteed. Specifically, consider a read that returns a version:
A later read returning an older version is entirely possible when there is redundancy behind a single domain, and even more likely in our fault-tolerance case of choosing 1 of n upstreams. Why?
The primary issue is a race against state synchronization:
Validator_i -> VFN_i -> FN_i
vs
Validator_j -> VFN_j -> FN_j
where the json-rpc service is backed by (FN_i, FN_j). If we assume i != j, the two paths can take different routes and have different latencies, or even temporarily unavailable connections. A client calling an application-load-balanced json-rpc service hosted by (FN_i, FN_j) may get different versions in these cases.
Our current json-rpc redundancy approach is to query multiple services. Even assuming single-homed services, it is possible that our most up-to-date service goes offline, causing us to redirect to one at an older version.
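To make the race concrete, here is a minimal sketch of how a client could notice that a later read went backwards. Everything in it is an assumption for illustration: `Response`, `VersionTracker`, and the two simulated full nodes are hypothetical names, and the only thing assumed about the real service is that each response carries the ledger version it was served at.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical response shape: a payload plus the ledger version it reflects."""
    version: int
    result: str


class VersionTracker:
    """Remembers the highest ledger version this client has observed so far."""

    def __init__(self) -> None:
        self.last_seen = 0

    def is_stale(self, resp: Response) -> bool:
        # A response is stale if it is older than something we already read.
        return resp.version < self.last_seen

    def observe(self, resp: Response) -> None:
        # Never move the high-water mark backwards.
        self.last_seen = max(self.last_seen, resp.version)


# Two simulated full nodes behind one URL; FN_j lags FN_i (made-up versions).
def fn_i() -> Response:
    return Response(version=105, result="state@105")


def fn_j() -> Response:
    return Response(version=100, result="state@100")


tracker = VersionTracker()

first = fn_i()            # load balancer routes the first read to FN_i
tracker.observe(first)

second = fn_j()           # the next read lands on FN_j, which is behind
second_is_stale = tracker.is_stale(second)
tracker.observe(second)   # observing a stale response does not lower last_seen

print(second_is_stale, tracker.last_seen)
```

The same check applies to the failover case: if the most up-to-date upstream goes offline and the client is redirected, the tracker still holds the version it saw before the switch.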
In any case, if we arrive at an older version, we fail. So three ideas:
Does JSON-RPC currently support long poll?
https://github.com/diem/diem/blob/master/json-rpc/docs/client_implementation_guide.md#error-handling
The current plan is that a client implementation should:
All clients implemented in the Python, Go, and Java SDKs retry on a stale response, as does the new Rust async client.
For the 3 options you listed:
No.2 is chosen because retry is always needed. If the client side is implemented well, the error raised after retrying should also contain the stale response, so the caller can judge whether to propagate the error or use the stale response.
No.3 is not good because in most cases we just want to retry, so returning the stale response would force the caller to write the retry logic.
No.1 may be better for server resource usage and can be an additional option on top of No.2.
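A rough sketch of the retry behaviour described for No.2, where the error raised after the last attempt carries the stale response so the caller can decide what to do with it. This is not the SDKs' actual code: `StaleResponseError`, `call_with_retry`, the `fetch` callable, and the `"version"` key are all hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, Optional


class StaleResponseError(Exception):
    """Raised when every attempt returned a version older than one already seen."""

    def __init__(self, response: Dict, last_seen: int) -> None:
        super().__init__(
            f"stale response: version {response['version']} < last seen {last_seen}"
        )
        self.response = response    # kept so the caller may still use it
        self.last_seen = last_seen


def call_with_retry(
    fetch: Callable[[], Dict],
    last_seen: int,
    max_attempts: int = 3,
    delay_secs: float = 0.0,
) -> Dict:
    """Call fetch() until it returns a version >= last_seen, or give up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    stale: Optional[Dict] = None
    for attempt in range(max_attempts):
        resp = fetch()
        if resp["version"] >= last_seen:
            return resp
        stale = resp
        if attempt + 1 < max_attempts:
            time.sleep(delay_secs)  # give the lagging node time to catch up
    raise StaleResponseError(stale, last_seen)


# Case 1: the upstream catches up on the third attempt.
versions = iter([98, 99, 101])
fresh = call_with_retry(lambda: {"version": next(versions)}, last_seen=100)

# Case 2: the upstream never catches up; the caller chooses to use the stale data.
try:
    call_with_retry(lambda: {"version": 90}, last_seen=100, max_attempts=2)
    fallback = None
except StaleResponseError as err:
    fallback = err.response
```

Keeping the response on the exception means the common path (just retry) needs no caller code, while a caller that can tolerate old data can still catch the error and read `err.response`.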
So basically you like 1+2+3!