We started using the time-based gas price strategies in Raiden. They seem to work well in general but also create excessive load on the node: https://github.com/raiden-network/raiden/issues/2522
While investigating this I tried adding the different available caching middlewares, but for some reason (I guess it's locking related, as the nodes just lock up) I can only use the time_based_cache_middleware.
Do you have any hints for improving this specific behaviour? As a temporary solution I cache the result of the gas price strategy call for now, but in the long term this should be properly fixed.
Have you thought about a specific caching middleware that only caches calls to getBlock, as this seems to be the main culprit?
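For illustration, the temporary workaround described above (caching the result of the gas price strategy call) could look roughly like the sketch below. It assumes the usual gas price strategy call shape of `(web3, transaction_params)`; the wrapper name `cached_gas_price_strategy`, the `ttl_seconds` default, and the `clock` parameter are hypothetical and not taken from the Raiden code.

```python
import time


def cached_gas_price_strategy(strategy, ttl_seconds=60.0, clock=time.monotonic):
    """Wrap a gas price strategy so its result is reused for ttl_seconds.

    `strategy` is any callable taking (web3, transaction_params).
    Hypothetical helper: not part of web3.py or Raiden.
    """
    state = {"price": None, "expires_at": 0.0}

    def wrapper(web3, transaction_params=None):
        now = clock()
        # Recompute only when nothing is cached yet or the entry has expired.
        if state["price"] is None or now >= state["expires_at"]:
            state["price"] = strategy(web3, transaction_params)
            state["expires_at"] = now + ttl_seconds
        return state["price"]

    return wrapper
```

The wrapped callable can be registered wherever the original strategy was registered. The sketch is not synchronised, so concurrent callers may occasionally trigger two recomputations, and a stale price is served for up to `ttl_seconds`.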
@palango can you link me to how you've set up the caching middlewares? I believe that with all three of them in place the load should be significantly lower, but that depends on what your API usage looks like. If there is a relatively consistent stream of requests for the estimated gas price then the cache should stay warm and the requests needed for each call should be limited to checking latest block hash and pulling latest blocks if they aren't cached.
See the bottom of this documentation page for how to install all three of the various cache middlewares
@pipermerriam This is the code that sets up the middlewares. https://github.com/raiden-network/raiden/blob/d989738e5f8840748cdd8a02aafa6df4a4ab1409/raiden/network/rpc/client.py#L139-L158
But for some reason the Raiden client dead-locks if I add latest_block_based_cache_middleware and/or simple_cache_middleware.
Edit:
If there is a relatively consistent stream of requests for the estimated gas price then the cache should stay warm and the requests needed for each call should be limited to checking latest block hash and pulling latest blocks if they aren't cached.
This should happen for testing, but under normal circumstances on-chain transactions should be quite rare.
If you are available, we can chat synchronously about this on Gitter right now.
I suspect the lockup has something to do with the threading.Lock that we use: https://github.com/ethereum/web3.py/blob/master/web3/middleware/cache.py#L367
Are you able to reproduce the deadlock or is it something that only manifests within the raiden codebase? My first thought is that it likely has to do with some kind of concurrency which is not playing nicely with web3.
Diagnosing the deadlock can probably be done by adding logging statements within the various caching middlewares and observing where they lock up.
This should happen for testing, but under normal circumstances on-chain transactions should be quite rare.
If you expect to only rarely need to query a gas price, but need the response to be reasonably fast, I think it may be prudent to run a dedicated background gas price "service" which has its own Web3 connection and maintains a warm cache by polling the gas price every N blocks. That could also side-step the deadlock issue, since that service would be operating on a dedicated web3 instance and in theory shouldn't be subject to any concurrency complications.
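A minimal sketch of such a background "service", under some loud assumptions: `GasPricePoller` and `fetch_price` are hypothetical names, `fetch_price` stands in for a call to the gas price strategy on a dedicated web3 instance, and the loop polls on a fixed time interval rather than every N blocks to keep the example short. It uses a plain thread; in a gevent-based application a greenlet would play the same role.

```python
import threading


class GasPricePoller:
    """Keep a recent gas price available by polling in the background."""

    def __init__(self, fetch_price, interval_seconds=15.0):
        self._fetch_price = fetch_price  # hypothetical: wraps the strategy call
        self._interval = interval_seconds
        self._latest = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def latest(self):
        """Return the most recent price, or None before the first poll."""
        return self._latest

    def poll_once(self):
        self._latest = self._fetch_price()

    def _run(self):
        while not self._stop.is_set():
            try:
                self.poll_once()
            except Exception:
                # Keep serving the last known price if a poll fails.
                pass
            # Sleep until the next poll, waking early if stop() is called.
            self._stop.wait(self._interval)
```

Callers read `latest()` and never touch the node themselves, so a transaction that needs a price gets one immediately, at the cost of it being up to one interval old.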
Fairly tangential, but I have been kicking around an idea that may address the same core problem. I'm thinking of writing a daemon that monitors your pending transactions and bumps the gas price over time until the transaction gets included. For your case @palango , would that daemon be more/equally/less desirable than using a smoothly-working gas price estimator?
I have been procrastinating on writing the daemon because I'm not sure anyone would use it.
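To make the idea concrete, the price escalation at the heart of such a daemon could be sketched as below. Everything here is an assumption for illustration: the function name, the fixed bump interval, and the percentage step are hypothetical, and a real daemon would also have to rebroadcast the transaction with the same nonce and watch for inclusion.

```python
def escalated_gas_price(initial_price, elapsed_seconds,
                        bump_interval=60, bump_percent=12, max_price=None):
    """Return the gas price to use after elapsed_seconds without inclusion.

    The price is raised by bump_percent once per full bump_interval,
    using integer arithmetic since gas prices are integers (wei).
    """
    bumps = int(elapsed_seconds // bump_interval)
    price = initial_price
    for _ in range(bumps):
        price = price * (100 + bump_percent) // 100
        if max_price is not None and price >= max_price:
            # Never exceed the ceiling the caller is willing to pay.
            return max_price
    return price
```

The `max_price` ceiling matters for the trade-off discussed here: without it, a transaction stuck during a price spike would keep escalating indefinitely.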
Are you able to reproduce the deadlock or is it something that only manifests within the raiden codebase?
Right now I don't have a small reproducer, but I pushed a branch showing the locking to my repo: https://github.com/palango/raiden/tree/web3-locking
After setting Raiden up you can reproduce this by running raiden --transport=udp smoketest. I hope I have time next week to narrow it down a bit.
would that daemon be more/equally/less desirable than using a smoothly-working gas price estimator?
I guess for our use case a gas price estimator is the preferable solution as it is more important to have the transaction mined in a certain timeframe than having it as cheap as possible. I'm also not sure if we want to run another service besides raiden itself. (@LefterisJP any additions here?)
@LefterisJP any additions here?
Hey guys. Why is another daemon needed and not simply a greenlet inside Raiden? Just to avoid the concurrency issue mentioned above?
as it is more important to have the transaction mined in a certain timeframe than having it as cheap as possible
Exactly that. We only do transactions on-chain rather rarely. But when we do, we need them mined in a certain time-frame or risk loss of funds.
Why is another daemon needed and not simply a greenlet inside Raiden? Just to avoid the concurrency issue mentioned above?
As long as the greenlet can take care of keeping the cache warm, then yeah, I think the deadlock is the only issue with using this gas estimation in Raiden.
But when we do, we need them mined in a certain time-frame or risk loss of funds.
How short a window do they need to be mined in?
FWIW, I'm not sure I'd trust any gas estimation algorithm, depending on how much is at risk. There is always a chance of getting a spike in gas price around the time of broadcast.
I'd mirror what @carver said. If it is critical that a transaction be mined within a certain time window I'd recommend the daemon approach that @carver suggested (potentially combined with a slightly modified version of the gas price strategy algorithms to actually figure out an appropriate price escalation).
But related to this specific issue, I'd like to understand why the deadlock is happening.
@pipermerriam I found some time to debug this and found the problem:
The simple cache by default caches calls to eth_getTransactionByHash. But internally we use this call to poll for the inclusion of the tx in a block, so the caching turns that loop into an endless loop.
Not sure how to proceed from here. For now I implemented a cache that only caches eth_getBlockByHash, and this works fine. But ideally caching an unmined transaction shouldn't happen.
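A cache restricted to eth_getBlockByHash could be sketched roughly as follows. This is not the code from the Raiden repository; it assumes the function-style middleware shape `middleware(make_request, web3)` returning a `(method, params)` callable, and the constructor name `construct_method_cache_middleware` is hypothetical.

```python
def construct_method_cache_middleware(cached_methods=("eth_getBlockByHash",)):
    """Build a middleware that caches responses only for cached_methods."""

    def middleware(make_request, web3):
        cache = {}  # unbounded in this sketch; a real one should evict entries

        def inner(method, params):
            if method not in cached_methods:
                # Everything else, including transaction lookups, goes
                # straight to the node on every call.
                return make_request(method, params)

            key = (method, repr(params))
            if key not in cache:
                response = make_request(method, params)
                # Do not remember errors or "not found" (null) results.
                if "error" in response or response.get("result") is None:
                    return response
                cache[key] = response
            return cache[key]

        return inner

    return middleware
```

A block looked up by hash never changes, which is why this method is safe to cache indefinitely, while eth_getTransactionByHash is not until the transaction is mined.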
That's a bug that should be fixed: the simple cache should be updated to not cache unmined transactions.
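One way to express that rule is a predicate the cache consults before storing a response. The sketch below is an assumption about what such a check could look like, not the actual fix; the function name is hypothetical. It relies on the JSON-RPC behaviour that eth_getTransactionByHash returns null for an unknown transaction and a null blockHash for a pending one.

```python
def should_cache_response(method, params, response):
    """Return True only if the response is safe to cache permanently."""
    if "error" in response:
        return False

    result = response.get("result")
    if result is None:
        # Unknown transaction or block: it may appear later.
        return False

    if method == "eth_getTransactionByHash" and result.get("blockHash") is None:
        # Pending transaction: caching it would hide its later inclusion.
        return False

    return True
```

With a check like this in place, a loop that polls eth_getTransactionByHash for inclusion keeps hitting the node until the transaction is mined, and only the final mined response is cached.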
With #1086 filed we can close this.