Yesterday morning we got ourselves into a situation where a large number of transactions (with later nonces) were held up behind a transaction whose gas price was too low. At the time of investigation, none of these transactions appeared on etherscan as pending.
The gas price was updated on the transaction that had too low gas price, and it was very quickly mined and confirmed onto a block.
At this point the transactions that had previously been stuck had a high enough gas price, and didn't need re-writing, yet they didn't get mined, and continued to not appear on etherscan. The fact that neither etherscan nor miners knew about these makes me assume that geth is not periodically re-broadcasting transactions, and because of high transaction volume, other nodes had decided to drop the transactions as they couldn't be mined (due to the fact that the earlier transaction had too low gas price).
I was expecting geth to re-broadcast the transaction and for them to get mined.
A node restart caused it to re-broadcast the pending transactions, and they were mined within minutes of the restart.
Geth version: 1.9.16-stable
OS & Version: Linux Ubuntu 18.04
Commit hash: ea3b00ad75aebaf1790fe0f8afc9fb7852c87716
Expected behaviour: Geth re-broadcasts transactions with later nonces when a transaction with an earlier nonce appears on the blockchain
Actual behaviour: Apparently no re-broadcasting, or at least not in a time-frame that works for us (minutes)
I am wondering if there is any way to get geth to re-broadcast a transaction in its pendingTransaction list.
This is very similar to https://github.com/ethereum/go-ethereum/issues/14669 - and is the same infrastructure that @vogelito reported there - although the script to copy pending Transactions from geth -> parity is no longer active in our environment.
This is a correct assessment. There is no rebroadcast in Geth. However, usually when a new peer connection is made, all txs are exchanged, so unless you have a super stable connection, the transactions should still be leaking out. That said, it's not ideal, and it definitely makes it hard for transactions to flow across the network once initially broadcast. I'm open to suggestions on how we could best do this without spamming the network with txs over and over again. Definitely a good problem to think about.
My suggestion would be to perform selective broadcasts in the following situations:
1) When a transaction with nonce = X is confirmed on the blockchain, broadcast pending transaction with nonce = X + 1
2) Periodically, e.g. every 15 mins, broadcast transactions in the pending list with the lowest nonce for each sending account.
Granted, this may spam the network (particularly the second case) when the node is handling lots of sending accounts. In that case, perhaps both these cases could be amended to only do the broadcast if the gasPrice of the transaction about to be broadcast is within, say, 90% of the gasPrice that geth would use for a new transaction. This would avoid spamming with low gas price transactions.
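The two cases and the gas price gate suggested above can be sketched roughly as follows. This is only an illustration of the proposal, not anything geth implements: every function name here is hypothetical, the pending map is shaped loosely like txpool.content.pending, and gas prices are assumed to be plain numbers in a common unit rather than the hex strings geth returns.

```javascript
// Hypothetical policy sketch. None of these functions exist in geth.
// pending: { sender: { nonce: { gasPrice: number } } }

// Lowest-nonce pending transaction for one sender, or null if there is none.
function lowestNonceTx(txsByNonce) {
  const nonces = Object.keys(txsByNonce).map(Number).sort((a, b) => a - b);
  if (nonces.length === 0) return null;
  return { nonce: nonces[0], tx: txsByNonce[nonces[0]] };
}

// Gas price gate: only rebroadcast when the tx pays at least minPercent
// of the price the node would pick for a new transaction (assumed input).
function passesGasGate(tx, referenceGasPrice, minPercent = 90) {
  return tx.gasPrice * 100 >= referenceGasPrice * minPercent;
}

// Case 1: nonce X of `sender` was just confirmed, so consider nonce X + 1.
function onConfirmed(pending, sender, confirmedNonce, referenceGasPrice) {
  const txs = pending[sender] || {};
  const next = txs[confirmedNonce + 1];
  if (!next || !passesGasGate(next, referenceGasPrice)) return [];
  return [{ sender, nonce: confirmedNonce + 1 }];
}

// Case 2: periodic sweep, lowest pending nonce for each sending account.
function periodicSweep(pending, referenceGasPrice) {
  const out = [];
  for (const sender of Object.keys(pending)) {
    const lowest = lowestNonceTx(pending[sender]);
    if (lowest && passesGasGate(lowest.tx, referenceGasPrice)) {
      out.push({ sender, nonce: lowest.nonce });
    }
  }
  return out;
}
```

Both functions only return candidates; actually pushing them to peers, and rate-limiting that, is left out on purpose.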
Alternatively, or potentially in addition, it would be very helpful to have an RPC call that could force a broadcast. The current strategies we have are either to wait for a new peer or to restart the node. The former is problematic as we have no idea how long that could take, and the latter is problematic because it's a manual process, which is unscalable.
I can confirm I am having the same issue regularly.
I am sending lots of transactions daily, and I need to restart geth about once every 3 days.
This issue causes lots of problems on our side: when there are some pending txs on the network, I try to override them, but the new txs won't be broadcast.
The only solution for me is to restart the node. When will this problem be fixed?
This isn't a problem on a somewhat large network, where peers enter and drop off regularly, but only for small stable networks. I don't think we want to add a "generic 15 m rebroadcast" mechanism, since it'll increase the eth traffic humongously. Not sure if there's anything we can do for small networks. For example, adding rpc endpoint "rebroadCastTxs" would just be misused by users who try to solve some unrelated problem, and again, increase network spam.
@TurgutBtc are you on mainnet or a private network?
So one idea would be to resend transactions that depend on a transaction that was updated. So we rebroadcast every transaction with a higher nonce, if the tx with the lowest nonce gets updated. But this could also be used to spam.
@holiman I'm on mainnet. But it's a real problem for us.
@TurgutBtc could you elaborate a bit on your usecase? I don't think the repro is as simple as making a couple of transactions, I suspect that in order to hit this, a quite large volume of transactions are needed.
So I'm basically wondering,
txpool.xx - queue-limits on the receiving node? I guess that all txs are kept locally, and saved to disk, hence why a reboot solves the problem. What's curious is why it doesn't trickle the transactions out 'organically' when peers are rotated. So, are there any particular peering constraints, or does it use the default number of peers?
@holiman
1) We are sending 1 tx every 5 seconds. For example, yesterday we sent a total of 1320 txs.
2) Yes, all txs are from the same account.
3) How can I check txpool.xx - queue-limits?
How can I check txpool.xx - queue-limits?
Unless you specify some specific options on your command line, the defaults are used (geth -h | grep txpool).
We need to implement a better disk-based txpool-backend to better handle transactions. We don't have any good solution currently for this problem.
We've discussed this a bit further, but at least for now my personal belief is that rebroadcasting will never land because it's too easy to abuse and a successful attack could have catastrophic consequences (network wide amplification attack and self DDoS). The one solution I see working is to get nodes to keep the original data instead of dropping it on the floor, but for that we need a disk backend for the tx pool. There the catch is the performance hit of SPAM, because disk io is already our vulnerable point. We're pondering how to implement it; the current best (initial) idea is a leveldb database where txs are sorted by gas price and indexed by hash and account+nonce. Not sure how well this performs, and even then, we'd need some gas price cutoffs to drop stuff that's too cheap. Maybe that could be done by having a larger limit on the tx count (e.g. 4K in memory, 400K on disk). These are ideas that need a bit of exploring and prototyping.
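To make the indexing idea above a little more concrete, here is a toy in-memory model of it. It is only a sketch under several assumptions: the PricedPool class and its methods are made up for illustration, plain Maps stand in for the leveldb database, the cheapest transaction is found by a linear scan instead of a sorted index, and eviction ignores the nonce gaps it can leave behind.

```javascript
// Toy model of a price-bounded pool indexed by hash and by account+nonce.
// Hypothetical names throughout; a real backend would live on disk.
class PricedPool {
  constructor(limit) {
    this.limit = limit;               // max number of transactions kept
    this.byHash = new Map();          // hash -> tx
    this.byAccountNonce = new Map();  // "account:nonce" -> hash
  }

  // Adds a tx, replacing any existing tx with the same account and nonce,
  // then evicts the cheapest entries until the pool fits its limit again.
  add(tx) {
    const key = `${tx.from}:${tx.nonce}`;
    const oldHash = this.byAccountNonce.get(key);
    if (oldHash !== undefined) this.byHash.delete(oldHash);
    this.byHash.set(tx.hash, tx);
    this.byAccountNonce.set(key, tx.hash);
    while (this.byHash.size > this.limit) this.evictCheapest();
  }

  // Drops the tx with the lowest gas price (the "too cheap" cutoff).
  evictCheapest() {
    let cheapest = null;
    for (const tx of this.byHash.values()) {
      if (cheapest === null || tx.gasPrice < cheapest.gasPrice) cheapest = tx;
    }
    if (cheapest === null) return;
    this.byHash.delete(cheapest.hash);
    this.byAccountNonce.delete(`${cheapest.from}:${cheapest.nonce}`);
  }

  get(hash) {
    return this.byHash.get(hash);
  }

  getByAccountNonce(from, nonce) {
    const hash = this.byAccountNonce.get(`${from}:${nonce}`);
    return hash === undefined ? undefined : this.byHash.get(hash);
  }
}
```

With two instances, one with a small limit and one with a large limit, the same shape could model the "4K in memory, 400K on disk" split, though the disk io cost that the comment worries about is exactly what this toy does not capture.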
Is this issue resolved and closed?
Yes, is this issue fixed?
No, there is no clear way to fix it. It would take quite some research and it is not a priority at the moment.
Are you really sure this should be closed? How do you expect people to use geth if you can't broadcast TXs with it? The only way seems to have a parity node next to the geth node and use it to broadcast TXs?
We had some issues with TXs not being broadcasted when using 1.9.18 and the network was not stable (fees sky rocketing, chain split of 11/12/2020, etc...) but now we just updated to 1.9.24 and this seems to happen a lot more.
Have to restart geth too often to get the TX broadcasted, this is not really manageable when having a lot of TXs to send.
I'd be happy to help. All transactions stack up on txpool.content.pending. geth recognizes the generated TXID. We have the line starting with Submitted... in the logs. The txid doesn't show up in the mempool of any block explorer.
I'll try to add some debug statements in the source code to try to pinpoint issue next time it happens, if you have ideas where to add some, please do tell. Thanks.
@coinwalletdev Honestly at this point geth is just a mess.. Probably better long term to just move infrastructure over to OpenEthereum or Besu.
@Gbogdann93 @coinwalletdev Please don't close issues just because you cannot fix them. There may be somebody else who can. This one is not difficult. It is a real problem on mainnet. Real for business.
@PavelNiedoba I didn't close my previous issue. For me the issue is no longer present, but it looks like it is still there for others. I will add another comment if I face the same problem again.
This one is not difficult.
Ah yes, when you have no clue but it's easy to slap on a label and make it someone else's problem.
I've explained that this is hard. Actually, it is very hard to do in a way that won't allow someone to attack the network. In 2015 I spent Xmas trying to prevent the network from going down exactly due to one of these rebroadcast issues. The network almost imploded under the load of that super light usage way way way back. Imagine what would happen now.
Rebroadcasting will never land (unless someone derives a miracle solution). The alternative is persistent txpools, but that has a ton of challenges around disk access so that we don't make chain processing even worse than it is now. It requires figuring out a good solution, prototyping, experimenting, etc.
I understand that rebroadcasting anything anytime is a problem. But if I create a bump fee transaction, this one needs to be broadcast, which does not happen, which renders the functionality useless. And I don't understand why it's very hard when restarting geth will do the broadcast?
New transactions should most definitely be broadcast. Any diagnostic to back up the claim that it is not?
Restarting Geth will connect to a new set of peers and will push any local transactions out, so that's kind of like a nuclear option.
Still, it should most definitely broadcast anything newly signed, please provide some details as to why you think it does not.
Still, it should most definitely broadcast anything newly signed, please provide some details as to why you think it does not.
geth creates a TX with a nonce N. For some reason, this TX never gets into the mempool of other nodes. We/geth does not know that. We then create TXs with nonce of value N+k for any k>0. Those TXs never get broadcasted to other node mempools.
Once this happens you can:
N not using geth: N+k with k>0 once your TX of nonce N is in everybody's mempool
N+k with k>=0.
Note that in the hypothetical funky case that geth is the only ETH implementation available, the nuclear option (restarting) seems to actually be the only available option...
To fix this, assuming that at some point the network will be stable, why not rebroadcast a TX with a nonce N if we can't broadcast a TX with a nonce N+1. This will mess up with the (re)broadcasting timing but at least fix up this issue for now, no?
(N.B: i now understand better infura post mortem - if you have to make an internal fix to geth just to get a TX broadcasted, no wonder it's a pain to update the versions)
Two things here:
But if I create a bump fee transaction, this one needs to be broadcast, which does not happen, which renders the functionality useless.
and
For some reason, this TX never gets into the mempool of other nodes. We/geth does not know that. We then create TXs with nonce of value N+k for any k>0. Those TXs never get broadcasted to other node mempools.
Both of these seem quite different from the original report, and it would be better to open separate tickets for them instead of posting on an already closed ticket.
But if I create a bump fee transaction, this one needs to be broadcast, which does not happen, which renders the functionality useless.
When trying to broadcast a TX with the same nonce but a different TXID, geth broadcasts it fine 99.9% of the time. The 0.1% of the time when geth thinks the TX is broadcast but it actually is not seems to be the root cause of the issue, no?
Both of these seem quite different from the original report, and it would be better to open separate tickets for them instead of posting on an already closed ticket.
The core issue here is that geth doesn't successfully broadcast all TXs 100% of the time.
My example with the nonce was to show @karalabe that if one TX doesn't get rebroadcast, then all following TXs with a higher nonce on the same address don't get broadcast either. Hence geth definitely broadcasts nothing newly signed after this issue happens, effectively blocking broadcasting of any new transaction.
As @CrispinFlowerday put it:
Yesterday morning we got ourselves into a situation where a large number of transactions (with later nonces) were held up behind a transaction whose gas price was too low. At the time of investigation, none of these transactions appeared on etherscan as pending.
This is exactly what I'm describing: geth doesn't broadcast TXs when the network is in bad shape (fees skyrocketing, chain splits, etc.)
At this point the transactions that had previously been stuck had a high enough gas price, and didn't need re-writing, yet they didn't get mined, and continued to not appear on etherscan
This is what I described with the sentence: if the network is not stable yet, then good luck, you have to rebroadcast all TXs with a nonce N+k with k>=0 (except he changed the TXID for the case k=0). The fix proposed above would only solve this particular issue though, so indeed it is not perfect.
I'd be happy to create another issue for that, but there are already a lot of closed issues with what seems to be exactly the same problem: https://github.com/ethereum/go-ethereum/issues/21167 https://github.com/ethereum/go-ethereum/issues/14669 - should I really create a new one? I chose this issue as it seemed to offer the most detailed explanation of the root cause.
At simplecoin we send eth to clients from one account. If the gas price rises rapidly we end up with a chain of unconfirmed txs. If the chain gets long enough, it gets dropped (unknown to etherscan). I can do replace-by-fee and my geth will create a tx with a new hash and the same nonce, but etherscan will not show it. It's not a 0.1% problem. It's maybe a 5% problem of the times we need to bump the fee. Sending eth from a different address/wallet is not an option either, because we saw txs confirmed even when dropped by etherscan. We are using a super high fee to cope with this for now.
It's not a 0.1% problem. It's maybe a 5% problem of the times we need to bump the fee.
I guess it depends on the nodes you are connected to and the geth version you are running. Indeed, I can't define precisely for the moment what I mean by 'the network is in bad shape', and this is what is lacking to reproduce the issue.
We are using a super high fee to cope with this for now.
Next time this happens, could you try to rebroadcast the same TX with an identical TXID to show the problem is unrelated to fee bumping?
Get the blocking TX (the one with the lowest nonce) in hex via the following, assuming 0xAA..AA is the blocked address:
$ geth attach
> eth.getRawTransaction(txpool.content.pending['0xAA..AA'][Object.keys(txpool.content.pending['0xAA..AA'])[0]]['hash'])
0x.....
Then broadcast the TX using a non-geth node (or online via https://etherscan.io/pushTx). This should make the TX appear on etherscan, and geth should recognize it (and, if you're lucky, broadcast the TXs with higher nonces).
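The console one-liner above relies on Object.keys returning the nonces in ascending order. A slightly more explicit way to pick the blocking TX could look like the sketch below. lowestNonceHash is a hypothetical helper, not part of geth; it assumes its input has the shape of txpool.content.pending (address, then nonce, then a tx object with a hash field) and matches the address case-insensitively.

```javascript
// Hypothetical helper: returns the hash of the lowest-nonce pending tx
// for `address`, or null if that address has nothing pending.
function lowestNonceHash(pendingByAccount, address) {
  var wanted = address.toLowerCase();
  var accounts = Object.keys(pendingByAccount);
  for (var i = 0; i < accounts.length; i++) {
    if (accounts[i].toLowerCase() !== wanted) continue;
    var txs = pendingByAccount[accounts[i]];
    // Sort nonces numerically so "10" never sorts before "9".
    var nonces = Object.keys(txs).map(Number).sort(function (a, b) {
      return a - b;
    });
    if (nonces.length === 0) return null;
    return txs[nonces[0]].hash;
  }
  return null;
}
```

If the geth console accepts the helper as written, it could be combined with the call from the snippet above as eth.getRawTransaction(lowestNonceHash(txpool.content.pending, '0xAA..AA')).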
We ran into this non-rebroadcasting issue recently.
For security reasons we run the public node separately, while the private nodes (behind NAT) are used to broadcast TXs from the internal signer.
I have now checked that each node has only 16 peers, all of them outgoing because the node's connection is private. So our configuration fits the definition of a very stable network.
My question is: how can we force a private node to open more peer connections? Is there any way to reset the connected peers instead of the "nuclear" restart?
I am not so comfortable posting in a closed issue. Is there any open issue where we can track the progress?
This non-rebroadcasting issue caused a lot of mess. The transaction is stored on our local node, but the rest of the world just doesn't have my transaction.
As an addendum, we are constantly having this issue, and the only solution we have found is to restart the node, after which it broadcasts all stuck transactions.