Go-ethereum: Ethereum transactions apparently not re-broadcast

Created on 29 Jul 2020 · 30 Comments · Source: ethereum/go-ethereum

Yesterday morning we got ourselves into a situation where there were a large number of transactions (with later nonces) held up behind a transaction that had too low a gas price. At the time of investigation, none of these transactions appeared on etherscan as pending.

The gas price was updated on the transaction that had too low gas price, and it was very quickly mined and confirmed onto a block.

At this point the transactions that had previously been stuck had a high enough gas price, and didn't need re-writing, yet they didn't get mined, and continued to not appear on etherscan. Since neither etherscan nor the miners knew about them, I assume that geth does not periodically re-broadcast transactions and that, because of the high transaction volume, other nodes had dropped these transactions since they could not be mined (the earlier transaction's gas price was too low).

I was expecting geth to re-broadcast the transaction and for them to get mined.

A node restart caused it to re-broadcast the pending transactions, and they were mined within minutes of the restart.

System information

Geth version: 1.9.16-stable
OS & Version: Linux Ubuntu 18.04
Commit hash : ea3b00ad75aebaf1790fe0f8afc9fb7852c87716

Expected behaviour

Geth re-broadcasts transactions with later nonces when a transaction with an earlier nonce appears on the blockchain

Actual behaviour

Apparently no re-broadcasting, or at least not in a time-frame that works for us (minutes)

Steps to reproduce the behaviour

1) Send a transaction with a low gas price
2) Wait for network congestion
3) Send a transaction with a high enough gas price
4) Observe that neither transaction appears on etherscan and other nodes (they have dropped these transactions)
5) Re-write the transaction sent at step 1 (same nonce) with a high enough gas price
6) Observe the newly re-written transaction get mined
7) Observe that the transaction sent at step 3 does not get mined.

I am wondering if there is any way to get geth to re-broadcast a transaction in its pendingTransaction list.

This is very similar to https://github.com/ethereum/go-ethereum/issues/14669 - and is the same infrastructure that @vogelito reported there - although the script to copy pending Transactions from geth -> parity is no longer active in our environment.

CrispinFlowerday

👍2

Most helpful comment

Are you really sure this should be closed? How do you expect people to use geth if you can't broadcast TXs with it? The only way seems to have a parity node next to the geth node and use it to broadcast TXs?

We had some issues with TXs not being broadcast when using 1.9.18 while the network was not stable (fees skyrocketing, chain split of 11/12/2020, etc.), but now we have just updated to 1.9.24 and this seems to happen a lot more.

We have to restart geth too often to get the TXs broadcast, which is not really manageable when there are a lot of TXs to send.

I'd be happy to help. All transactions stack up in txpool.content.pending. geth recognizes the generated TXID. We have the line starting with Submitted... in the logs. The txid doesn't show up in the mempool of any block explorer.

I'll try to add some debug statements in the source code to try to pinpoint issue next time it happens, if you have ideas where to add some, please do tell. Thanks.

coinwalletdev on 13 Nov 2020

👍3 👀1

All 30 comments

This is a correct assessment. There is no rebroadcast in Geth. However, usually when a new peer connection is made, all txs are exchanged, so unless you have a super stable connection, the transactions should still be leaking out. That said, it's not ideal, and it definitely makes it hard for transactions to flow across the network once initially broadcast. I'm open to suggestions on how we could best do this without spamming the network with txs over and over again. Definitely a good problem to think about.

karalabe on 30 Jul 2020

My suggestion would be to perform selective broadcasts in the following situations:
1) When a transaction with nonce = X is confirmed on the blockchain, broadcast pending transaction with nonce = X + 1
2) Periodically, e.g. every 15 mins, broadcast transactions in the pending list with the lowest nonce for each sending account.

Granted, this may spam the network (particularly the second case) when the node is handling lots of sending accounts. In that case, perhaps both these cases could be amended to only do the broadcast if the gasPrice of the transaction about to be broadcast is within, say, 90% of the gasPrice that geth would use for a new transaction. This would avoid spamming with low gas price transactions.
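A minimal sketch of the selection rule suggested above (lowest pending nonce per account, filtered by a 90% gas price threshold). Nothing here is geth API: `selectRebroadcastCandidates` is a hypothetical helper, and the `pending` argument is assumed to have the same shape as `txpool.content.pending` (address, then nonce as a decimal string, then a transaction object whose `gasPrice` is a hex string).

```javascript
// Hypothetical helper, not part of geth.
// pending: { "<address>": { "<nonce>": { hash, gasPrice: "0x..." } } }
// suggestedGasPrice: BigInt, the price the node would use for a new transaction.
// percent: BigInt threshold, 90n means "at least 90% of the suggested price".
function selectRebroadcastCandidates(pending, suggestedGasPrice, percent = 90n) {
  const candidates = [];
  for (const [address, txsByNonce] of Object.entries(pending)) {
    const nonces = Object.keys(txsByNonce).map(Number);
    if (nonces.length === 0) continue;
    // Only the lowest pending nonce of each account is considered.
    const lowest = Math.min(...nonces);
    const tx = txsByNonce[String(lowest)];
    // Integer comparison of gasPrice >= suggested * percent / 100.
    if (BigInt(tx.gasPrice) * 100n >= suggestedGasPrice * percent) {
      candidates.push({ address, nonce: lowest, hash: tx.hash });
    }
  }
  return candidates;
}
```

With a suggested price of 100, an account whose lowest pending transaction pays 90 is selected, while one paying 89 is skipped.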

Alternatively, or potentially in addition, it would be very helpful to have an RPC call that could force a broadcast. The current strategies we have are either to wait for a new peer or to restart the node. The former is problematic as we have no idea how long that could take, and the latter is problematic because it's a manual process that doesn't scale.

CrispinFlowerday on 30 Jul 2020

I can confirm I am having the same issue regularly.
I am sending lots of transactions daily and I need to restart geth about once every 3 days.

farukterzioglu on 10 Aug 2020

This issue causes lots of problems on our side. When there are some pending txs on the network, I try to override them, but the new txs won’t be broadcast.
The only solution for me is to restart the node. When will this problem be fixed?

TurgutBtc on 2 Sep 2020

👍1

This isn't a problem on a somewhat large network, where peers enter and drop off regularly, but only for small stable networks. I don't think we want to add a "generic 15 m rebroadcast" mechanism, since it'll increase the eth traffic humongously. Not sure if there's anything we can do for small networks. For example, adding rpc endpoint "rebroadCastTxs" would just be misused by users who try to solve some unrelated problem, and again, increase network spam.

holiman on 3 Sep 2020

@TurgutBtc are you on mainnet or a private network?

holiman on 3 Sep 2020

So one idea would be to resend transactions that depend on a transaction that was updated. So we rebroadcast every transaction with a higher nonce, if the tx with the lowest nonce gets updated. But this could also be used to spam.
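A rough sketch of that idea, under the assumption that one account's pending transactions are available as an object keyed by nonce (the shape used by `txpool.content.pending`). `dependentsOf` and its `limit` parameter are hypothetical names; the cap is only one possible way to bound the spam risk mentioned above.

```javascript
// Hypothetical helper, not part of geth.
// txsByNonce: { "<nonce>": { hash } } for a single account.
// Returns the transactions with a nonce above the replaced one, in nonce
// order, capped at `limit` entries.
function dependentsOf(txsByNonce, replacedNonce, limit = Infinity) {
  return Object.keys(txsByNonce)
    .map(Number)
    .filter((nonce) => nonce > replacedNonce)
    .sort((a, b) => a - b)
    .slice(0, limit)
    .map((nonce) => ({ nonce, hash: txsByNonce[String(nonce)].hash }));
}
```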

MariusVanDerWijden on 3 Sep 2020

@holiman I'm on mainnet. But it's really a problem for us.

TurgutBtc on 3 Sep 2020

👍1

@TurgutBtc could you elaborate a bit on your use case? I don't think the repro is as simple as making a couple of transactions; I suspect that a quite large volume of transactions is needed in order to hit this.

So I'm basically wondering:

1) What is the rate at which txs are sent to the node?
2) Do all those txs come from the same account?
3) What are the various txpool.xx - queue-limits on the receiving node?

I guess that all txs are kept locally, and saved to disk, hence why a reboot solves the problem. What's curious is why it doesn't trickle the transactions out 'organically' when peers are rotated. So, are there any particular peering constraints, or does it use the default number of peers?

holiman on 3 Sep 2020

@holiman
1) We send 1 tx every 5 seconds. For example, yesterday we sent a total of 1320 txs.
2) Yes, all txs are from the same account.
3) How can I check txpool.xx - queue-limits ?

TurgutKanceltik on 3 Sep 2020

How can i check txpool.xx - queue-limits ?

Unless you specify some specific options on your command line, the defaults are used (geth -h | grep txpool)

We need to implement a better disk-based txpool-backend to better handle transactions. We don't have any good solution currently for this problem.

holiman on 17 Sep 2020

We've discussed this a bit further, but at least for now my personal belief is that rebroadcasting will never land because it's too easy to abuse and a successful attack could have catastrophic consequences (network-wide amplification attack and self-DDoS). The one solution I see working is to get nodes to keep the original data instead of dropping it on the floor, but for that we need a disk backend for the tx pool. There the catch is the performance hit of SPAM, because disk IO is already our vulnerable point. We're pondering how to implement it; the current best (initial) idea is a leveldb database where txs are sorted by gas price and indexed by hash and account+nonce. Not sure how well this performs, and even then, we'd need some gas price cutoffs to drop stuff that's too cheap. Maybe that could be done by having a larger limit on the tx count (e.g. 4K in memory, 400K on disk). These are ideas that need a bit of exploring and prototyping.
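To make the indexing idea concrete, here is a small in-memory model, not a LevelDB implementation and not geth code. `TxIndex` and its methods are hypothetical names. It keeps the two lookups mentioned above (by hash, and by account+nonce), can list transactions by gas price, and drops the cheapest transaction once a count limit is exceeded. A transaction with the same account and nonce simply replaces the earlier one here, without any price-bump rule.

```javascript
// Hypothetical in-memory model of the proposed index; gasPrice is a BigInt.
class TxIndex {
  constructor(limit) {
    this.limit = limit;               // maximum number of stored transactions
    this.byHash = new Map();          // hash -> tx
    this.byAccountNonce = new Map();  // "account:nonce" -> hash
  }

  static key(tx) {
    return `${tx.from}:${tx.nonce}`;
  }

  add(tx) {
    const key = TxIndex.key(tx);
    // A transaction with the same account and nonce replaces the old one.
    const replaced = this.byAccountNonce.get(key);
    if (replaced !== undefined) this.byHash.delete(replaced);
    this.byHash.set(tx.hash, tx);
    this.byAccountNonce.set(key, tx.hash);
    // Gas price cutoff: evict the cheapest entries while over the limit.
    while (this.byHash.size > this.limit) this.evictCheapest();
  }

  evictCheapest() {
    let cheapest = null;
    for (const tx of this.byHash.values()) {
      if (cheapest === null || tx.gasPrice < cheapest.gasPrice) cheapest = tx;
    }
    this.byHash.delete(cheapest.hash);
    this.byAccountNonce.delete(TxIndex.key(cheapest));
    return cheapest;
  }

  // Transactions ordered from the highest gas price to the lowest.
  byGasPriceDescending() {
    return [...this.byHash.values()].sort((a, b) =>
      a.gasPrice < b.gasPrice ? 1 : a.gasPrice > b.gasPrice ? -1 : 0
    );
  }
}
```

The linear scan in `evictCheapest` is only acceptable for a toy model; a disk-backed store would need a sorted key layout to avoid it, which is the performance question raised above.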

karalabe on 17 Sep 2020

Is this issue resolved and closed?

farukterzioglu on 22 Sep 2020

👍1

Yes, is this issue fixed?

TurgutKanceltik on 22 Sep 2020

No, there is no clear way to fix it. It would take quite some research and it is not a priority at the moment.

adamschmideg on 22 Sep 2020

@coinwalletdev Honestly at this point geth is just a mess.. Probably better long term to just move infrastructure over to OpenEthereum or Besu.

AtomicLemon on 13 Nov 2020

@Gbogdann93 @coinwalletdev Please don't close issues just because you cannot fix them. There may be somebody else who can. This one is not difficult. It is a real problem on mainnet. Real for business.

PavelNiedoba on 23 Nov 2020

@PavelNiedoba I didn't close my previous issue. For me the issue is no longer present, but it looks like it is still there for others. I will add another comment if I face the same problem.

Gbogdann93 on 23 Nov 2020

This one is not difficult.

Ah yes, when you have no clue but it's easy to slap on a label and make it someone else's problem.

I've explained that this is hard. Actually, it is very hard to do in a way that won't allow someone to attack the network. In 2015 I spent Xmas trying to prevent the network from going down exactly due to one of these rebroadcast issues. The network almost imploded under the load of that super light usage way way way back. Imagine what would happen now.

Rebroadcasting will never land (unless someone derives a miracle solution). The alternative is persistent txpools, but that has a ton of challenges around disk access so that we don't make chain processing even worse than it is now. It requires figuring out a good solution, prototyping, experimenting, etc.

karalabe on 23 Nov 2020

I understand that rebroadcasting anything at any time is a problem. But if I create a bump-fee transaction, it needs to be broadcast, which does not happen, and that renders the functionality useless. And I don't understand why it's very hard when restarting geth will do the broadcast?

PavelNiedoba on 23 Nov 2020

New transactions should most definitely be broadcast. Any diagnostic to back up the claim that it is not?

Restarting Geth will connect to a new set of peers and will push any local transactions out, so that's kind of like a nuclear option.

Still, it should most definitely broadcast anything newly signed, please provide some details as to why you think it does not.

karalabe on 23 Nov 2020

Still, it should most definitely broadcast anything newly signed, please provide some details as to why you think it does not.

geth creates a TX with a nonce N. For some reason, this TX never gets into the mempool of other nodes. We/geth does not know that. We then create TXs with nonce of value N+k for any k>0. Those TXs never get broadcasted to other node mempools.
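The blocking effect described above can be illustrated with a toy model. It assumes that a receiving pool only treats a gap-free run of nonces, starting at the account's next expected nonce, as executable, which is how geth's pool separates pending from queued transactions. `executableNonces` is a hypothetical name.

```javascript
// Hypothetical helper, not part of geth.
// nextAccountNonce: the nonce the chain expects next from the account.
// knownNonces: nonces of that account's transactions that a remote pool holds.
// Returns the nonces the remote pool could actually include in a block.
function executableNonces(nextAccountNonce, knownNonces) {
  const known = new Set(knownNonces);
  const executable = [];
  for (let n = nextAccountNonce; known.has(n); n++) {
    executable.push(n);
  }
  return executable;
}
```

If the chain expects nonce 10 and the remote pool only holds 11, 12 and 13, the result is empty: nothing can be mined until the transaction with nonce 10 arrives.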

Once this happens you can:

use the nuclear option of restarting geth: this works but reduces service availability to our customers
rebroadcast the TX with nonce N not using geth:
-- if the network is stable (if this does not happen again), then geth will hopefully automatically rebroadcast all TXs with a nonce N+k with k>0 once your TX of nonce N is in everybody's mempool
-- if the network is not stable yet, then good luck you have to rebroadcast all TXs with a nonce N+k with k>=0.

Note that in the hypothetical funky case that geth is the only ETH implementation available, the _nuclear option (restarting)_ seems to actually be the only available option...

To fix this, assuming that at some point the network will be stable, why not rebroadcast a TX with a nonce N if we can't broadcast a TX with a nonce N+1. This will mess up with the (re)broadcasting timing but at least fix up this issue for now, no?

(N.B.: I now better understand the Infura post-mortem: if you have to make an internal fix to geth just to get a TX broadcast, no wonder it's a pain to update versions)

coinwalletdev on 24 Nov 2020

Two things here:

But if I create a bump-fee transaction, it needs to be broadcast, which does not happen, and that renders the functionality useless.

and

For some reason, this TX never gets into the mempool of other nodes. We/geth does not know that. We then create TXs with nonce of value N+k for any k>0. Those TXs never get broadcasted to other node mempools.

Both of these seem to be quite different from the original report, and it would be better to open separate tickets for them, instead of posting on an already closed ticket.

holiman on 24 Nov 2020

But if I create a bump-fee transaction, it needs to be broadcast, which does not happen, and that renders the functionality useless.

When trying to broadcast a TX with the same nonce but a different TXID, geth broadcasts it fine 99.9% of the time. The 0.1% of the time when geth thinks the TX has been broadcast and it actually has not seems to be the root cause of the issue, no?

Both of these seem to be quite different from the original report, and it would be better to open separate tickets for them, instead of posting on an already closed ticket.

The core issue here is that geth doesn't successfully broadcast all TXs 100% of the time.
My example with the nonce was to show @karalabe that if one TX doesn't get rebroadcast, then all following TXs with a higher nonce on the same address don't get broadcast either. Hence geth definitely broadcasts _nothing_ newly signed after this issue happens, effectively blocking the broadcasting of any new transaction.

As @CrispinFlowerday put it:

Yesterday morning we got ourselves into a situation where there were a large number of transactions (with later nonces) held up behind a transaction that had too low a gas price. At the time of investigation, none of these transactions appeared on etherscan as pending.

This is exactly what I'm describing: geth doesn't broadcast TXs when the network is in bad shape (fees skyrocketing, chain splits, etc.)

At this point the transactions that had previously been stuck had a high enough gas price, and didn't need re-writing, yet they didn't get mined, and continued to not appear on etherscan

This is what I described with the sentence "if the network is not stable yet, then good luck you have to rebroadcast all TXs with a nonce N+k with k>=0" (except he changed the TXID for the case k=0). The fix proposed above would only solve this one issue though, so indeed it is not perfect.

I'd be happy to create another issue for that, but there are already a lot of closed issues with what seems to be exactly the same problem: https://github.com/ethereum/go-ethereum/issues/21167 https://github.com/ethereum/go-ethereum/issues/14669 - should I really create a new one? I chose this issue as it seemed to offer the most detailed explanation of the root cause.

coinwalletdev on 24 Nov 2020

At simplecoin we send eth to clients from one account. If the gas price rises rapidly, we end up with a chain of unconfirmed txs. If the chain gets long enough, it gets dropped (unknown to etherscan). I can do replace-by-fee and my geth will create a tx with a new hash and the same nonce, but etherscan will not show it. It's not a 0.1% problem. It's maybe a 5% problem, out of the times we need to bump the fee. Sending eth from a different address/wallet is not an option either, because we have seen txs confirmed even after being dropped by etherscan. We are using a super high fee to cope with this for now.

PavelNiedoba on 24 Nov 2020

It's not a 0.1% problem. It's maybe a 5% problem, out of the times we need to bump the fee.

I guess it depends on the nodes you are connected to and the geth version you are running. Indeed, I can't define precisely for the moment what I mean by 'the network is in bad shape', and this is what is lacking to reproduce the issue.

We are using a super high fee to cope with this for now.

Next time this happens, could you try to rebroadcast the same TX with an identical TXID to show the problem is unrelated to fee bumping?

Get the raw hex of the blocking TX, i.e. the pending TX with the lowest nonce, via (assuming 0xAA..AA is the blocked address):

$ geth attach
> eth.getRawTransaction(txpool.content.pending['0xAA..AA'][Object.keys(txpool.content.pending['0xAA..AA'])[0]]['hash'])
0x.....

Then broadcast the TX using a non-geth node (or online via https://etherscan.io/pushTx). This should make the TX appear on etherscan, and geth should recognize it (and, if you're lucky, broadcast the TXs with higher nonces).
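The lookup done by the console one-liner above can also be written as a small standalone function, for example to run in Node.js against a saved copy of `txpool.content.pending`. This is only a sketch: `lowestNonceTxHash` is a hypothetical name, and it assumes nonce keys are decimal strings and that addresses may differ in letter case.

```javascript
// Hypothetical helper, not part of geth.
// pending: { "<address>": { "<nonce>": { hash } } }
// Returns the hash of the lowest-nonce pending transaction for `address`,
// or null if the address has no pending transactions.
function lowestNonceTxHash(pending, address) {
  const wanted = address.toLowerCase();
  const entry = Object.entries(pending).find(
    ([addr]) => addr.toLowerCase() === wanted
  );
  if (!entry) return null;
  const txsByNonce = entry[1];
  const nonces = Object.keys(txsByNonce).map(Number);
  if (nonces.length === 0) return null;
  return txsByNonce[String(Math.min(...nonces))].hash;
}
```

The returned hash is what would then be passed to `eth.getRawTransaction` to obtain the raw hex.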

coinwalletdev on 25 Nov 2020

We faced this non-rebroadcasting issue recently.

For security reasons we run the public node separately, while the private nodes (behind NAT) are used to broadcast TXs from an internal signer.

I have now checked that each node has only 16 peers, all of them outgoing because the node sits on a private connection. So our configuration fits the definition of a very stable network.

My question is how to make a private node open more peer connections. Is there any way to reset the connected peers instead of doing a "nuclear" restart?

I am not so comfortable posting in a closed issue. Is there any open issue where we can track the progress?

begetan on 26 Nov 2020

👍2

This non-rebroadcasting issue has caused a lot of mess. The transaction is stored on our local node, but the rest of the world just doesn't have it.