Go-ethereum: Transaction propagation issue on private test net since geth 1.4.6

Created on 1 Jul 2016 · 21 Comments · Source: ethereum/go-ethereum

System information

Geth version: 1.4.6
OS & Version: Linux(Ubuntu:xenial)

Synopsis

I've got two geth nodes joined together on a private test net. Both have a coinbase. I can send transactions from the coinbase on the miner to the coinbase on the client geth. Since 1.4.6 I cannot send them back.

Steps to reproduce:

1. Bring up two geths, create a coinbase on each, unlock the accounts.
2. Join them together using admin_addPeer.
3. Start mining on one node and wait for it to have some ether.
4. Send a transaction of, say, 4 ether from the miner coinbase to the non-miner coinbase.
5. Observe that this succeeds.
6. Once the non-miner coinbase has a balance, send some ether back.
7. Observe that on 1.4.5 this works fine. On 1.4.6 this transaction is never included.
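For readers who want to script the steps above rather than type them into the console, here is a minimal sketch that only builds the JSON-RPC request bodies and prints them; it does not send anything. It assumes the admin and eth RPC namespaces are enabled on both nodes. The enode URL, port and addresses are placeholders, and the names step, reproSteps and etherToWeiHex are made up for this illustration. It is not taken from the attached ethBug.zip.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/big"
)

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// step pairs a request with the node it would be sent to (hypothetical helper type).
type step struct {
	node string
	req  rpcRequest
}

// etherToWeiHex converts a whole number of ether to a 0x-prefixed hex wei string.
func etherToWeiHex(ether int64) string {
	weiPerEther := new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)
	wei := new(big.Int).Mul(big.NewInt(ether), weiPerEther)
	return "0x" + wei.Text(16)
}

// reproSteps lists the peering call and the two transfers from the report.
// All arguments are placeholders supplied by the caller.
func reproSteps(minerEnode, minerAddr, clientAddr string) []step {
	send := func(id int, from, to string, ether int64) rpcRequest {
		tx := map[string]string{"from": from, "to": to, "value": etherToWeiHex(ether)}
		return rpcRequest{JSONRPC: "2.0", ID: id, Method: "eth_sendTransaction", Params: []interface{}{tx}}
	}
	return []step{
		{"client", rpcRequest{JSONRPC: "2.0", ID: 1, Method: "admin_addPeer", Params: []interface{}{minerEnode}}},
		{"miner", send(2, minerAddr, clientAddr, 4)},  // miner -> client: works on 1.4.5 and 1.4.6
		{"client", send(3, clientAddr, minerAddr, 1)}, // client -> miner: never mined on 1.4.6
	}
}

func main() {
	steps := reproSteps("enode://MINER_NODE_ID@192.168.99.100:30303", "0xMINER_COINBASE", "0xCLIENT_COINBASE")
	for _, s := range steps {
		body, err := json.Marshal(s.req)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", s.node, body)
	}
}
```

The 1 ether sent back in the last step is an arbitrary amount; the report only says "some ether".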

Presumably there is either a bug or I've set something up incorrectly. But the fact that the txns flow in one direction seems interesting.

I've attached some files that show the problem (dag generation takes a while). They work on a mac with docker-machine started and presume a docker machine ip address of 192.168.99.100 - you might need to change this to localhost on linux (there are command line args for this).

The runTest.sh file assumes that you are in a directory called ethBug to help it find the ip address.

docker-compose -f geth-docker-1.4.5 build
docker-compose -f geth-docker-1.4.5 up
npm install (i'm using node5, but presumably node6 will work)
./runTest.sh

At this point observe that the test (eventually) finishes with a payment back.

Then:

ctrl-c the running docker compose (once)
docker-compose -f geth-docker-1.4.5 down
docker-compose -f geth-docker-1.4.6 build
docker-compose -f geth-docker-1.4.6 up
./runTest.sh

Observe that this does not complete.

You can see this in the logs for 1.4.6:

geth-client_1  | I0701 13:09:53.111635 eth/api.go:1193] Tx(0x155385db87d578a08c5efb8486335da0ee690f72c0d10a95ead23763f4fffa09) to: 0xc64b8fc5796146dd68727e7bc3fba4edd7d30bb2
geth-miner_1   | I0701 13:09:53.854263 miner/worker.go:337] 🔨  Mined block (#7 / e95015dc). Wait 5 blocks for confirmation
geth-miner_1   | I0701 13:09:53.854502 miner/worker.go:555] commit new work on block 8 with 0 txs & 0 uncles. Took 117.752µs
geth-miner_1   | I0701 13:09:53.854599 miner/worker.go:433] 🔨 🔗  Mined 5 blocks back: block #2
geth-miner_1   | I0701 13:09:53.855311 miner/worker.go:555] commit new work on block 8 with 0 txs & 0 uncles. Took 223.753µs
geth-client_1  | I0701 13:09:53.865109 core/blockchain.go:964] imported 1 block(s) (0 queued 0 ignored) including 0 txs in 4.61781ms. #7 [e95015dc / e95015dc]
geth-miner_1   | I0701 13:09:55.576549 miner/worker.go:337] 🔨  Mined block (#8 / 67adc042). Wait 5 blocks for confirmation
geth-miner_1   | I0701 13:09:55.577478 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 847.668µs

So the txn never seems to be included in the block. The client has the txn as a pending txn. The miner has 0 pending txns.

Presumably the transaction is not propagating from the client to the miner. Anyone have any idea why?

For completeness here is the corresponding part of the 1.4.5 logs:

geth-client_1  | I0701 13:07:45.585119 eth/api.go:1193] Tx(0xd832a7543de23e8d3094555a1e5bd65185d15b4ec7bebe0dde392524f3a8b636) to: 0x1558a5b5ad08b9bcad764e27cff23c860110bc0b
geth-miner_1   | I0701 13:07:45.768951 miner/worker.go:337] 🔨  Mined block (#7 / f552fdc9). Wait 5 blocks for confirmation
geth-miner_1   | I0701 13:07:45.769838 miner/worker.go:555] commit new work on block 8 with 1 txs & 0 uncles. Took 259.71µs
geth-miner_1   | I0701 13:07:45.777002 miner/worker.go:433] 🔨 🔗  Mined 5 blocks back: block #2
geth-client_1  | I0701 13:07:45.782043 core/blockchain.go:959] imported 1 block(s) (0 queued 0 ignored) including 0 txs in 5.709456ms. #7 [f552fdc9 / f552fdc9]
geth-miner_1   | I0701 13:07:45.783540 miner/worker.go:555] commit new work on block 8 with 1 txs & 0 uncles. Took 6.46318ms
geth-miner_1   | I0701 13:07:48.797639 miner/worker.go:337] 🔨  Mined block (#8 / 30e63db1). Wait 5 blocks for confirmation
geth-miner_1   | I0701 13:07:48.798662 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 356.817µs
geth-miner_1   | I0701 13:07:48.799672 miner/worker.go:433] 🔨 🔗  Mined 5 blocks back: block #3
geth-miner_1   | I0701 13:07:48.808388 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 8.576268ms
geth-client_1  | I0701 13:07:48.815356 core/blockchain.go:959] imported 1 block(s) (0 queued 0 ignored) including 1 txs in 8.734131ms. #8 [30e63db1 / 30e63db1]

Thanks v. much :-)
ethBug.zip

nimmaj

👍1

Most helpful comment

This was actually a feature that behaves in a weird way on private networks with a single miner.

We've noticed that on the main network, when new nodes join, they already receive transactions and try to process them against whatever stale state they have. Since initial sync can take quite some time, new nodes manage to pile up tens of thousands of transactions, which puts an enormous burden on them. Our solution was that nodes do not accept new transactions from remote nodes until they complete their initial sync cycle (= receive either a long chain from a remote node, or a fresh enough block).

In a private scenario with only 1 miner, the miner actually never completes a sync since it never receives a block from anyone else (as it is the only one minting the blocks). It's an unfortunate corner case we didn't think of.
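To make the corner case concrete, here is a toy model of the gate described above. It is only a sketch: the node type, markSynced and handleRemoteTx are invented for illustration and are not geth's actual code. The one detail borrowed from geth is the idea of a flag read with atomic.LoadUint32 that starts at 0 and is set once a sync cycle completes.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is a toy stand-in for a geth node; only the sync gate is modelled.
type node struct {
	synced  uint32   // 0 until an initial sync cycle completes, then 1
	pending []string // hashes of remote transactions accepted so far
}

// markSynced records a completed sync cycle, i.e. a long chain or a fresh
// enough block was received from a remote peer.
func (n *node) markSynced() {
	atomic.StoreUint32(&n.synced, 1)
}

// handleRemoteTx drops transactions from peers until the node is synced.
// It reports whether the transaction was accepted.
func (n *node) handleRemoteTx(hash string) bool {
	if atomic.LoadUint32(&n.synced) == 0 {
		return false
	}
	n.pending = append(n.pending, hash)
	return true
}

func main() {
	// The sole miner produces every block itself and never receives one
	// from a peer, so nothing ever calls markSynced on it.
	miner := &node{}
	fmt.Println("sole miner accepts remote tx:", miner.handleRemoteTx("0xabc"))

	// The client imports the miner's blocks, which completes its sync.
	client := &node{}
	client.markSynced()
	fmt.Println("client accepts remote tx:", client.handleRemoteTx("0xdef"))
}
```

In this model the miner silently drops everything the client sends, which matches the "0 txs" lines in the 1.4.6 miner log.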

One solution is to assume the chain is synced when a miner starts mining, but the drawback is that people starting a new node with --mine preset could end up in the same sync issue. Another solution was to only accept txs during actual mining (mining is halted during sync), but this causes a single-miner private-network node to stop accepting new txs if mining is stopped (e.g. only mine if txs are present). There's a third (albeit rare) corner case when there is no miner at all, so the sync cycle is never assumed complete.

I'm trying to figure out the best solution for this.

karalabe on 4 Jul 2016

👍9 🚀1 ❤1 🎉1

All 21 comments

I can confirm this behaviour. Any progress? This doesn't make any sense to me, but playing around with the --nodiscover flag seemed to solve some issues for me.

coeniebeyers on 4 Jul 2016

👍1


I am also facing the same issue in geth 1.4.7.

aman-c on 4 Jul 2016

@karalabe - that's really interesting, thanks. So in my case I could work around the issue by mining a bit on each node and letting them sync before switching to a single miner. I will try that and report back later today.

You might ask why we're only using one miner? We are trying to have broadly one block per second and we've found that we get into a lot of trouble with multiple miners and low difficulties so we've switched to one miner for the moment. There are other rather interesting issues in this use case.

Many thanks for the clear explanation about what is going on - that's _really_ helpful.

nimmaj on 4 Jul 2016

As per your effective suggestion, mining on a random client node for a block while the miner is not mining makes this problem go away for my test case and our use case. Many thanks.

nimmaj on 4 Jul 2016

I just lost two days on this... Thanks @nimmaj for creating this otherwise it could well have been another two. Any developments on an alternate solution?

Is this the sum total of the relevant changes? (Going to revert the changes in my private fork)

https://github.com/ethereum/go-ethereum/commit/ecb8e23e882362fddf46bbf9cf1da4e8e4271fa7

dan-turner on 5 Jul 2016

@karalabe Could you add 'Private Network' label please.

wawrzek on 6 Jul 2016

done (for all of them)

fjl on 7 Jul 2016

I just had a strange case presumably related somehow to this issue on a private testnet with three miners. The "main" miner was the only one mining and the other two miners were connected to the main miner but not each other. At first, I observed the behavior above.

Reading the discussion above, I started mining on the two other nodes, at which point transactions were indeed passed along to the main node. However, the two other miners happened to be mining faster than the main miner, and they were all out of sync in block numbers. The main miner (slowest) was in the 1930s, and the other two were in the 2030s and 2070s, respectively. None were syncing to the longest chain, but all were passing on transactions. I verified all were connected correctly by checking admin.peers.

mjackson001 on 27 Jul 2016

@mjackson001 I remember seeing a similar situation.

wawrzek on 15 Aug 2016

One solution is to replace the conditional at eth/handler.go#L668:

if atomic.LoadUint32(&pm.synced) == 0 {

with

if pm.downloader.Synchronising() {

It's not a perfect fix, since some txs might arrive before downloading starts, and some txs may occasionally slip through when downloading is interrupted (when a peer is dropped, pm.downloader.Synchronising() returns false until downloading resumes with a different peer). But it should filter most txs that arrive before syncing is finished.
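The trade-off can be laid out as a small table. The sketch below is a toy model, not geth code: the phase type and the two accept functions are hypothetical. The synchronising field stands in for what pm.downloader.Synchronising() is assumed to return in each situation, and synced stands in for the pm.synced flag.

```go
package main

import "fmt"

// phase describes one moment in a node's life (toy model).
type phase struct {
	name          string
	synchronising bool // assumed return value of the downloader check
	synced        bool // whether the sticky synced flag has been set
}

// acceptWithFlag models the original condition: drop remote transactions
// until the synced flag has been set once.
func acceptWithFlag(p phase) bool { return p.synced }

// acceptWithDownloader models the proposed condition: drop remote
// transactions only while a download is actually in progress.
func acceptWithDownloader(p phase) bool { return !p.synchronising }

func main() {
	timeline := []phase{
		{"before download starts", false, false},
		{"downloading", true, false},
		{"peer dropped, download interrupted", false, false},
		{"download resumed", true, false},
		{"sync finished", false, true},
		{"sole miner, never syncs", false, false},
	}
	for _, p := range timeline {
		fmt.Printf("%-36s flag=%-5v downloader=%v\n",
			p.name, acceptWithFlag(p), acceptWithDownloader(p))
	}
}
```

Under these assumptions the downloader check accepts transactions on the sole miner, at the cost of also accepting them in the first and third rows, which are the two leaks described above.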

cdetrio on 24 Aug 2016

Just lost 3 hours because of this bug. 😩

ethernomad on 30 Aug 2016

I was having the same issue (see ticket #2980) - it was easy enough to work around this issue by spinning up another mining geth instance on the same machine - less than ideal but definitely solves the issue for me.

@karalabe - really appreciate the help! Thank you!

joeb000 on 6 Sep 2016

just ran into the same issue. this looks like a major bug. transaction does not even appear in eth.pendingTransactions at the non-mining machine...
any help?

randomnerd on 17 Oct 2016

adding to the previous post:
our use case is one central mining node and a lot of non-mining clients that interact with contracts. i believe PoA schema would fit better, but afaik there is no PoA in geth yet.
so how do we set up the network to allow non-mining clients to send transactions?

randomnerd on 17 Oct 2016

I think if you remove these lines:
https://github.com/ethereum/go-ethereum/blob/develop/eth/handler.go#L676-L678

And compile it ONLY for your miner-client on your server, then it should work temporarily.

iFA88 on 24 Oct 2016

+1 We are seeing the same behavior on our private net

mattcrooks on 1 Nov 2016

I'm using Geth 1.5.5 on a private network with one mining node (M) and a non-mining node (A). On the node A, I have a static-nodes.json file describing the node M.
I'm able to propagate transactions from M->A but not from A->M.
Is my problem related to this issue?
Is there any workaround other than patching the source code and rebuilding a new version of Geth?
Thanks.

eloudsa on 19 Dec 2016

👍2

As a workaround, I have followed the solution given by @karalabe.
I start 2 miners on my machine and updated the file "static-nodes.json" accordingly.
Now my non-miner node is able to send transactions that are processed by one of the miners.

eloudsa on 20 Dec 2016

👍2

using parity (1.5 release with PoA) did the trick :)

randomnerd on 31 Jan 2017

Closing this issue because it's relatively outdated and likely to be fixed. Please open a new issue with any of the 1.5.x versions.