Geth version: 1.4.6
OS & Version: Linux(Ubuntu:xenial)
I've got two geth nodes joined together on a private test net. Both have a coinbase. I can send transactions from the coinbase on the miner to the coinbase on the client geth. As of 1.4.6, I cannot send them back.
Steps to reproduce:
Presumably there is either a bug or I've misconfigured my setup, but the fact that the transactions flow in only one direction seems interesting.
I've attached some files that show the problem (dag generation takes a while). They work on a mac with docker-machine started and presume a docker machine ip address of 192.168.99.100 - you might need to change this to localhost on linux (there are command line args for this).
The runTest.sh file assumes that you are in a directory called ethBug to help it find the ip address.
At this point observe that the test (eventually) finishes with a payment back.
Then:
Observe that this does not complete.
You can see this in the logs for 1.4.6:
geth-client_1 | I0701 13:09:53.111635 eth/api.go:1193] Tx(0x155385db87d578a08c5efb8486335da0ee690f72c0d10a95ead23763f4fffa09) to: 0xc64b8fc5796146dd68727e7bc3fba4edd7d30bb2
geth-miner_1 | I0701 13:09:53.854263 miner/worker.go:337] 🔨 Mined block (#7 / e95015dc). Wait 5 blocks for confirmation
geth-miner_1 | I0701 13:09:53.854502 miner/worker.go:555] commit new work on block 8 with 0 txs & 0 uncles. Took 117.752µs
geth-miner_1 | I0701 13:09:53.854599 miner/worker.go:433] 🔨 🔗 Mined 5 blocks back: block #2
geth-miner_1 | I0701 13:09:53.855311 miner/worker.go:555] commit new work on block 8 with 0 txs & 0 uncles. Took 223.753µs
geth-client_1 | I0701 13:09:53.865109 core/blockchain.go:964] imported 1 block(s) (0 queued 0 ignored) including 0 txs in 4.61781ms. #7 [e95015dc / e95015dc]
geth-miner_1 | I0701 13:09:55.576549 miner/worker.go:337] 🔨 Mined block (#8 / 67adc042). Wait 5 blocks for confirmation
geth-miner_1 | I0701 13:09:55.577478 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 847.668µs
So the txn never seems to be included in the block. The client has the txn as a pending txn. The miner has 0 pending txns.
Presumably the transaction is not propagating from the client to the miner. Anyone have any idea why?
For completeness here is the corresponding part of the 1.4.5 logs:
geth-client_1 | I0701 13:07:45.585119 eth/api.go:1193] Tx(0xd832a7543de23e8d3094555a1e5bd65185d15b4ec7bebe0dde392524f3a8b636) to: 0x1558a5b5ad08b9bcad764e27cff23c860110bc0b
geth-miner_1 | I0701 13:07:45.768951 miner/worker.go:337] 🔨 Mined block (#7 / f552fdc9). Wait 5 blocks for confirmation
geth-miner_1 | I0701 13:07:45.769838 miner/worker.go:555] commit new work on block 8 with 1 txs & 0 uncles. Took 259.71µs
geth-miner_1 | I0701 13:07:45.777002 miner/worker.go:433] 🔨 🔗 Mined 5 blocks back: block #2
geth-client_1 | I0701 13:07:45.782043 core/blockchain.go:959] imported 1 block(s) (0 queued 0 ignored) including 0 txs in 5.709456ms. #7 [f552fdc9 / f552fdc9]
geth-miner_1 | I0701 13:07:45.783540 miner/worker.go:555] commit new work on block 8 with 1 txs & 0 uncles. Took 6.46318ms
geth-miner_1 | I0701 13:07:48.797639 miner/worker.go:337] 🔨 Mined block (#8 / 30e63db1). Wait 5 blocks for confirmation
geth-miner_1 | I0701 13:07:48.798662 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 356.817µs
geth-miner_1 | I0701 13:07:48.799672 miner/worker.go:433] 🔨 🔗 Mined 5 blocks back: block #3
geth-miner_1 | I0701 13:07:48.808388 miner/worker.go:555] commit new work on block 9 with 0 txs & 0 uncles. Took 8.576268ms
geth-client_1 | I0701 13:07:48.815356 core/blockchain.go:959] imported 1 block(s) (0 queued 0 ignored) including 1 txs in 8.734131ms. #8 [30e63db1 / 30e63db1]
Thanks v. much :-)
ethBug.zip
I can confirm this behaviour. Any progress? This doesn't make any sense to me, but playing around with the --nodiscover flag seemed to solve some issues for me.
This was actually a feature that behaves in a weird way on private networks with a single miner.
We've noticed that on the main network, when new nodes join, they immediately receive transactions and try to process them against whatever stale state they have. Since initial sync can take quite some time, new nodes manage to pile up tens of thousands of transactions, which puts an enormous burden on them. Our solution was that nodes do not accept new transactions from remote nodes until they complete their initial sync cycle (= receive either a long chain from a remote node, or a fresh enough block).
In a private scenario with only 1 miner, the miner actually never completes a sync since it never receives a block from anyone else (as it is the only one minting the blocks). It's an unfortunate corner case we didn't think of.
One solution is to assume the chain is synced when a miner starts mining, but the drawback is that people starting a new node with --mine preset could end up in the same sync issue. Another solution was to only accept txs during actual mining (mining is halted during sync), but this causes a single-miner private-network node to stop accepting new txs if mining is stopped (e.g. only mine if txs are present). There's a third (albeit rare) corner case when there is no miner at all, so the sync cycle is never assumed complete.
I'm trying to figure out the best solution for this.
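To make the corner case described above concrete, here is a minimal sketch in Go. It is not geth's actual code: the `node` type, `handleRemoteTx` and `onBlockFromPeer` are hypothetical names invented for illustration, and the only thing borrowed from the thread is the idea of a `synced` flag that gates remote transactions. A lone miner never receives a block from a peer, so in this model its flag never flips and every remote transaction is dropped.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is a hypothetical stand-in for a geth node; it is NOT the real type.
type node struct {
	synced  uint32   // 0 until an initial sync cycle completes, then 1
	pending []string // remote transactions accepted into the local pool
}

// handleRemoteTx models the gate described in the thread: transactions
// from remote peers are dropped until the node considers itself synced.
func (n *node) handleRemoteTx(tx string) bool {
	if atomic.LoadUint32(&n.synced) == 0 {
		return false // still in "initial sync": drop the transaction
	}
	n.pending = append(n.pending, tx)
	return true
}

// onBlockFromPeer models the completion of a sync cycle, here simplified
// to "a fresh enough block arrived from a remote node".
func (n *node) onBlockFromPeer() {
	atomic.StoreUint32(&n.synced, 1)
}

func main() {
	// A single miner on a private network: it mints every block itself,
	// so onBlockFromPeer is never called and the gate stays closed.
	miner := &node{}
	fmt.Println("accepted before any peer block:", miner.handleRemoteTx("tx1"))

	// Once some other node mines a block and the miner imports it,
	// the gate opens and remote transactions are accepted.
	miner.onBlockFromPeer()
	fmt.Println("accepted after a peer block:", miner.handleRemoteTx("tx2"))
	fmt.Println("pending pool size:", len(miner.pending))
}
```

In this model, any block imported from a peer opens the gate, which is consistent with the reports in this thread that briefly mining on a second node makes the problem go away.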
I am also facing the same issue in geth 1.4.7.
@karalabe - that's really interesting, thanks. So actually in my case I could work around the issue by mining a bit on each node and letting them sync before switching to a single miner. I will try that and report back later today.
You might ask why we're only using one miner? We are trying to have broadly one block per second and we've found that we get into a lot of trouble with multiple miners and low difficulties so we've switched to one miner for the moment. There are other rather interesting issues in this use case.
Many thanks for the clear explanation about what is going on - that's _really_ helpful.
As per your effective suggestion, mining on a random client node for a block while the miner is not mining makes this problem go away for my test case and our use case. Many thanks.
I just lost two days on this... Thanks @nimmaj for creating this otherwise it could well have been another two. Any developments on an alternate solution?
Is this the sum total of the relevant changes? (Going to revert the changes in my private fork)
https://github.com/ethereum/go-ethereum/commit/ecb8e23e882362fddf46bbf9cf1da4e8e4271fa7
@karalabe Could you add 'Private Network' label please.
done (for all of them)
I just had a strange case presumably related somehow to this issue on a private testnet with three miners. The "main" miner was the only one mining and the other two miners were connected to the main miner but not each other. At first, I observed the behavior above.
Reading the discussion above, I started mining on the two other nodes at which point transactions were indeed passed along to the main node, however, the two other miners happened to be mining faster than the main miner and they were all out of sync in block numbers. The main miner (slowest) was in the 1930's, and the other two were in the 2030's range and 2070's, respectively. None were syncing to the longest chain but were passing on transactions. I verified all were connected correctly by checking admin.peers.
@mjackson001 I remember seeing a similar situation.
One solution is to replace the conditional at eth/handler.go#L668:
if atomic.LoadUint32(&pm.synced) == 0 {
with
if pm.downloader.Synchronising() {
It's not a perfect fix, since some txs might arrive before downloading starts, and some txs may occasionally slip through when downloading is interrupted (when a peer is dropped, pm.downloader.Synchronising() returns false until downloading resumes with a different peer). But it should filter most txs that arrive before syncing is finished.
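The trade-off between the two conditions can be sketched as a small truth table. This is only an illustration under assumptions: `nodeState`, `acceptByFlag` and `acceptByDownloader` are hypothetical names, and the two booleans stand in for `pm.synced` and `pm.downloader.Synchronising()` rather than calling into geth.

```go
package main

import "fmt"

// nodeState is a hypothetical model of the two signals discussed above.
type nodeState struct {
	synced        bool // stands in for pm.synced != 0
	synchronising bool // stands in for pm.downloader.Synchronising()
}

// acceptByFlag models the original condition: drop txs until the
// node has completed an initial sync cycle.
func acceptByFlag(s nodeState) bool { return s.synced }

// acceptByDownloader models the proposed replacement: drop txs only
// while a download is actually in progress.
func acceptByDownloader(s nodeState) bool { return !s.synchronising }

func main() {
	cases := []struct {
		name string
		s    nodeState
	}{
		{"lone miner, never synced", nodeState{synced: false, synchronising: false}},
		{"fresh node, download running", nodeState{synced: false, synchronising: true}},
		{"fresh node, download not started", nodeState{synced: false, synchronising: false}},
		{"fully synced node", nodeState{synced: true, synchronising: false}},
	}
	for _, c := range cases {
		fmt.Printf("%-34s flag=%-5v downloader=%v\n",
			c.name, acceptByFlag(c.s), acceptByDownloader(c.s))
	}
}
```

The first row is the single-miner case from this issue, where only the downloader-based check lets transactions through. The third row is the imperfection mentioned above: a fresh node whose download has not started yet (or was interrupted) would also accept transactions under that check.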
Just lost 3 hours because of this bug. :weary:
I was having the same issue (see ticket #2980) - it was easy enough to work around this issue by spinning up another mining geth instance on the same machine - less than ideal but definitely solves the issue for me.
@karalabe - really appreciate the help! Thank you!
just ran into the same issue. this looks like a major bug. transaction does not even appear in eth.pendingTransactions at the non-mining machine...
any help?
adding to the previous post:
our use case is one central mining node and a lot of non-mining clients that interact with contracts. i believe PoA schema would fit better, but afaik there is no PoA in geth yet.
so how do we set up the network to allow non-mining clients to send transactions?
I think if you remove these lines:
https://github.com/ethereum/go-ethereum/blob/develop/eth/handler.go#L676-L678
and compile it ONLY for the miner client on your server, it should work temporarily.
+1 We are seeing the same behavior on our private net
I'm using Geth 1.5.5 on a private network with one mining node (M) and a non-mining node (A). On the node A, I have a static-nodes.json file describing the node M.
I'm able to propagate transactions from M->A but not from A->M.
Is my problem related to this issue?
Is there any workaround other than patching the source code and rebuilding a new version of Geth?
Thanks.
As a workaround, I have followed the solution given by @karalabe.
I started 2 miners on my machine and updated the file "static-nodes.json" accordingly.
Now my non-miner node is able to send transactions that are processed by one of the miners.
using parity (1.5 release with PoA) did the trick :)
Closing this issue because it's relatively outdated and likely to be fixed. Please open a new issue with any of the 1.5.x versions.