We found that some transactions get dropped after being submitted. After some investigation, I think it's related to missed blocks and fork switching.
For example, trx 178b89082cedec6fdb067eaf886ca5b542797e7b2b611d86ee756773620537cc was dropped. The related block is 14549333. From the following log, you can find that there are two blocks with the same block num.

Another case is that one producer missed 12 blocks in one round. For example, trx f72f3a8b85c30fc5baf217a22e7b45ab3a1639a48358012dc9b23c38f804bf9e was dropped with block number 14664301. Please check the log below:

Quite a few transactions were dropped. Some related block numbers: 14549333, 14636103, 14642368, 14662929, 14664301, 14681455, ...
+1
same issue here.
+1
Some producer was offline at that time, resulting in 6 sec (12 blocks) being missed.
In the first case, the producers were all online.
@taokayan do you mean it's by design that a transaction will be dropped if "switching forks" happens?
It's weird. EOS should be designed to tolerate forks without losing transactions.
Can we improve this?
This is the contract account whose transactions are being dropped: https://www.myeoskit.com/#/tx/pandafuncode
The action types with dropped transactions include pray, refund, buytea, upgrade.
One more case: cb57b4bfdbc73c3d501e2746756112583999b4b96c09c1d907199de02b34c519 from https://www.myeoskit.com/#/tx/pandafuncode, and the action is buyPanda.
The system is designed to expire transactions. Expired transactions are dropped. If block production is unreliable, this may result in an increase in dropping transactions. Even if block production is completely normal, a busy system can drop transactions if there are too many to be incorporated into a block within the time allotted. This is a denial of service attack mitigation feature.
The system is working as designed.
As far as I know, the default expire time is 30 seconds, which should be enough for the transaction to be broadcast to most nodes. There is no reason for the transaction to be denied 5 times (30/6 = 5). @jgiszczak
Actually, many dapps set the transaction expire time to 60 seconds, and transactions were still dropped.
@taokayan could you please check this issue again, especially winlin's reply. The transactions being dropped here have the default expire time, 30s.
One thing I really want to confirm: if a block is missed, would the related transactions be dropped? If not, then please reopen this issue.
@DebugFuture @winlin There's no guarantee that a transaction which executes successfully on one node will execute successfully on the other nodes. Any transaction can fail either objectively (exception, failed assert, ...) or subjectively (CPU or NET exhausted, ...), and it will not be broadcast if it fails subjectively. Therefore, the expire time has nothing to do with whether the transaction has been broadcast, nor with whether it is included in some block. The expire time simply means that if any of the BPs tries to execute the transaction after its expire time, it is guaranteed to fail.
To check whether transaction x is included in the blockchain, the user should:
1) push transaction at time t.
2) wait for its expire time (let's say 30s)
3) wait for around 300 blocks to make sure one block after the expire time has already become irreversible
4) query the transaction again, or check all the blocks from time t to time t+30
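The four steps above can be sketched roughly as follows. This is only an illustration, not nodeos code: `earliest_check_time` and `classify` are hypothetical helper names, the 0.5 s block interval is inferred from "6 sec (12 blocks)" earlier in the thread, and the caller is assumed to have already collected the ids of all transactions in irreversible blocks between time t and the expire time.

```python
from datetime import datetime, timedelta

# Assumption: 0.5 s per block, inferred from "6 sec (12 blocks)" in this thread.
BLOCK_INTERVAL = timedelta(milliseconds=500)


def earliest_check_time(pushed_at, expire_after=timedelta(seconds=30), settle_blocks=300):
    """Steps 1-3: push at time t, wait for expiry, then wait ~settle_blocks more blocks."""
    return pushed_at + expire_after + settle_blocks * BLOCK_INTERVAL


def classify(trx_id, expiration, lib_block_time, irreversible_trx_ids):
    """Step 4: decide the fate of a transaction.

    irreversible_trx_ids must cover every irreversible block from push time
    up to the expiration, otherwise "dropped" is not a safe conclusion.
    """
    if trx_id in irreversible_trx_ids:
        return "included"
    if lib_block_time > expiration:
        # An irreversible block is already past the expiry, so the
        # transaction can no longer be applied anywhere.
        return "dropped"
    return "unknown"  # too early to tell, check again later


if __name__ == "__main__":
    t = datetime(2018, 9, 1, 12, 0, 0)
    print(earliest_check_time(t) - t)  # total wait before the final check
```

With the default values the wait comes to 30 s of expiry plus 300 blocks at 0.5 s each, i.e. about three minutes after the push.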
@taokayan Thanks for the explanation.
The trxs that got dropped were executed successfully on the target node, as we got a response with the trx id and block num. I know that doesn't mean they will reach LIB 100% of the time. But I checked a lot of logs, not only the two I posted. Almost every time a trx is dropped, there is a "switching forks" or a whole round missing.
We asked Bart Wyatt, and he says 'switching forks' won't cause a trx to be dropped. So we just want to make sure it works as expected. And you mean the missing trx has nothing to do with 'switching forks', right?
Switching forks may cause a transaction to be dropped from some block; however, this is unlikely to be totally solved by increasing the expire time. As there is a high correlation between fork switching and resource exhaustion, it's hard to tell at that time whether an unapplied transaction will be applied again successfully, or whether it will fail subjectively and be forgotten. We need to keep track of the LIB number to determine whether a transaction is indeed included in the blockchain.
Thanks @taokayan, just as you say, we do need a method to get more debug info.
If you know how, let us know and we can help.
FYI, the https://kylin-push-guarantee.eoscanada.com endpoint (in beta) blocks until the transaction makes it into a block, and returns the block num and block id in the response (along with the standard response fields).
We have another endpoint that blocks until lib barrier passes, and returns the irreversible block that included the TRX along with the response. Poke me if you'd like to try that one.
@abourget Thanks and great work!
For the first api, does it give the same result as submitting the trx first and then getting the block num via get transaction?
For the second api, it's not very user friendly for a game to wait more than 2 minutes for a response. Currently we just keep a record, check the LIB later, and send a makeup trx if the previous one got dropped.
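A minimal sketch of that "keep a record, check the LIB later, resend" approach might look like this. `PendingLedger` and its methods are hypothetical names for illustration only, and the caller is assumed to supply the timestamp of the current LIB and the ids of transactions found in irreversible blocks. A makeup trx is only proposed once the LIB has passed the original expire time, so the original can no longer be applied and the action is not executed twice.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PendingLedger:
    # trx_id -> (expiration, payload needed to rebuild the transaction)
    pending: dict = field(default_factory=dict)

    def record(self, trx_id, expiration, payload):
        """Remember a pushed transaction until its fate is known."""
        self.pending[trx_id] = (expiration, payload)

    def reconcile(self, lib_block_time, irreversible_trx_ids):
        """Settle finished entries and return payloads that need a makeup trx."""
        to_resend = []
        for trx_id, (expiration, payload) in list(self.pending.items()):
            if trx_id in irreversible_trx_ids:
                del self.pending[trx_id]       # irreversible, nothing to do
            elif lib_block_time > expiration:
                del self.pending[trx_id]       # expired without inclusion
                to_resend.append(payload)
            # otherwise keep waiting; resending now could double-apply
        return to_resend


if __name__ == "__main__":
    t = datetime(2018, 9, 1, 12, 0, 0)
    ledger = PendingLedger()
    ledger.record("trx-a", t + timedelta(seconds=30), {"action": "buytea"})
    print(ledger.reconcile(t + timedelta(seconds=200), set()))
```

The game can respond to the player immediately and run `reconcile` in the background, which avoids blocking on the LIB for every request.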
answered by @jgiszczak. Please ask in eosio.stackexchange.com if you have further questions.