Describe the bug
when handling 512 transactions to generating a new block, consensus service will stuck and throw "Operation is not valid due to the current state of the object" error.
To Reproduce
Steps to reproduce the behavior:
neo-cli: master 51cd29fbe21abb9e1f17f64e5c6d21bc7decbbb9
neo: master ab4830cae3242e80fd60e7df6c0d9143803b7be3
neo-vm: master be2ac36bf35a3033d828e0ba0630d390599c487d
MillisecondsPerBlock is 15000.
{
"jsonrpc": "2.0",
"method": "sendtoaddress",
"params": ["0x8c23f196d8a1bfd103a9dcb1f9ccf0c611377d3b", "NNa1Fgc82Qoh65TJfViqC92W8Z8AWL9YKA",0.00000001],
"id": 1
}
Expected behavior
consensus nodes are expected to handle massive txs.
Screenshots
It can be seen from the screenshot that when number of txs is 216, consensus will continue, and when number of txs is 512, then error shows.

[18:17:18.052] initialize: height=35 view=0 index=3 role=Primary
[18:17:33.058] timeout: height=35 view=0
[18:17:33.059] send prepare request: height=35 view=0
[18:17:33.189] OnPrepareResponseReceived: height=35 view=0 index=0
[18:17:33.299] OnPrepareResponseReceived: height=35 view=0 index=1
[18:17:33.302] send commit
[18:17:33.880] OnCommitReceived: height=35 view=0 index=1 nc=1 nf=0
[18:17:34.118] OnPrepareResponseReceived: height=35 view=0 index=2
[18:17:34.124] OnCommitReceived: height=35 view=0 index=2 nc=2 nf=0
[18:17:34.137] relay block: height=35 hash=0xa5d4b10cfaf0c6b24f598c975235b15d6399176fb4b0fb1fd41224d43a250103 tx=216
[18:17:34.164] persist block: height=35 hash=0xa5d4b10cfaf0c6b24f598c975235b15d6399176fb4b0fb1fd41224d43a250103 tx=216
[18:17:34.165] initialize: height=36 view=0 index=3 role=Backup
[18:17:49.159] OnPrepareRequestReceived: height=36 view=0 index=0 tx=512
[18:17:49.164] send prepare response
[18:17:49.890] OnPrepareResponseReceived: height=36 view=0 index=1
[18:17:49.892] send commit
[18:17:49.910] OnCommitReceived: height=36 view=0 index=1 nc=1 nf=0
[18:17:50.145] OnCommitReceived: height=36 view=0 index=0 nc=2 nf=0
[18:17:50.150] relay block: height=36 hash=0xb3be2ffab0fd2ac9f46dd8d14e5727d9d6c3f7e739cf72cf2c034d2100cc40cb tx=512
[ERROR][1/14/2020 10:17:50 AM][Thread 0005][akka://NeoSystem/user/$a] Operation is not valid due to the current state of the object.
Cause: System.InvalidOperationException: Operation is not valid due to the current state of the object.
at Neo.Ledger.Blockchain.Persist(Block block) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 503
at Neo.Ledger.Blockchain.OnNewBlock(Block block) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 333
at Neo.Ledger.Blockchain.OnReceive(Object message) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 477
at Akka.Actor.UntypedActor.Receive(Object message)
at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
at Akka.Actor.ActorCell.ReceiveMessage(Object message)
at Akka.Actor.ActorCell.Invoke(Envelope envelope)
[ERROR][1/14/2020 10:17:50 AM][Thread 0027][akka://NeoSystem/user/$a] Error while creating actor instance of type Neo.Ledger.Blockchain with 2 args: (Neo.NeoSystem,Neo.Plugins.Storage.Store)
Cause: [akka://NeoSystem/user/$a#456841214]: Akka.Actor.PostRestartException: Exception post restart (System.InvalidOperationException)
---> System.TypeLoadException: Error while creating actor instance of type Neo.Ledger.Blockchain with 2 args: (Neo.NeoSystem,Neo.Plugins.Storage.Store)
---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
---> System.InvalidOperationException: Operation is not valid due to the current state of the object.
at Neo.Ledger.Blockchain..ctor(NeoSystem system, IStore store) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 107
--- End of inner exception stack trace ---
at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture)
at Akka.Actor.Props.ActivatorProducer.Produce()
at Akka.Actor.Props.NewActor()
--- End of inner exception stack trace ---
at Akka.Actor.Props.NewActor()
at Akka.Actor.ActorCell.CreateNewActorInstance()
at Akka.Actor.ActorCell.<>c__DisplayClass109_0.<NewActor>b__0()
at Akka.Actor.ActorCell.UseThreadContext(Action action)
at Akka.Actor.ActorCell.NewActor()
at Akka.Actor.ActorCell.FinishRecreate(Exception cause, ActorBase failedActor)
--- End of inner exception stack trace ---

Platform:
The error it's located here

I will review it
@cloud8little the error appear with only one CN ?
@shargon It happens on all consensus nodes
Error replicated successful, I will try to solve it soon. Thanks for your detailed explanation @cloud8little !
For replicate the error you must have 2 address in the same wallet. One with all the GAS (30m), and the other with 0.
Sometimes the rpc starting to choose the second address as From. We have a problem in the memory pool that produce a negative balance in persist.
Still investigating the fix.
Without Parallel Verification, this doesn't happen @erikzhang.
@shargon As I see from this report, error occurs on at
Neo.Ledger.Blockchain.Persist(Block block) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 503, where the code is https://github.com/neo-project/neo/blob/fc1a64b77c8a901c048e7b973f204d3b7c9870e3/src/neo/Ledger/Blockchain.cs#L503
Is there any relationship with VM execution? https://github.com/neo-project/neo/blob/fc1a64b77c8a901c048e7b973f204d3b7c9870e3/src/neo/Ledger/Blockchain.cs#L499-L500
I'm confused with the error location and the cause.
No problems with VM, the main reason is the mempool balance check, which lead to package txs without enough balance. This has happened before.
yes it's while OnPersist>GasToken, the balance becomes negative and throw the exception.
@shargon I reproduced this problem, and found that the problem was on PostExecuteInstruction() in ExecutionEngine. When the number of tx reached 512, the result of ReferenceCounter.CheckZeroReferred () was 2053, which was greater thanMaxStackSize: 2048, which caused the check to fail, and the VM returned Fault.
After enlarging MaxStackSize to a larger number, such as 4096, this problem disappeared in my machine.
@shargon @eryeer was right, the root cause is ReferenceCounter.CheckZeroReferred() <= MaxStackSize return false, and leads to VM execute state as FAULT.

After I changed MaxStackSize as 2 * 2 * 1024, the CN service will recover. and no exception throws.
Initial balance of the sender: 100,000 gas
Here is the result of changing MaxStackSize to 2 * 2 * 1024.


Great! I will review it. But maybe we have two errors...
@eryeer this dosen't solve the main issue, where are more than 2k items?


Also there are another issue... related to this one I will reopen my PR
@eryeer @cloud8little if you try to replicate this issue with multiple threads (I used 5), you will get another error produced by parallel verification
We have two problems here:
1- With 512 Tx, Gas contract produce 4 items per notification, and we raise the maximum numbers of stack items (FAULT), this is the initial problem described by @cloud8little
2- With multiple threads, we have a double spend in GAS native contract, this is the second problem find by me while testing the same issue with multiple threads. We should reconsider the paralell verification.

Maybe we should consider to rollback the snapshot, and FAULT all txs if in OnPersist in native contracts was FAULT, otherwise we open a door for Denial Of Service. @erikzhang

According to the issue 1, I prefer to skip ReferenceCounter if the Trigger is System
protected override bool PostExecuteInstruction(Instruction instruction)
{
if (Trigger == TriggerType.System) return true;
return base.PostExecuteInstruction(instruction);
}
I created new issue for the second problem https://github.com/neo-project/neo/issues/1420
According to the issue 1, I prefer to skip
ReferenceCounterif the Trigger isSystem
It's a good idea to skip ReferenceCounter for System Trigger.
Maybe we should consider to rollback the snapshot, and
FAULTall txs if inOnPersistin native contracts was FAULT, otherwise we open a door for Denial Of Service. @erikzhang
Agree, it's better to close the door which we may not know.
Most helpful comment
According to the issue 1, I prefer to skip
ReferenceCounterif the Trigger isSystem