Orleans: Start silo fails with System.IndexOutOfRangeException on 2.1.0

Created on 17 Sep 2018 · 12 Comments · Source: dotnet/orleans

Environments:

  • Windows 10 1803
  • Visual Studio 2017 15.9.0 Preview 2.0
  • .NET Core SDK 2.1.402
  • Orleans 2.1.0-rc1

Issue:

Silo failed to start with error:

info: Orleans.Runtime.Silo[100452]
Start Incoming message agents took 33 Milliseconds to finish
info: Orleans.Threading.ThreadPoolThread[0]
Starting thread Runtime.Messaging.IncomingMessageAgent/Application0 on managed thread 22
info: Orleans.Runtime.GrainDirectory.LocalGrainDirectory[0]
Start
info: Runtime.GrainDirectory.AdaptiveDirectoryCacheMaintainer1[0]
Starting AsyncAgent Runtime.GrainDirectory.AdaptiveDirectoryCacheMaintainer1 on managed thread 5
info: Runtime.GrainDirectory.GlobalSingleInstanceActivationMaintainer[0]
Starting AsyncAgent Runtime.GrainDirectory.GlobalSingleInstanceActivationMaintainer on managed thread 5
info: Orleans.Threading.ThreadPoolThread[0]
Starting thread Runtime.GrainDirectory.AdaptiveDirectoryCacheMaintainer`10 on managed thread 23
info: Orleans.Runtime.Silo[100452]
Start local grain directory took 13 Milliseconds to finish
info: Orleans.Threading.ThreadPoolThread[0]
Starting thread Runtime.GrainDirectory.GlobalSingleInstanceActivationMaintainer0 on managed thread 24
info: Orleans.Runtime.Silo[100452]
Init implicit stream subscribe table took 11 Milliseconds to finish
info: Orleans.Runtime.Silo[100452]
Create system targets and inject dependencies took 27 Milliseconds to finish
info: Orleans.Runtime.SiloLifecycleSubject[100452]
Lifecycle observer Orleans.Runtime.Silo started in stage 4000 which took 118 Milliseconds.
info: Orleans.Runtime.SiloLifecycleSubject[100452]
Starting lifecycle stage 4000 took 118.1442 Milliseconds
info: Orleans.Runtime.Catalog[100507]
Before collection#1: memory=6MB, #activations=0, collector=<#Activations=0, #Buckets=0, buckets=[]>.
info: Orleans.Runtime.Catalog[100508]
After collection#1: memory=6MB, #activations=0, collected 0 activations, collector=<#Activations=0, #Buckets=0, buckets=[]>, collection time=00:00:00.0137324.
info: Orleans.Runtime.Silo[100452]
Init grain services took 1 Milliseconds to finish
info: Orleans.Runtime.MembershipService.MembershipOracleData[100603]
MembershipOracle starting on host = szf-sl address = S127.0.0.1:11111:274874211 at 2018-09-17 09:56:51.881 GMT, backOffMax = 00:00:02
info: Orleans.Runtime.MembershipService.SystemTargetBasedMembershipTable[100635]
Creating in-memory membership table
info: Orleans.Runtime.MembershipService.MembershipTableSystemTarget[100637]
GrainBasedMembershipTable Activated.
fail: Orleans.Runtime.Messaging.IncomingMessageAcceptor[101017]
Exception trying to process 198 bytes from endpoint 127.0.0.1:63001
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Orleans.Runtime.MessagingStatisticsGroup.OnMessageReceive(Message msg, Int32 headerBytes, Int32 bodyBytes) in D:buildagent_work23ssrcOrleans.CoreStatisticsMessagingStatisticsGroup.cs:line 192
at Orleans.Runtime.IncomingMessageBuffer.TryDecodeMessage(Message& msg) in D:buildagent_work23ssrcOrleans.CoreMessagingIncomingMessageBuffer.cs:line 197
at Orleans.Runtime.Messaging.IncomingMessageAcceptor.ReceiveCallbackContext.ProcessReceived(SocketAsyncEventArgs e) in D:buildagent_work23ssrcOrleans.RuntimeMessagingIncomingMessageAcceptor.cs:line 659
fail: Orleans.Runtime.Messaging.IncomingMessageAcceptor[101027]
ProcessReceivedBuffer exception with RemoteEndPoint 127.0.0.1:63001:
System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Orleans.Runtime.MessagingStatisticsGroup.OnMessageReceive(Message msg, Int32 headerBytes, Int32 bodyBytes) in D:buildagent_work23ssrcOrleans.CoreStatisticsMessagingStatisticsGroup.cs:line 192
at Orleans.Runtime.IncomingMessageBuffer.TryDecodeMessage(Message& msg) in D:buildagent_work23ssrcOrleans.CoreMessagingIncomingMessageBuffer.cs:line 197
at Orleans.Runtime.Messaging.IncomingMessageAcceptor.ReceiveCallbackContext.ProcessReceived(SocketAsyncEventArgs e) in D:buildagent_work23ssrcOrleans.RuntimeMessagingIncomingMessageAcceptor.cs:line 659
at Orleans.Runtime.Messaging.IncomingMessageAcceptor.ProcessReceive(SocketAsyncEventArgs e) in D:buildagent_work23ssrcOrleans.RuntimeMessagingIncomingMessageAcceptor.cs:line 489

It happens on all 2.1.0 versions (including beta1, rc1 and CI builds). There is no such error if I reference the Orleans.Server source project directly instead of the Microsoft.Orleans.Server package.

documentation

All 12 comments

Can you confirm that all Orleans packages you reference in your projects have the same version? Can you share your silo config/startup code?

@sergeybykov It's here: https://github.com/csyszf/orleans/tree/%234990/Samples/2.0/HelloWorld
It's just the HelloWorld sample with 2.1.0-rc1 packages.

@sergeybykov Ah... It's not a 2.1.0 issue. On 2.1.0 those exceptions are thrown when the silo is starting, and on 2.0.4 they are thrown when a client tries to connect to the silo.

And it seems to be more about my local development environment. It works fine on other systems, whether Linux or Windows.

I've tried cleaning the NuGet caches, but it didn't help.

Did you check that all projects reference the same versions of Orleans and other NuGet packages?

@onionhammer provided a repro - https://github.com/onionhammer/PoC.Orleans. We are looking into it.

Try turning off TieredCompilation. Looks like it is breaking serialization somehow.

Strange, I wonder why that is. BTW, tiered compilation will be enabled by default in .NET Core 2.2 or 3.0, I believe, so this will be a big issue.

Found an environment variable COMPlus_TieredCompilation=1 on my system. After removing it, the silo works fine.
So it is a tiered compilation problem. Thanks a lot, @sergeybykov @onionhammer

We suspect it might be a JIT bug, but not 100% sure yet. Continuing investigation.

We'll document that TieredCompilation should not be turned on for now. Will reevaluate when the CLR issue is fixed.
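
For anyone who wants to opt out explicitly in the meantime, here is a minimal sketch of a project-level setting. It assumes an SDK-style .csproj for the silo host and uses the TieredCompilation MSBuild property; which project file it belongs in is an assumption, since the thread does not show one. As far as I understand, a COMPlus_TieredCompilation=1 environment variable like the one found above can still take precedence over this setting, so that variable should be removed or set to 0 as well.

```xml
<!-- Fragment for the silo host's .csproj (hypothetical placement); the rest of the file is omitted. -->
<PropertyGroup>
  <!-- Opt this application out of tiered JIT compilation. -->
  <TieredCompilation>false</TieredCompilation>
</PropertyGroup>
```

This keeps the workaround next to the application instead of relying on machine-wide environment state, which is what made the problem hard to spot in this thread.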

This has been fixed in .NET Core 2.2.
