Orleans: Improve error message when membership provider generates consistent failures

Created on 6 Dec 2016 · 13 Comments · Source: dotnet/orleans

I followed the doc "Using Consul as a Membership Provider":

  1. Consul starts successfully with this output:
    D:\Program Files\consul>Consul.exe agent -server -bootstrap -data-dir "C:\Consul\Data" -client=127.0.0.1 -bind=127.0.0.1 -ui
    ==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
    ==> Starting Consul agent...
    ==> Starting Consul agent RPC...
    ==> Consul agent running!
    Version: 'v0.7.1'
    Node name: 'win测试机'
    Datacenter: 'dc1'
    Server: true (bootstrap: true)
    Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
    Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
    Atlas:
  2. Configure the silo and start it. The executable hangs in the _siloHost.StartOrleansSilo() call, and the console shows:
    [2016-12-06 10:16:36.810 GMT 15 INFO 100603 MembershipOracle 127.0.0.1:22222] MembershipOracle starting on host = win测试机 address = S127.0.0.1:22222:218715348 at 2016-12-06 10:16:36.711 GMT, backOffMax = 00:00:20
  3. After 5 minutes, an error occurs:
    2016-12-06-10.03.29.066ZZ
    ERROR starting Orleans silo name=win测试机 Exception=
    Exc level 0: System.AggregateException: One or more errors occurred.
    at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
    at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
    at System.Threading.Tasks.Task.Wait(TimeSpan timeout)
    at Orleans.OrleansTaskExtentions.WaitWithThrow(Task task, TimeSpan timeout)
    at Orleans.Runtime.Silo.DoStart()
    at Orleans.Runtime.Silo.Start()
    at Orleans.Runtime.Host.SiloHost.StartOrleansSilo(Boolean catchExceptions)
    Exc level 1: System.TimeoutException: ExecuteWithRetries has exceeded its max execution time of 00:05:00. Now is 2016-12-06 10:03:28.844 GMT, started at 2016-12-06 09:58:27.936 GMT, passed 00:05:00.9073926
    at Orleans.AsyncExecutorWithRetries.d__5`1.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---

Environment: Windows 7 SP1, Orleans 1.3.1, Consul 0.7.1, Visual Studio 2015 Community Update 3
Thanks

enhancement help wanted

All 13 comments

I suspect the silo is unable to communicate with Consul for some reason, e.g. the membership provider for Consul isn't properly configured. Aren't there other info/warning/error statements in the log?

OrleansConfiguration.xml

<OrleansConfiguration xmlns="urn:orleans">
  <Globals>
    <SystemStore SystemStoreType="None" DataConnectionString="http://localhost:8500" DeploymentId="MyOrleansDeployment" />
  </Globals>
  <Defaults>
    <Networking Address="localhost" Port="22222" />
    <ProxyingGateway Address="localhost" Port="30000" />
  </Defaults>
</OrleansConfiguration>

Startup method

public bool Run()
{
    bool ok = false;
    try
    {
        var _config = new ClusterConfiguration();
        _config.StandardLoad();
        _siloHost = new SiloHost(System.Net.Dns.GetHostName(), _config);
        // Use a custom membership provider, loaded from OrleansConsulUtils.
        _siloHost.Config.Globals.LivenessType = GlobalConfiguration.LivenessProviderType.Custom;
        _siloHost.Config.Globals.MembershipTableAssembly = "OrleansConsulUtils";
        _siloHost.Config.Globals.ReminderServiceType = GlobalConfiguration.ReminderServiceProviderType.Disabled;
        _siloHost.InitializeOrleansSilo();
        var startedok = _siloHost.StartOrleansSilo();
        if (!startedok)
            throw new SystemException(String.Format("Failed to start Orleans silo '{0}' as a {1} node", _siloHost.Name, _siloHost.Type));
        // Report success to the caller; the original snippet never set this flag.
        ok = true;
    }
    catch (Exception exc)
    {
        _siloHost.ReportStartupError(exc);
        var msg = string.Format("{0}:\n{1}\n{2}", exc.GetType().FullName, exc.Message, exc.StackTrace);
        Console.WriteLine(msg);
    }
    return ok;
}

I can connect to my Consul agent successfully using the Consul client SDK, in the same version that Orleans uses.

Consul log:

D:\Program Files\consul>Consul.exe agent -server -bootstrap -data-dir "C:\Consul\Data" -client=127.0.0.1 -bind=127.0.0.1 -ui
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Version: 'v0.7.1'
Node name: 'win测试机'
Datacenter: 'dc1'
Server: true (bootstrap: true)
Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

Test Code

static void Main(string[] args)
{
    using (var _client = new Consul.ConsulClient(config => config.Address = new Uri("http://localhost:8500")))
    {
        var _myService = _client.Catalog.Datacenters().Result;
        Console.WriteLine(JsonConvert.SerializeObject(_myService.Response));
    }
    Console.ReadLine();
    return;
}

Test result:

["dc1"]

Got it! The problem is the OrleansConsulUtils.dll downloaded from NuGet. The silo host started successfully after I replaced OrleansConsulUtils.dll with a locally compiled one. When I tried to run the client with the NuGet version, this error occurred:
[2016-12-07 08:13:41.386 GMT 4 ERROR 100319 OutsideRuntimeClient ] !!!!!!!!!! OutsideRuntimeClient constructor failed.
Exc level 0: System.MissingMethodException: Method not found: 'System.Threading.Tasks.Task`1<Consul.QueryResult`1

Does OrleansConsulUtils require a specific Consul.dll version? I am using the latest version, 0.7.3.

From src\NuGet\Microsoft.Orleans.OrleansConsulUtils.nuspec:

<dependency id="Consul" version="0.6.4.1" />

Not sure if there were breaking changes in the Consul library.

Yes, there was a breaking change in this commit, and this is the specific line that fails: https://github.com/PlayFab/consuldotnet/commit/33639dc742190011fbbf1a1d5a12649c6297cdea#diff-1bf84057428740c661334d09509d7b3aL43

In version 0.7.0 of the Consul library, all remote API calls were modified to accept a CancellationToken parameter, hence the MissingMethodException.

IMO it is safe to recompile our Consul library against the latest Consul NuGet package for 1.4.

For now, @vincentshow, I think you can work around this by directly referencing the newer Consul NuGet package and providing a bindingRedirect element that points to the newer version. It may work, but I'm not 100% sure.
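
A binding redirect of the kind suggested here would go in the host application's App.config. The fragment below is only a sketch and has not been tested against this setup: the `publicKeyToken` value and both version numbers are placeholders rather than values from this thread, and must be read from the Consul.dll that is actually referenced. Binding redirects apply only to strong-named assemblies, and a redirect may not help at all if the failure comes from a changed method signature rather than from assembly loading.

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- PLACEHOLDER_TOKEN is hypothetical: use the public key token of your Consul.dll -->
        <assemblyIdentity name="Consul" publicKeyToken="PLACEHOLDER_TOKEN" culture="neutral" />
        <!-- NEW.ASSEMBLY.VERSION is hypothetical: use the assembly version of the newer Consul.dll -->
        <bindingRedirect oldVersion="0.0.0.0-NEW.ASSEMBLY.VERSION" newVersion="NEW.ASSEMBLY.VERSION" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```

If the redirect does not resolve the error, the approach reported earlier in the thread (using a locally compiled OrleansConsulUtils.dll) remains the confirmed workaround.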

A PR to move OrleansConsulUtils to the latest version of Consul would be great.

Thanks. One small request: the error log written by the silo host could be as precise as the client's.

@vincentshow would you like to contribute a fix for this issue?

@attilah Sorry for the late reply. I'm busy with an online product upgrade at my company, which uses the .NET Core stack and also uses Consul. I can contribute in whatever way is needed next month, if you don't mind.

@vincentshow sounds like a plan! It is low prio, but would be nice if you can contribute!

Silo startup has been seriously redone in 2.0.0. This shouldn't be relevant anymore.
