Orleans: Expected GrainFactory behavior in a heterogeneous silos environment?

Created on 6 Jun 2018  ·  7 Comments  ·  Source: dotnet/orleans

Hi,

I've set up heterogeneous silos and verified that a client indeed doesn't need to know where grains are hosted (and that's fine).

However, the scenario of agnostic grain invocation fails with the following setup:

  • The environment is a cluster (I've used Consul as a membership provider)
  • First silo has one grain of type 'A' (in my case it's a simple 'Hello' grain)
  • Second silo has one grain of type 'B' (a simple 'AnotherHello' grain)
  • Both interfaces and implementations are isolated (one assembly for each artifact - 4 projects)

The first grain 'A' has one responsibility, to find and invoke grain 'B'. Grain 'B' logs a simple message ('AnotherHello').

The client begins the call chain by invoking 'A'.

When control reaches 'A', it tries to resolve 'B' through GrainFactory. However, GrainFactory throws an exception saying that the implementation for 'B' is unknown.
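
For readers following along, the setup described above might look roughly like the sketch below. The interface, class, and method names (IHello, IAnotherHello, HelloGrain, SayHello, SayAnotherHello) and the integer key 0 are hypothetical stand-ins, not the code from the actual projects, and in the real setup each interface and implementation lives in its own assembly.

```csharp
using System.Threading.Tasks;
using Orleans;

// Hypothetical interface for grain 'A' (hosted only on the first silo).
public interface IHello : IGrainWithIntegerKey
{
    Task SayHello();
}

// Hypothetical interface for grain 'B' (implemented only on the second silo).
public interface IAnotherHello : IGrainWithIntegerKey
{
    Task SayAnotherHello();
}

// Grain 'A': its only job is to resolve grain 'B' and invoke it.
public class HelloGrain : Grain, IHello
{
    public Task SayHello()
    {
        // Per the report above, resolving 'B' is where the
        // "implementation unknown" exception surfaced.
        var b = GrainFactory.GetGrain<IAnotherHello>(0);
        return b.SayAnotherHello();
    }
}
```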

I suspect that GrainFactory makes a look-up inside the hosting silo, but not beyond (e.g. in the cluster), because if the silo hosts both 'A' and 'B', then GrainFactory finds the reference and everything works fine.

Is this the correct interpretation of GrainFactory, or is this unexpected behavior? If this is the expected behavior, then grains have to know which grains are hosted in their own silo to avoid triggering the exception, and agnostic grain invocation won't work.

Kind regards,
Jan

P3 documentation question

All 7 comments

> I suspect that GrainFactory makes a look-up inside the hosting silo, but not beyond (e.g. in the cluster)

It looks up in the merged type map of all silos. However, it takes a little time for the type maps of individual silos to propagate to all silos in the cluster. I wonder if timing is the issue here.

I'm going to run the same scenario with a delay before each invocation (e.g. 30 seconds between invocations, for, say, 10 rounds) and see whether the type maps have propagated after a while.
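
A test like that could be sketched with a small retry helper such as the one below. RetryHelper and RetryAsync are hypothetical names, not part of Orleans, and catching every Exception is an assumption made for brevity, since the thread does not name the exact exception type.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of Orleans): invokes the call repeatedly,
// pausing between failed attempts, until it succeeds or attempts run out.
public static class RetryHelper
{
    public static async Task RetryAsync(Func<Task> call, int maxAttempts, TimeSpan delay)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await call();
                return; // The call succeeded, so stop retrying.
            }
            // Assumption: any exception is treated as "not propagated yet".
            // On the final attempt the filter is false and the exception escapes.
            catch (Exception) when (attempt < maxAttempts)
            {
                await Task.Delay(delay);
            }
        }
    }
}
```

With the numbers mentioned above, the client would pass 10 as maxAttempts and TimeSpan.FromSeconds(30) as delay, wrapping its call to grain 'A' in the delegate.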

Hi again,

Thanks for pointing me in the right direction. It is a timing issue. The call chain works without fail after approximately 1 minute (slightly longer). I guess the type maps have propagated throughout the cluster by then.

This was a good learning point for me.

BR,
Jan

I believe it should take much less than a minute. @benjaminpetit, what do you think?

@JillHeaden We need to document the behavior to set the right expectations here.

@jan-johansson-mr You can change the default refresh delay.

Silos and clients check for changes in the supported types at a fixed interval. You can change that interval with the TypeMapRefreshInterval property of TypeManagementOptions.
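
A minimal sketch of that setting, assuming Orleans 2.x with the Configure<TOptions> extension methods on ISiloHostBuilder and IClientBuilder. The class name TypeMapRefreshConfig, the method name WithFasterTypeMapRefresh, and the 5-second value are illustrative assumptions, not recommendations from this thread; clustering and the rest of the builder configuration are omitted.

```csharp
using System;
using Orleans;
using Orleans.Configuration;
using Orleans.Hosting;

// Hypothetical extension methods that shorten the type map refresh interval.
public static class TypeMapRefreshConfig
{
    // Illustrative value only; pick one that suits the deployment.
    private static readonly TimeSpan Interval = TimeSpan.FromSeconds(5);

    // Silo side: how often the silo refreshes the cluster-wide type map.
    public static ISiloHostBuilder WithFasterTypeMapRefresh(this ISiloHostBuilder builder) =>
        builder.Configure<TypeManagementOptions>(o => o.TypeMapRefreshInterval = Interval);

    // Client side: the same option, set on the client builder.
    public static IClientBuilder WithFasterTypeMapRefresh(this IClientBuilder builder) =>
        builder.Configure<TypeManagementOptions>(o => o.TypeMapRefreshInterval = Interval);
}
```

A shorter interval means newly joined silos' grain types become resolvable sooner, at the cost of more frequent refresh traffic between silos.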

I did and tried, and it worked fine! It's good that Jill may take a look at this.

Thanks
