If you’re planning to use Microsoft Orleans in production, you need to look beyond the lonely silos that we’ve built in the articles thus far. Orleans is designed to work in a cluster, such that a large number of grains can be distributed among multiple silos. If a silo fails, its grains are reactivated at other silos that are still alive.
In this article, we’ll create a simple silo and run multiple interconnected instances of it in order to set up a cluster.
For the scope of this article, we’ll use the default cluster membership provider: MembershipTableGrain. This is not intended to be used in production, but will allow us to focus on getting a simple cluster up and running. Setting up different cluster membership providers is non-trivial and requires separate articles for each.
Note: this article is based on Orleans 1.4.2 using .NET Framework 4.6.2.
Note: the source code for this article is in the OrleansFirstCluster folder within the Gigi Labs BitBucket repository.
Setting Up An Example
To set up a cluster, the Dev/Test Host project template we’ve been using so far is no longer suitable. Instead, we have to set up the full project structure. This is covered by the latter part of “Getting Started with Microsoft Orleans” and there is no point in repeating it here.
Don’t write any code yet though. I recently learned that all that AppDomain stuff is not necessary unless you’re planning to run Silo and Client in the same application, so we’ll go for a cleaner approach.
We’ll also install the Orleans Dashboard (see: “A Dashboard for Microsoft Orleans“) in the Silo project. This will give us an idea of how grains are spread across the cluster later.
Install-Package OrleansDashboard
Hence, when setting up the Silo configuration, remember to include the configuration for the Dashboard:
<?xml version="1.0" encoding="utf-8"?>
<OrleansConfiguration xmlns="urn:orleans">
  <Globals>
    <SeedNode Address="localhost" Port="11111" />
    <BootstrapProviders>
      <Provider Type="OrleansDashboard.Dashboard" Name="Dashboard" />
    </BootstrapProviders>
  </Globals>
  <Defaults>
    <Networking Address="localhost" Port="11111" />
    <ProxyingGateway Address="localhost" Port="30000" />
  </Defaults>
</OrleansConfiguration>
We can now start adding some code. First, we need a grain in our Grains project. What the grain actually does doesn’t matter. We just want to create a large number of grains to see them spread out over the cluster.
public class UselessGrain : Grain, IUselessGrain
{
    public Task DoNothingAsync()
    {
        return Task.CompletedTask;
    }
}
Note: if you’re using a .NET Framework version prior to 4.6, then you’ll need to use TaskDone.Done instead of Task.CompletedTask.
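For example, the grain method above would look something like this on an older framework (a minimal sketch; TaskDone comes from the Orleans namespace):

public Task DoNothingAsync()
{
    // TaskDone.Done is Orleans' cached completed task,
    // used before Task.CompletedTask was added to the framework
    return TaskDone.Done;
}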
The corresponding interface goes in the Interfaces project:
public interface IUselessGrain : IGrainWithIntegerKey
{
    Task DoNothingAsync();
}
Doing away with all the AppDomain junk, the following code should be enough for a simple Silo:
static void Main(string[] args)
{
    Console.Title = "Silo";

    try
    {
        using (var siloHost = new SiloHost("Silo"))
        {
            siloHost.LoadOrleansConfig();
            siloHost.InitializeOrleansSilo();
            var startedOk = siloHost.StartOrleansSilo(catchExceptions: false);

            Console.WriteLine("Silo started successfully!");
            Console.WriteLine("Press ENTER to exit...");
            Console.ReadLine();

            siloHost.ShutdownOrleansSilo();
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex);
    }
}
Apart from the AppDomain logic, another thing we’re doing differently from usual here is that we’re calling StartOrleansSilo() with catchExceptions set to false. In case the silo fails to initialise, this gives us the ability to inspect the details of the failure within the exception, rather than have Orleans silently swallow it and simply return false.
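For contrast, here’s roughly what the default behaviour looks like (a sketch; with catchExceptions left at its default of true, a startup failure is reported only through the return value):

// Default behaviour: Orleans catches exceptions internally and the method
// simply returns false, leaving us nothing to inspect.
var startedOk = siloHost.StartOrleansSilo();

if (!startedOk)
    Console.WriteLine("Silo failed to start, but there is no exception to examine.");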
On the client side, we can use an adaptation of the client code from “Getting Started with Microsoft Orleans“:
static void Main(string[] args)
{
    Console.Title = "Client";

    var random = new Random();
    var config = ClientConfiguration.LoadFromFile("ClientConfiguration.xml");

    while (true)
    {
        try
        {
            GrainClient.Initialize(config);
            Console.WriteLine("Connected to silo!");

            while (true)
            {
                var grainId = random.Next();
                var grain = GrainClient.GrainFactory.GetGrain<IUselessGrain>(grainId);

                // Deliberately not awaited: we just want to bombard the
                // cluster with as many messages as possible.
                grain.DoNothingAsync();
            }
        }
        catch (SiloUnavailableException)
        {
            Console.WriteLine("Silo not available! Retrying in 3 seconds.");
            Thread.Sleep(3000);
        }
    }
}
In the inner infinite while-loop, we’re taking random grain IDs and bombarding them with messages. The idea is to create a lot of grain instances that we can visualise. Since this is very heavy, you’ll see Orleans giving warnings, and the Dashboard reporting high latencies at times.
Running a 3-Node Cluster
We will now run 3 instances of the same silo. Each instance must have different ports configured. This is the configuration for the first silo, which we already set up earlier:
<?xml version="1.0" encoding="utf-8"?>
<OrleansConfiguration xmlns="urn:orleans">
  <Globals>
    <SeedNode Address="localhost" Port="11111" />
    <BootstrapProviders>
      <Provider Type="OrleansDashboard.Dashboard" Name="Dashboard" />
    </BootstrapProviders>
  </Globals>
  <Defaults>
    <Networking Address="localhost" Port="11111" />
    <ProxyingGateway Address="localhost" Port="30000" />
  </Defaults>
</OrleansConfiguration>
A silo needs 2 ports: one to communicate with other silos (the Networking endpoint) and one for clients to connect to it (the ProxyingGateway endpoint). The client can connect to any node on the cluster.
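Incidentally, the ClientConfiguration.xml that the client loads would look something like this (a sketch assuming the gateway ports used in this article; listing all three gateways lets the client connect through any silo that happens to be up):

<?xml version="1.0" encoding="utf-8"?>
<ClientConfiguration xmlns="urn:orleans">
  <Gateway Address="localhost" Port="30000" />
  <Gateway Address="localhost" Port="30001" />
  <Gateway Address="localhost" Port="30002" />
</ClientConfiguration>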
In production scenarios, Orleans silos are all equal; there is no concept of a primary and secondary silo. However, when you use the default MembershipTableGrain cluster membership provider, all information about the silos in the cluster is stored in a grain living in one of the silos. As a result, the silo containing the MembershipTableGrain is designated the Primary silo. It must be started before the others, and the entire cluster breaks if it goes down. Naturally, this is not good, and you should look into other cluster membership providers.
In such a setup, the SeedNode configuration specified in all silos must be the endpoint of the Primary silo. Let’s see what the configuration for our second silo instance looks like:
<?xml version="1.0" encoding="utf-8"?>
<OrleansConfiguration xmlns="urn:orleans">
  <Globals>
    <SeedNode Address="localhost" Port="11111" />
    <BootstrapProviders>
      <Provider Type="OrleansDashboard.Dashboard" Name="Dashboard" Port="8081" />
    </BootstrapProviders>
  </Globals>
  <Defaults>
    <Networking Address="localhost" Port="11112" />
    <ProxyingGateway Address="localhost" Port="30001" />
  </Defaults>
</OrleansConfiguration>
Aside from changing the Networking and ProxyingGateway ports, we are also using a different port for the Dashboard (the default is 8080). Each silo has its own Dashboard (although they all show the same information), and they cannot all listen on the same port; with this setup, the three Dashboards end up on ports 8080, 8081 and 8082.
Similarly, the configuration for our third silo instance is just a matter of changing ports:
<?xml version="1.0" encoding="utf-8"?>
<OrleansConfiguration xmlns="urn:orleans">
  <Globals>
    <SeedNode Address="localhost" Port="11111" />
    <BootstrapProviders>
      <Provider Type="OrleansDashboard.Dashboard" Name="Dashboard" Port="8082" />
    </BootstrapProviders>
  </Globals>
  <Defaults>
    <Networking Address="localhost" Port="11113" />
    <ProxyingGateway Address="localhost" Port="30002" />
  </Defaults>
</OrleansConfiguration>
We can then start the 3 silo instances and the client.
On my system, the load is just too much, and Orleans dies after around 64k activations. So let’s add a small delay in the random-message loop to give Orleans some room to breathe:
while (true)
{
    var grainId = random.Next();
    var grain = GrainClient.GrainFactory.GetGrain<IUselessGrain>(grainId);

    Thread.Sleep(50);
    grain.DoNothingAsync();
}
After running it again, I see that grains are initially allocated mainly to the Primary silo, but they are distributed more evenly across the other silos after around 1,000 activations.
I am not sure why they are not evenly distributed from the start. My guess is that either it is more efficient to have them all in one place if the number of activations is small, or the silos need time to coordinate between themselves before this happens (which would explain why, without a delay, all activations are allocated on the primary node).
Single Point of Failure
Now, close the Primary silo.
Since the Primary silo contains the MembershipTableGrain, all information about the cluster dies with it. The remaining silos and clients will not recover, even if the Primary silo is brought back up; they in turn have to be restarted. This is because, as we saw earlier, Secondary silos must start after the Primary one. When the Primary silo comes back, it effectively starts a fresh cluster and knows nothing about any other silos until they join it.
Conclusion
We have seen how to get a very basic Orleans cluster working with multiple silos sharing the burden of holding the grains. However, this is hardly an ideal setup because (a) cluster membership information is held in memory and represents a single point of failure, and (b) the fact that I ran all silos on the same machine made them subject to the same physical resource constraints as if I were running a standalone silo.
For better results, run different silos on different machines, and use a decent cluster membership provider (a configuration sketch follows the list below). Orleans supports the following:
- MembershipTableGrain (not reliable, use for testing only)
- SQL Server
- Azure Table Storage
- Apache ZooKeeper
- Consul
- DynamoDB
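To give an idea of what switching away from MembershipTableGrain involves, here is a hedged sketch of SQL Server-based membership configuration: the SeedNode entry in Globals is replaced with a SystemStore element (the DeploymentId and connection string below are made-up values for illustration):

<?xml version="1.0" encoding="utf-8"?>
<OrleansConfiguration xmlns="urn:orleans">
  <Globals>
    <!-- Hypothetical values: point DataConnectionString at your own database -->
    <SystemStore SystemStoreType="SqlServer"
                 DeploymentId="OrleansFirstCluster"
                 DataConnectionString="Data Source=.;Initial Catalog=Orleans;Integrated Security=True" />
  </Globals>
  <Defaults>
    <Networking Address="localhost" Port="11111" />
    <ProxyingGateway Address="localhost" Port="30000" />
  </Defaults>
</OrleansConfiguration>

With membership stored externally, no silo is special any more: silos can be started and restarted in any order.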
Update 20th June 2017: I am told that Azure Service Fabric should also be supported. As for database implementations of cluster membership, these are not limited to SQL Server: you may use any supported ADO.NET provider, which at the moment means SQL Server, MySQL/MariaDB, or PostgreSQL. To clarify: while the PostgreSQL storage provider for grain persistence is not yet available, its use as a cluster membership provider is supported.