Interview: Virtual Actors with Microsoft Orleans

Introduction

In recent years, the ever-increasing demand for computing resources has rendered traditional single-threaded programming inadequate for most modern applications. Faced with heavy performance and scalability challenges, many developers are forced to turn to concurrent and distributed programming.

While multithreaded programming has been in use for many years, those who have used it will know that building a performant shared-memory system that is free of race conditions is very hard to get right.

It is possible to avoid the complications of shared-memory systems, and indeed of explicit multithreading, by using a message-passing system. In the actor model, processing is done by a large number of single-threaded actors, each owning its own state, which communicate with one another by sending asynchronous messages.
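To make the idea concrete, here is a minimal sketch of a single actor written in Python with asyncio. It is a generic illustration of the actor model, not Orleans code (Orleans itself is a .NET framework), and the class and message names are hypothetical. The actor owns its state and a mailbox, and a single task drains the mailbox one message at a time, so many concurrent senders never race on the counter and no locks are needed.

```python
import asyncio


class CounterActor:
    """Hypothetical toy actor: private state, a mailbox, and one consumer task."""

    def __init__(self) -> None:
        self.count = 0                        # state touched only by run()
        self.mailbox: asyncio.Queue = asyncio.Queue()

    async def run(self) -> None:
        # Process messages strictly one at a time (single-threaded semantics).
        while True:
            msg, reply = await self.mailbox.get()
            if msg == "stop":
                reply.set_result(self.count)
                return
            if msg == "increment":
                self.count += 1
            reply.set_result(self.count)

    async def send(self, msg: str) -> int:
        # Post a message asynchronously and await the actor's reply.
        reply = asyncio.get_running_loop().create_future()
        await self.mailbox.put((msg, reply))
        return await reply


async def main() -> int:
    actor = CounterActor()
    worker = asyncio.create_task(actor.run())
    # 100 concurrent senders; the mailbox serializes their messages.
    await asyncio.gather(*(actor.send("increment") for _ in range(100)))
    final = await actor.send("stop")
    await worker
    return final


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints 100
```

The mailbox is the only channel into the actor, which is what lets the counter stay consistent without shared-memory synchronization.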

As it turns out, Microsoft have their own actor model, and it’s called Orleans. Sergey Bykov (SB), Principal Software Development Engineer Lead at Microsoft, and project lead of the Orleans project, has very kindly agreed to answer my (DD) questions about Orleans.

Orleans Overview

DD: What is Microsoft Orleans?

SB: The home page of our docs says the following.

“Orleans is a framework that provides a straightforward approach to building distributed high-scale computing applications, without the need to learn and apply complex concurrency or other scaling patterns. It was created by Microsoft Research and designed for use in the cloud.

“Orleans has been used extensively in Microsoft Azure by several Microsoft product groups, most notably by 343 Industries as a platform for all of Halo 4 and Halo 5 cloud services, as well as by a growing number of other companies.”

SB: In other words, Orleans provides a programming model (backed by the Orleans runtime) for building scalable distributed applications almost as easily as single-machine apps. The goal of the project from the beginning was to democratize cloud development by making a broad range of developers with little to no distributed systems expertise productive and successful in building scalable distributed systems in general, and cloud services in particular.

The introduction explains that Orleans is built around a distributed actor model, and that the key innovation there is the notion of Virtual Actors. A detailed description is in our publication.

DD: Out of curiosity, why the name, ‘Orleans’?

SB: It was a general rule within Microsoft that codenames should be chosen from geographical names, such as names of cities, because those aren’t trademarkable. Over time, the Orleans codename accrued enough brand recognition that we decided to stick with it when we went open source.

DD: Tell us a little about the history of Microsoft Orleans.

SB: Orleans started in 2009 as a research project within a new Microsoft Research lab that was eventually named the eXtreme Computing Group (XCG), which was later merged with MSR’s Redmond lab. The goal of the project was to create something that would qualitatively simplify building software for the cloud. The two major challenges we focused on were 1) the complexity of building distributed systems, which has traditionally been the domain of a relatively small population of expert developers; and 2) the pattern of major re-architectures required of nearly every successful web property as it experienced exponential growth of its user base.

We took on building a framework with a programming model that would make mainstream single-machine developers productive in the cloud and would help build systems and services that could easily scale through several orders of magnitude of growth. While focusing on mainstream developers, we wanted Orleans to be equally appealing to expert developers, by reducing the amount of low-level ceremony they have to deal with. As we went through several early prototypes and iterations, we learned quite a bit from building the first Orleans applications, and even more so when we started collaborating with internal product groups. The programming model evolved, and we arrived at what we ended up naming the “Virtual Actor Model”.

Using Orleans

DD: How does Microsoft Orleans compare with other actor models?

SB: The Actor Model is quite old, and there are many implementations of it. There is a much smaller number of available distributed actor model solutions. The most popular ones are Erlang/OTP and its JVM “younger sibling” Akka. Erlang and Akka organically grew from single-process actor libraries into multi-machine scenarios by gradually adding remoting and distribution features. They brought along the fault-tolerance model of hierarchical supervision trees, which is easy to apply within a single process and acceptable for small-scale fixed topologies, but difficult to manage at cloud scale, especially for developers with limited distributed systems experience.

The Virtual Actor Model of Orleans removes much of the coordination and fault-tolerance complexity from developers’ shoulders by providing an intuitive notion of actors that don’t need to be created, destroyed or looked up. The “Virtual” qualifier comes from the analogy with virtual memory. Actors in Orleans live an “eternal” life, always available to process a call, and the Orleans runtime is responsible for instantiating their physical “incarnations” in memory on an as-needed basis, and for removing idle ones to free up resources. The Orleans runtime also transparently handles server failures by keeping track of instantiated actors and, when a server fails, recreating them as needed on a different server. As a result, the developer writes much less code (we’ve received anecdotal reports of a 3 to 5 times reduction in code, and up to 10 times in some cases), and much simpler code, free from data races and complex distributed coordination logic.
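The activation lifecycle described above can be illustrated with a toy, single-process model. This is a hypothetical Python sketch under simplifying assumptions (a logical clock instead of wall time, a dict instead of durable storage, no distribution or failure handling); it does not reflect the real Orleans runtime or its C# API, and every name in it is illustrative. Callers address a grain only by key; the runtime activates it on first use and collects idle activations after persisting their state.

```python
from dataclasses import dataclass, field


@dataclass
class PlayerGrain:
    """Hypothetical grain: the in-memory activation of one virtual actor."""
    key: str
    score: int = 0

    def add_points(self, n: int) -> int:
        self.score += n
        return self.score


@dataclass
class ToyRuntime:
    """Hypothetical stand-in for a virtual-actor runtime."""
    idle_limit: int = 2                              # ticks of inactivity before collection
    storage: dict = field(default_factory=dict)      # persisted state, survives deactivation
    activations: dict = field(default_factory=dict)  # key -> (grain, last_used_tick)
    clock: int = 0
    activation_count: int = 0

    def call(self, key: str, points: int) -> int:
        # Callers never create or look up the actor; they just address it by key.
        self.clock += 1
        if key in self.activations:
            grain, _ = self.activations[key]
        else:
            # Activate on demand, restoring any previously persisted state.
            grain = PlayerGrain(key, self.storage.get(key, 0))
            self.activation_count += 1
        result = grain.add_points(points)
        self.activations[key] = (grain, self.clock)
        return result

    def tick(self) -> None:
        # Advance the logical clock, then deactivate idle activations,
        # saving their state first so a later call can restore it.
        self.clock += 1
        for key, (grain, last_used) in list(self.activations.items()):
            if self.clock - last_used >= self.idle_limit:
                self.storage[key] = grain.score
                del self.activations[key]


if __name__ == "__main__":
    rt = ToyRuntime()
    print(rt.call("alice", 10))   # activates "alice"; prints 10
    rt.tick()
    rt.tick()                     # "alice" has been idle long enough and is collected
    print(rt.call("alice", 5))    # transparently reactivated; prints 15
```

From the caller’s point of view, "alice" never stopped existing: only its in-memory incarnation came and went, which is the virtual-memory analogy in miniature.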

The effort of Orleans to ‘democratize’ distributed programming and to raise developer productivity received an endorsement of sorts from the inventor of the Actor Model, Carl Hewitt. In his recent publication, Actor Model of Computation for Scalable Robust Information Systems, he wrote: “Orleans is an important step in furthering a goal of the Actor Model that application programmers need not be so concerned with low-level system details.” Obviously, that made the Orleans team very proud.

DD: In Microsoft Orleans, virtual actors are also known as grains. They run within host processes called silos. Why were these names devised?

SB: Early on, we had the intuition that we’d end up with a novel programming model. In hindsight, that was prescient. The term “grain” is distinct from the already overloaded term “actor”, where it’s hard to tell upfront whether somebody is talking about single-machine concurrency or about the distributed case. In the end, “grain” is shorthand for “Orleans actor” or “virtual actor”. When we needed to name the runtime containers for grains, we naturally went down the agricultural path with “silos”. Just imagine the confusion if we had called them “containers”.

DD: Who is using Microsoft Orleans, and how well does it support their systems’ scalability?

SB: Orleans has been used in production inside Microsoft since 2011. It is enjoying growing adoption outside Microsoft since we publicly released a binary preview and then open-sourced it. We see a wide range of systems built with Orleans: online gaming, finance, collaboration solutions, fraud detection, social network analysis, and more. One of the hottest areas is IoT. There we see Orleans-based systems that manage devices like thermostats and even, I’m not joking, mousetraps. One of the fascinating projects is the green power storage facility in Hawaii. We showed some scalability numbers in our paper.

DD: Is Microsoft Orleans meant only to be used in the cloud?

SB: The advent of the cloud brought the challenges of building reliable, scalable distributed systems into the spotlight. Orleans as a project focused on solving those fundamental challenges. As a result, Orleans is equally applicable to any cloud and to on-premises deployments. We have customers running Orleans in AWS and some who are interested in GCP, but also those that use it in private datacenters and on corporate IT infrastructure. Our first target was naturally Azure, and we built providers and extensions for it first. But Orleans was designed with extensibility in mind, and it is fairly easy to make it run pretty much anywhere.

Development and Support

DD: What is the Microsoft Orleans team currently working on, and is there a roadmap for future development?

SB: Our current focus is on making Orleans run on .NET Core, on support for geo-distribution, and on improvements to streaming, application lifecycle, and the upgrade and versioning process. Even though the project moved out of Microsoft Research to the product group, we have an ongoing collaboration with Research, which gives us a healthy pipeline of new ideas and advanced prototypes. Support for geo-distribution is one example. Support for indexing of actors, ACID multi-actor transactions, and reactive computations is also in the works, at various stages of readiness. Orleans is one of the most popular Microsoft open source projects, right next to .NET Core and Roslyn. We continue to work on it and have recently increased our investment in it substantially.

DD: What resources are available for developers building their systems upon Microsoft Orleans?

SB: We keep hearing that our documentation is very good compared to other open source projects, but we keep improving it (and the samples) as people point out topics that aren’t clear or could be explained better. The community around the project is our biggest “brain trust” and the best source of support for newcomers. It’s an amazing group of experienced and passionate engineers around the globe who come to our GitHub repo and Gitter chat not only because they use Orleans for their projects and contribute to it, but also because they enjoy hanging out with this very welcoming and encouraging community that always tries to help, even with topics not directly related to Orleans.