The King who Forsook his own Virtues

Bias

As I write this, I can’t help but be conscious about bias. I’ve been a fan of the Ultima series of games since childhood. In 2001, I joined the Ultima Dragons Internet Chapter (UDIC) – an online fanclub dedicated to the series – and I’ve been a part of this community longer than I haven’t. In July 2002, I launched my first website, Dino’s Ultima Page, which was a leading site in the Ultima community for about a decade, and it will turn 16 years old in less than two weeks from now.

Left to right: Dr. Cat, Starr Long, Denis Loubet facepalming, and Richard Garriott at the Ultima Dragons Internet Chapter 25th Anniversary Bash

Last year, that same UDIC fanclub turned 25 years old, and a big party took place in Disneyland, Anaheim. I travelled all the way to California to be part of it, and like the rest of the people there, I was thrilled that several of the people who worked on the game – essentially our childhood heroes – were present to hang out with their fans.

The Kickstarter

There was similar enthusiasm a few years before that party, in March 2013, when Richard Garriott’s latest company, Portalarium, set up a Kickstarter campaign to fund a spiritual successor of Ultima called Shroud of the Avatar: Forsaken Virtues. The fans, starved for years of the creativity and entertainment by Electronic Arts (who currently owns the rights to the Ultima intellectual property), and sick of the failures it produced in an attempt to make money off its existing fanbase, readily poured their coin into a new game that would be made by some of the same people behind Ultima. The Kickstarter alone raised $1.9m, with additional funding secured after that.

Starr Long and Richard Garriott, speaking at the Ultima Dragons Internet Chapter 25th Anniversary Bash

Faced with this exciting prospect, what do you think a long-standing fan such as myself did?

I simply ignored it.

One reason was that I seldom had time to play games any more. But more importantly, it felt like madness to put money into a game even before it had started development, no matter who was involved. Coming from a country where customer service is abysmal, the last thing I’m going to do is give people my money to do whatever they want with it, without even being able to check some reviews first.

The trainwreck

In hindsight, I’m glad I did that. A recent lengthy review by taxalot at RPG Codex (with additional post-mortem insight by the author in the article’s comments) exposes the game as unfinished, buggy, and all round underwhelming in just about every aspect.

Most notable is that Portalarium tried to appeal to both the existing Ultima fanbase by promising a single player experience, while also going the MMORPG direction for those who wanted that.

“And sold they did. The first consequence of this was that if you backed the game for the single player experience… well, you probably gave up hope the moment your bank account was debited. To someone who was looking for a great single player adventure, the monthly emails focused solely on player housing were utterly depressing, an obvious sign that Portalarium had taken your money and were doing whatever the hell they wanted with it. Month after month, the studio unveiled new kinds of houses that you could buy with real money. But why stop at a house? Why not buy a castle? Or a whole town? You could do that too, as a solo player or as a guild to have your own place to regroup. The emphasis on this aspect of the game was truly puzzling. Between that and the monthly dance parties thrown by “DJ Darkstarr” (executive producer Starr Long’s alter ego), one might wonder whether the point was to have exciting adventures or just to create some sort of virtual renaissance fair for everyone to LARP in. In many ways, it felt like Portalarium were increasingly less interested in selling a game than a medieval Second Life service.” — RPG Codex Review: Shroud of the Avatar

Even more maddening is the concept of buying virtual houses with real money, and have to pay regular taxes on them. As if real-life housing weren’t bad enough – all we needed was to have the same problems in our games.

As you can imagine, this enraged several fans who backed the game based on the promise of Richard Garriott going back to his roots. One of these, who pledged $1500 for the game, was permanently banned from Shroud of the Avatar forums for questioning the direction of the project in this regard. He recently published the comments he was banned for, along with all the email correspondence that ensued, exposing what seems to be blatant abuse of power and excessive censorship.

The Future of Portalarium

While this whole mess is still unfolding, Portalarium laid off half their team just a few weeks ago, mainly laying off people in their art and design department. Which is ironic, because seeing that review on RPG Codex, it appears that these are the areas where help is most needed.

Meanwhile, in reaction to same review, Ultima Dragons have been discussing whether the resulting game is the fault of incompetent developers or incompetent management. While this is difficult to ascertain without having inside information, one may take a hint from the single Glassdoor review about the company (to be sure, a single review isn’t a very good sample, but it gives an idea):

The email correspondence about the aforementioned banning incident also rings alarm bells.

“It can also be hard to be confronted with your own misbehavior. In fact it can be so hard that many people, like yourself, cannot even face it and instead choose to focus on everything but your own actions.” — Starr Long, email correspondence

Given that this whole incident was a result of trying to stifle criticism, let’s just say I wouldn’t have been too happy to get this kind of response myself, especially from an Executive Producer.

History Repeats Itself

Shroud of the Avatar: Forsaken Virtues was fully released in March 2018 (even if in the pitiful state that the aforementioned review shows). That means it’s taken five years of development, and a whole lot of money. If you’ve been following the history of Ultima, you’ll find that it’s strangely reminiscent of Ultima 9, the last Ultima game that was released in 1999. Ultima fans generally consider the game to be a disaster, and often blame EA for the turnout.

Another thing EA is blamed for is the general fate of the Ultima intellectual property. After Ultima 9, there was pretty much no activity whatsoever for years. In more recent years, EA decided to reuse the Ultima intellectual property, resulting in a series of failures that were cancelled either even before being launched, or afterwards.

Ultima fans, for instance, generally agree that Lords of Ultima had nothing to do with Ultima other than the name. Ultima Forever: Quest for the Avatar similarly has a few names that fans will remember (including “Lady British”), but little else that feels familiar in terms of story or gameplay. This practice is called name-dropping, and guess what other game does this? That’s right. Shroud of the Avatar: Forsaken Virtues.

One would think that veteran game developers would learn from past blunders (theirs or otherwise), but after all this, the advice to management from that earlier Glassdoor review seems to hit the nail on the head.

Forsaken Virtues Indeed

Ultima 4 received critical acclaim because it brought ethics into an RPG genre that was principally dominated by “kill the bad villain” storylines. The virtues, conceived by Richard Garriott, would be central to all the mainstream Ultima games after that, except for a couple set on different words. Ultima 5, for instance, showed what happens when virtues are taken to the extreme.

“Thou shalt not lie, or thou shalt lose thy tongue.” — Ultima 5

If Shroud of the Avatar got nothing right, it has a great name. Forsaken Virtues very much reflects its overall direction. Honesty, for instance, was thrown out the window along with the Kickstarter promises. Compassion is shot down once you read the aforementioned email correspondence. Sacrifice is done by Portalarium only insofar as other people’s money and their own staff are involved.

As for humility, there are multiple aspects to this. One is that the game tried to be everything (scope creep anyone?), and thus failed to be stand out (or even be decent) in any one department. Another is that the top people behind the game need to get off their pedestal and start listening to their fans.

AWS Lambda .NET Core 2.1 Support Released

Amazon Web Services (AWS) has just announced that its serverless function offering, AWS Lambda, now supports the .NET Core 2.1 runtime, which was released towards the end of May 2018.

Quoting the official announcement:

“Today we released support for the new .NET Core 2.1.0 runtime in AWS Lambda. You can now take advantage of this version’s more performant HTTP client. This is particularly important when integrating with other AWS services from your AWS Lambda function. You can also start using highly anticipated new language features such as Span<T> and Memory<T>.

“We encourage you to update your .NET Core 2.0 AWS Lambda functions to use .NET Core 2.1 as soon as possible. Microsoft is expected to provide long-term support (LTS) for .NET Core 2.1 starting later this summer, and will continue that support for three years. Microsoft will end its support for .NET Core 2.0 at the beginning of October, 2018[2]. At that time, .NET Core 2.0 AWS Lambda functions will be subject to deprecation per the AWS Lambda Runtime Support Policy. After three months, you will no longer be able to create AWS Lambda functions using .NET Core 2.0, although you will be able to update existing functions. After six months, update functionality will also be disabled.

“[1] See Microsoft Support for .NET Core for the latest details on Microsoft’s .NET Core support.
“[2] See this blog post from Microsoft about .NET Core 2.0’s end of life.”

The choice here seems obvious: upgrade and get faster HttpClient, new language features, and long-term support; or lose support for your functions targeting .NET Core 2.0 (whatever that actually means).

In order to migrate to .NET Core 2.1, you’ll need the latest tooling – either version 1.14.4.0 of the AWS Toolkit for Visual Studio, or version 2.2.0 of the Amazon.Lambda.Tools NuGet package.

Check out the official announcement at the AWS blog for more information, including additional tips on upgrading.

Orleans 2.0 Stateless Worker Grains

In this article, we’ll see how to create grains that automatically scale up and down depending on load, in Microsoft Orleans 2.0.

The source code for this article is very similar to that in “Getting Started with Microsoft Orleans 2.0 in .NET Core“, with a few key differences:

  • It has been modified to gracefully stop the silo and gracefully close the client.
  • It uses the latest packages at the time of writing this article – Orleans 2.0.3 and OrleansDashboard 2.0.7.
  • It uses a slightly different example, and the load generation has been adapted accordingly.

Since there’s nothing really new in the client and silo setup, we’ll be focusing mainly on the grain and load generation parts. However, you may find the full source code for this article in the Orleans2StatelessWorkers folder in the Gigi Labs BitBucket repository.

Example Grain

For the sake of example, we’ll imagine that the job of our Orleans cluster is to provide hashing as a service. A client provides an input string, and we’ll have a grain that computes a hash of the string (it doesn’t really matter what hash function it is – we’ll use MD5 in the example) and returns it.

Based on this requirement, we can easily write a grain and its corresponding interface to perform the hash calculation:

    public interface IHashGeneratorGrain : IGrainWithIntegerKey
    {
        Task<string> GenerateHashAsync(string input);
    }

    public class HashGeneratorGrain : Grain, IHashGeneratorGrain
    {
        private HashAlgorithm hashAlgorithm;

        public HashGeneratorGrain()
        {
            this.hashAlgorithm = MD5.Create();
        }

        public Task<string> GenerateHashAsync(string input)
        {
            var inputBytes = Encoding.UTF8.GetBytes(input);
            var hashBytes = hashAlgorithm.ComputeHash(inputBytes);
            var hashBase64Str = Convert.ToBase64String(hashBytes);

            return Task.FromResult(hashBase64Str);
        }
    }

Load Generation

Typically, when we talk about actor models, the whole point is to have an instance of an actor (grain in Orleans) per entity ID. For instance, you’d have a grain instance for each Device, Vehicle, BlogPost, Game, User, or whatever other domain object you’re dealing with. In this case, however, our grain is completely stateless, and there is no difference in behaviour between one activation and another. In fact, since the grain ID doesn’t matter, we can just pass in 0 as a sort of convention when requesting a grain of this kind:

var hashGenerator = client.GetGrain<IHashGeneratorGrain>(0);

Once we have an instance of the grain, we can generate some load by creating random strings and invoking the relevant method on the grain repeatedly:

            while (true)
            {
                var randomString = GenerateRandomString();
                var hash = await hashGenerator.GenerateHashAsync(randomString);
                Console.WriteLine(hash);
            }

You can monitor the grain’s activity from the Orleans Dashboard (localhost:8080 by default), and as you’d expect, there is only one activation of the grain:

Stateless Worker Grains

This situation is a very good fit for Stateless Worker Grains.

Normally, when you request a grain with a particular ID, you get a single activation – and it is a singleton throughout the cluster, so you would never (bar edge cases involving failover scenarios) get more than one instance of that grain in the cluster. However, if you just add a [StatelessWorker] attribute on the grain…

    [StatelessWorker]
    public class HashGeneratorGrain : Grain, IHashGeneratorGrain

…you’ll see very different behaviour:

Notice how there are now two activations of the HashGeneratorGrain, even though we’re still requesting an instance with ID 0.

When Orleans sees the [StatelessWorker] attribute, it will create a pool of grains behind the ID you specify. This is similar to a load balancer. Those grains are hidden behind that same ID, so you can’t access individual grains in the pool directly (it wouldn’t make any sense to do that). The number of grains will grow up to as many CPU cores are available on the machine, unless you pass an argument to the attribute specifying otherwise.

Aside from autoscaling, another important benefit of stateless worker grains is that they are always local. Orleans will always execute a request to a stateless worker on the same silo where the request was generated, spawning a new activation if necessary. This saves the overhead of potentially passing the request to an instance in a different silo (i.e. remote call), which makes a lot of sense for stateless workers that are pure logic and there’s no difference between activations running in different places.

Although stateless worker grains are best used for stateless logic (as one would expect), there is nothing preventing their use with state. However, coordination of state between multiple grain activations with the same ID can be complicated. The Stateless Worker Grains documentation describes some patterns where stateless worker grains with state make sense (although calling them that way is bizarre).

Summary

  • Use the [StatelessWorker] attribute to treat a grain as a stateless worker grain.
  • This creates a load-balanced autoscaling pool of grains with the same ID.
  • Requests to stateless worker grains are always local and never incur a remote call.
  • Stateless worker grains may have state, although this is unusual.

Accessing an ASP .NET Core Web Application Remotely

After setting up an empty ASP .NET Core Web Application, it’s easy to quickly run it and see something working, in the form of the usual “Hello World”:

When trying to deploy this somewhere though, you might be disappointed to notice that you can’t access the web application from another machine:

In fact, you’ll notice that you can’t even access it from the same machine if you use the actual hostname rather than localhost.

This is because by default, Kestrel will listen only on localhost. In order for another machine to access the web application using the server’s hostname, the web application must specify the endpoints on which Kestrel will listen to, using code or command-line arguments.

Note: you may also need to open a port in your firewall.

In code, this can be done by invoking UseUrls() in the webhost builder as follows:

        public static IWebHost BuildWebHost(string[] args) =>
            WebHost.CreateDefaultBuilder(args)
                .UseStartup<Startup>()
                .UseUrls("http://myhostname:54691")
                .Build();

Replace “myhostname” with the hostname of the server, and note that the localhost endpoint will still work even though it’s not specified explicitly here.

If you want to pass the the endpoint(s) via command line parameters instead, you can do so via the --urls argument. First, you need to change the BuildWebHost() method generated by the project template as per this GitHub comment, to allow command line parameters to be passed to the WebHostBuilder via configuration:

public static IWebHost BuildWebHost(string[] args)
{
    var configuration = new ConfigurationBuilder().AddCommandLine(args).Build();

    return WebHost.CreateDefaultBuilder(args)
        .UseConfiguration(configuration)
        .UseStartup<Startup>()
        .Build();
}

Then, use the --urls argument when invoking dotnet run:

dotnet run --urls http://banshee:54691/

Either of these methods is fine to allow remote machines to access your ASP .NET Core web application.

.NET Core 3 to Support Desktop Applications… Kind of

A few days ago, Microsoft published a blog post titled “.NET Core 3 and Support for Windows Desktop Applications“. Just by reading the title, I’m pretty sure many of us jumped in their seats as thoughts like “WPF on Linux” became a source of excitement.

Image source: .NET Core 3 and Support for Windows Desktop Applications

Unfortunately however, the excitement turns into a disappointed “Oh. [Awkward silence] OK.” when reading that although .NET Core 3 is planned to support desktop applications built on technologies like Windows Forms and WPF, this support is for Windows only:

“Support for Windows desktop will be added as a set of “Windows Desktop Packs”, which will only work on Windows. .NET Core isn’t changing architecturally with this new version. We’ll continue to offer a great cross-platform product, focused on the cloud. We have lots of improvements planned for those scenarios that we’ll share later.”

The article does mention that this will bring several benefits ranging from performance improvements to deployment options, but this pales in comparison to the prospect of going cross-platform.

But given that they “are planning on releasing a first preview of .NET Core 3 later this year and the final version in 2019”, I can only wonder why they would spend a year doing a huge amount of work that most people won’t even care about, and pass it as a major release of .NET Core.

Time will tell, but one can get an idea of how people feel about this from the comments in the blog post.

My guess is that this could be part of a long-term strategy to retire the full .NET Framework, rather than bringing any real value to .NET Core or desktop applications.

Orleans 2.0 Dependency Injection

Dependency Injection (DI) has become a cornerstone of any well-designed and testable application nowadays, and Microsoft Orleans applications are no exception. In the 2.0 release, Microsoft Orleans has replaced some of its old internal frameworks (such as logging and dependency injection) with the corresponding Microsoft packages; thus these will be familiar for those who already worked with ASP .NET Core.

In this article we’ll focus on setting up dependency injection in the silo so that we can pass dependencies into our grains. However, if you read the dependency injection documentation page for Orleans 2.0, you’ll see that you can also have DI on the client side.

The source code for this article is the Orleans2DependencyInjection folder at the Gigi Labs BitBucket repository. Be careful not to confuse it with the OrleansDependencyInjection folder which targets Orleans 1.4.x.

TL;DR if you just want to quickly see how to do DI without going through the whole example, jump to the Registering Dependencies section.

Update 30th June 2018: The source code for this article needs a little adjusting, in order to gracefully stop the silo and gracefully close the client. Counterintuitively, directly disposing a silo or client is non-graceful and is generally discouraged.

Grain

We’ll start off with a project structure based on the Getting Organised article. Once that is in place, we can build an example representing a blog’s comment system. In the Grains project, we’ll add a grain representing a blog post, and that will be responsible for saving and retrieving all comments for that blog post.

    public class BlogPostGrain : Grain, IBlogPostGrain
    {
        private ICommentRepository repo;
        private ITimeService time;

        public BlogPostGrain(ICommentRepository repo, ITimeService time)
        {
            this.repo = repo;
            this.time = time;
        }

        public Task SaveCommentAsync(int blogPostId, InputComment comment)
        {
            var storedComment = new StoredComment()
            {
                Name = comment.Name,
                EmailAddress = comment.EmailAddress,
                Body = comment.Body,
                Timestamp = this.time.UtcNow
            };

            return this.repo.SaveCommentAsync(blogPostId, storedComment);
        }

        public Task<List<StoredComment>> GetCommentsAsync(int blogPostId)
            => this.repo.GetCommentsAsync(blogPostId);
    }

There are a few classes and interfaces in here that we haven’t created yet, but let’s understand what we’re doing here. We have a dependency on a repository where the comments will be held (whatever that is – we don’t care about the implementation at this stage). The grain acts mostly as a pass-through to this repository for storage and retrieval of comments, but when saving, we transform it by adding a timestamp. We use different DTOs for input comments and stored comments so that it is not possible to supply a timestamp with the input data.

We also have a second dependency on something called a time service. While you could just use DateTime.UtcNow in your code, time is typically one of the dependencies you want to factor out of your unit tests because it can affect the results. So we wrap DateTime.UtcNow in something we can mock, just for the sake of unit tests later.

Contracts

In the Contracts project, we’ll add all our interfaces and DTOs. Let’s start with our dependencies:

    public interface ITimeService
    {
        DateTime UtcNow { get; }
    }

    public interface ICommentRepository
    {
        Task SaveCommentAsync(int blogPostId, StoredComment comment);
        Task<List<StoredComment>> GetCommentsAsync(int blogPostId);
    }

Then we have our grain interface:

    public interface IBlogPostGrain : IGrainWithIntegerKey
    {
        Task SaveCommentAsync(int blogPostId, InputComment comment);
        Task<List<StoredComment>> GetCommentsAsync(int blogPostId);
    }

And finally our DTOs:

    public class InputComment
    {
        public string Name { get; set; }
        public string EmailAddress { get; set; }
        public string Body { get; set; }
    }

    public class StoredComment : InputComment
    {
        public DateTime Timestamp { get; set; }
    }

Dependency Implementations

In the Silo, we can create implementations for our dependencies.

To keep it simple, we’ll implement our repository using a ConcurrentDictionary. This is a volatile, in-memory implementation that is for demonstration only, but it allows us to focus on what we’re doing with Orleans, rather than distracting us with store-specific details.

Note: We could also use Orleans storage providers, but that’s out of scope here.

    public class MemoryCommentRepository : ICommentRepository
    {
        private ConcurrentDictionary<int, List<StoredComment>> dict;

        public MemoryCommentRepository()
        {
            this.dict = new ConcurrentDictionary<int, List<StoredComment>>();
        }

        public Task<List<StoredComment>> GetCommentsAsync(int blogPostId)
        {
            this.dict.TryGetValue(blogPostId, out var comments);
            return Task.FromResult(comments);
        }

        public Task SaveCommentAsync(int blogPostId, StoredComment comment)
        {
            this.dict.AddOrUpdate(blogPostId,
                addValue: new List<StoredComment>() { comment },
                updateValueFactory: (postId, commentsList) => {
                    commentsList.Add(comment);
                    return commentsList;
                });

            return Task.CompletedTask;
        }
    }

The time service is really simple: it just wraps DateTime.UtcNow.

    public class TimeService : ITimeService
    {
        public DateTime UtcNow => DateTime.UtcNow;
    }

Registering Dependencies

All the above was setting up the example, and now we get to the part we’ve all been waiting for.

We’ll set up our silo’s code similarly to what we’ve done in the past two articles, but this time, we’ll add a call to ConfigureServices() in order to register our dependencies:

            var siloBuilder = new SiloHostBuilder()
                .UseLocalhostClustering()
                .UseDashboard(options => { })
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "dev";
                    options.ServiceId = "Orleans2DependencyInjection";
                })
                .Configure<EndpointOptions>(options =>
                    options.AdvertisedIPAddress = IPAddress.Loopback)
                .ConfigureServices(services =>
                {
                    services.AddSingleton<ITimeService, TimeService>();
                    services.AddSingleton<ICommentRepository, MemoryCommentRepository>();
                })
                .ConfigureLogging(logging => logging.AddConsole());

Note: as per the previous articles, C# 7.1 or above is needed in order to allow async/await in Main().

Since AddSingleton() is an extension method coming from Mirosoft.Extensions.DependencyInjection (already included as a dependency of Microsoft.Orleans.Core), you’ll need to add the following for this to work:

using Microsoft.Extensions.DependencyInjection;

The API

We can complete this example by exposing the grain’s functionality via our Web API. For this, we’ll add the following controller:

    [Produces("application/json")]
    [Route("api/BlogPosts")]
    public class BlogPostsController : Controller
    {
        private IClusterClient orleansClient;

        public BlogPostsController(IClusterClient orleansClient)
        {
            this.orleansClient = orleansClient;
        }

        [HttpGet]
        public Task<List<StoredComment>> Get(int blogPostId)
        {
            var grain = this.orleansClient.GetGrain<IBlogPostGrain>(blogPostId);
            return grain.GetCommentsAsync(blogPostId);
        }

        [HttpPut]
        public async Task Put(int blogPostId, InputComment comment)
        {
            var grain = this.orleansClient.GetGrain<IBlogPostGrain>(blogPostId);
            await grain.SaveCommentAsync(blogPostId, comment);
        }
    }

 

Note: as I write this, I am noticing a quirk in this implementation. If you get a grain with a blogPostId, then why do you have to pass it again to call the method on the grain? The grain should know its ID already. Fair enough – that was an oversight on my part. But since grain IDs are retrieved using extension methods, and thus their retrieval would also need to be mocked, I’d rather not overcomplicate things in this example.

We can then add Swagger to the Web API and wire up the Orleans client as we did in the Getting Organised article (complete with retries):

       private IClusterClient CreateOrleansClient()
        {
            var clientBuilder = new ClientBuilder()
                .UseLocalhostClustering()
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "dev";
                    options.ServiceId = "Orleans2DependencyInjection";
                })
                .ConfigureLogging(logging => logging.AddConsole());

            var client = clientBuilder.Build();

            client.Connect(async ex =>
            {
                Console.WriteLine("Retrying...");
                await Task.Delay(3000);
                return true;
            }).Wait();

            return client;
        }

        // This method gets called by the runtime. Use this method to add services to the container.
        public void ConfigureServices(IServiceCollection services)
        {
            var orleansClient = CreateOrleansClient();
            services.AddSingleton<IClusterClient>(orleansClient);

            services.AddSwaggerGen(c =>
            {
                c.SwaggerDoc("v1", new Info { Title = "My API", Version = "v1" });
            });

            services.AddMvc();
        }

Manual Testing with Swagger

We can quickly add a couple of comments on a blog post and retrieve them to see that all this is working:

Note: seems like Swagger recently changed their UI. I liked it a lot better before.

Unit Testing

Dependency injection makes it easy for us to write unit tests. Let’s add a Grains.Tests project (.NET Core Console App), add a reference to the Grains project, and install the following packages:

Install-Package Microsoft.NET.Test.Sdk
Install-Package NUnit
Install-Package NUnit3TestAdapter
Install-Package Moq

Remove the auto-generated Program.cs file and add the following test class instead:

    public class BlogPostGrainTests
    {
        [Test]
        public async Task SaveCommentTest()
        {
            // arrange

            const int blogPostId = 1;

            var fixedDateTime = new DateTime(2018, 4, 29, 18, 28, 33, DateTimeKind.Utc);
            var mockRepo = new Mock<ICommentRepository>(MockBehavior.Strict);
            var mockTimeService = new Mock<ITimeService>(MockBehavior.Strict);

            mockRepo.Setup(x => x.SaveCommentAsync(blogPostId, It.IsAny<StoredComment>()))
                    .Returns(Task.CompletedTask);
            mockTimeService.Setup(x => x.UtcNow)
                           .Returns(fixedDateTime);

            var grain = new BlogPostGrain(mockRepo.Object, mockTimeService.Object);

            const string name = "George";
            const string emailAddress = "george@food.com";
            const string body = "I'm hungry!";

            var comment = new InputComment()
            {
                Name = name,
                EmailAddress = emailAddress,
                Body = body
            };

            // act

            await grain.SaveCommentAsync(blogPostId, comment);

            // assert

            mockRepo.Verify(x => x.SaveCommentAsync(blogPostId, It.Is<StoredComment>(
                c => c.Name == name
                  && c.EmailAddress == emailAddress
                  && c.Body == body
                  && c.Timestamp == fixedDateTime
            )));
        }
    }

This test verifies that the submitted comment was passed on to the store with the generated timestamp. It should pass:

Exercises

We’ve seen a complete example featuring dependency injection. Registering dependencies is easy; most of the effort in this article was around building the example to demonstrate that.

As you can see, you can write unit tests for grains just as you would for any other class, without having to resort to the Orleans TestCluster.

There are a number of ways you can take this further:

  1. Have the grain perform a validation against the email address, and write unit tests for that.
  2. Have the grain retrieve its own ID (removing the need to pass it as a parameter to its methods), and find a way to mock the grain retrieval.
  3. Try dependency injection in the Orleans client.

Getting Organised With Microsoft Orleans 2.0 in .NET Core

In the previous article, “Getting Started with Microsoft Orleans 2.0 in .NET Core“, we saw how to quickly set up a minimal Orleans 2.0 silo and client (in the same application) and run it on Linux thanks to .NET Core.

However, if you’re serious about using Microsoft Orleans in a production environment, your setup won’t be this simple. You’ll need to create an appropriate project structure, introduce reliability, and add certain optimisations. We’ll be covering these in this article. You’ll also want to look into things like clustering providers which are out of scope here.

The source code for this article is the Orleans2GettingOrganised folder in the Gigi Labs BitBucket repository.

Update 30th June 2018: The source code for this article needs a little adjusting, in order to gracefully stop the silo and gracefully close the client. Counterintuitively, directly disposing a silo or client is non-graceful and is generally discouraged.

General Architecture

Before we go on, it is important to understand what the typical components in an Orleans solution look like.

An Orleans cluster consists of a number of silos, which in turn host a number of grains. Part of the Orleans abstraction is that you don’t know where your grains are physically running; this forces you to think distributed first, and also allows Orleans to migrate grains from faulted silos onto healthy ones.

Note: You can run a single-silo cluster, but that would be a single point of failure. You need multiple silos to achieve high availability. A single-silo cluster is typically only used for development and testing.

An Orleans client is used as a gateway between the Orleans cluster and the outside world. The name is actually misleading, because while it is a client to the Orleans cluster, it is typically also a server to external requests. For example, the Orleans client could be a REST API that accepts HTTP requests and interacts with grains in the Orleans cluster accordingly. Or it could be a Console App running as a Windows service with Topshelf. The project type is arbitrary.

Project Structure

The projects in an Orleans 2.0 solution should look something like this:

Purpose Type NuGet package References
Client ASP .NET Core Web API
or Console App
etc.
Microsoft.Orleans.Client Contracts
Silo Console App Microsoft.Orleans.Server Contracts, Grains
Grains Class Library Microsoft.Orleans.Core.Abstractions
Microsoft.Orleans.OrleansCodeGenerator.Build
Contracts
Contracts
(i.e. Interfaces)
Class Library Microsoft.Orleans.Core.Abstractions
Microsoft.Orleans.OrleansCodeGenerator.Build

Instead of clicking through Visual Studio to set this all up every time, we can use the dotnet command to automate this setup. This not only allows us to build this project structure quickly next time, but allows us to set this up on other platforms (e.g. Linux) in an IDE-agnostic manner.

We’ll use the --no-restore switch to prevent restoring packages with every command, which would take ages. We can do a separate dotnet restore at the end once everything is set up.

First, let’s make a folder for the solution:

mkdir Orleans2
cd Orleans2

Set up the Contracts project, which will hold our grain interfaces:

dotnet new classlib --name Contracts --no-restore
dotnet add Contracts/Contracts.csproj package Microsoft.Orleans.Core.Abstractions --no-restore
dotnet add Contracts/Contracts.csproj package Microsoft.Orleans.OrleansCodeGenerator.Build --no-restore

Set up the Grains project:

dotnet new classlib --name Grains --no-restore
dotnet add Grains/Grains.csproj package Microsoft.Orleans.Core.Abstractions --no-restore
dotnet add Grains/Grains.csproj package Microsoft.Orleans.OrleansCodeGenerator.Build --no-restore
dotnet add Grains/Grains.csproj reference Contracts/Contracts.csproj

Set up the Silo project:

dotnet new console --name Silo --no-restore
dotnet add Silo/Silo.csproj package Microsoft.Orleans.Server --no-restore
dotnet add Silo/Silo.csproj package Microsoft.Extensions.Logging.Console --no-restore
dotnet add Silo/Silo.csproj package OrleansDashboard --no-restore
dotnet add Silo/Silo.csproj reference Contracts/Contracts.csproj
dotnet add Silo/Silo.csproj reference Grains/Grains.csproj

Set up the Client project:

dotnet new webapi --name Client --no-restore
dotnet add Client/Client.csproj package Microsoft.Orleans.Client --no-restore
dotnet add Client/Client.csproj package Microsoft.Extensions.Logging.Console --no-restore
dotnet add Client/Client.csproj reference Contracts/Contracts.csproj

Finally, create a solution that includes all the above projects:

dotnet new sln --name Orleans2
dotnet sln Orleans2.sln add Contracts/Contracts.csproj
dotnet sln Orleans2.sln add Grains/Grains.csproj
dotnet sln Orleans2.sln add Silo/Silo.csproj
dotnet sln Orleans2.sln add Client/Client.csproj

Before we proceed, let’s build this solution to make sure it actually works. dotnet build restores packages as part of the build so there’s no need to do a dotnet restore separately.

dotnet build

It will take a little while to go through the restore, build and codegen steps, but it should work:

And there’s no reason why it shouldn’t work on Linux as well:

Setting Up an Example

Before proceeding with other things we need in a proper Orleans 2.0 solution, let’s set up a little example we can work with. This time, we’ll have a GameGrain that keeps track of players in a game. It will support three operations: Join, Leave, and List Players. To keep things simple, the grain will maintain the list of players in memory. This means that the player list won’t survive any failures or grain reactivations.

In the Contracts project, add a grain interface:

    public interface IGameGrain : IGrainWithIntegerKey
    {
        Task JoinAsync(string playerName);
        Task LeaveAsync(string playerName);
        Task<List<string>> ListPlayersAsync();
    }

In the Grains project, add the grain itself:

    public class GameGrain : Grain, IGameGrain
    {
        private HashSet<string> players;

        public GameGrain() => this.players = new HashSet<string>();

        public Task JoinAsync(string playerName)
        {
            this.players.Add(playerName);
            return Task.CompletedTask;
        }

        public Task LeaveAsync(string playerName)
        {
            this.players.Remove(playerName);
            return Task.CompletedTask;
        }

        public Task<List<string>> ListPlayersAsync()
            => Task.FromResult(this.players.ToList());
    }

In the Silo project, our silo startup code will be pretty much the same as in the previous article:

        public static async Task Main(string[] args)
        {
            var siloBuilder = new SiloHostBuilder()
                .UseLocalhostClustering()
                .UseDashboard(options => { })
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "dev";
                    options.ServiceId = "Orleans2GettingOrganised";
                })
                .Configure<EndpointOptions>(options =>
                    options.AdvertisedIPAddress = IPAddress.Loopback)
                .ConfigureLogging(logging => logging.AddConsole());

            using (var host = siloBuilder.Build())
            {
                await host.StartAsync();

                Console.ReadLine();
            }
        }

Remember that we need at least C# 7.1 to allow async/await in Main().

If you’re targeting Windows, you may want to add Topshelf to make a Windows service out of your silo. However, since this is application-specific, we won’t be covering it here.

The way we set up our Orleans client in the Client project is going to be a bit different from what we did in our previous article, because now we’re dealing with an ASP .NET Core Web API.

We can put the basic client connection code in a new helper method within the Startup class:

        private IClusterClient CreateOrleansClient()
        {
            var clientBuilder = new ClientBuilder()
                .UseLocalhostClustering()
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "dev";
                    options.ServiceId = "Orleans2GettingOrganised";
                })
                .ConfigureLogging(logging => logging.AddConsole());

            var client = clientBuilder.Build();
            client.Connect().Wait();

            return client;
        }

Note how we’re calling the blocking Wait() instead of doing the usually recommended await when connecting. This is because we’re going to be calling this from the methods in the Startup class, which are synchronous. Not only is there no way to do async in there, but it actually makes sense not to. You want to wait until your services are fully configured before beginning to accept requests.

We can then register the client in the ASP .NET Core IoC container, by adding the following code to the ConfigureServices() method:

        // This method gets called by the runtime. Use this method to add services to the container.
        public void ConfigureServices(IServiceCollection services)
        {
            var orleansClient = CreateOrleansClient();
            services.AddSingleton<IClusterClient>(orleansClient);

            services.AddMvc();
        }

We now need to add a controller that can accept requests and use the Orleans client to interact with the cluster:

    [Produces("application/json")]
    [Route("api/Games")]
    public class GamesController : Controller
    {
        private IClusterClient orleansClient;

        public GamesController(IClusterClient orleansClient)
        {
            this.orleansClient = orleansClient;
        }

        [HttpGet]
        public Task<List<string>> Get(int gameId)
        {
            var grain = this.orleansClient.GetGrain<IGameGrain>(gameId);
            return grain.ListPlayersAsync();
        }

        [HttpPut]
        public async Task Put(int gameId, string playerName)
        {
            var grain = this.orleansClient.GetGrain<IGameGrain>(gameId);
            await grain.JoinAsync(playerName);
        }

        [HttpDelete]
        public async Task Delete(int gameId, string playerName)
        {
            var grain = this.orleansClient.GetGrain<IGameGrain>(gameId);
            await grain.LeaveAsync(playerName);
        }
    }

In order to test this, we need to make sure that the silo has fully started before we start the client. We also need a way to interact with the API. We can add Swagger to the Web API, or use some other tool such as Postman, Fiddler or curl.

It should work nicely:

Client Retries

This is all well and good, but having to wait for the silo to be up before starting the client is silly. This can be tedious and brittle when debugging locally or during deployments. Ideally the client should keep trying to connect to the silo until it is available.

We can do that by putting the client creation and connection code within a loop:

        private IClusterClient CreateOrleansClient()
        {
            while (true) // keep trying to connect until silo is available
            {
                try
                {
                    var clientBuilder = new ClientBuilder()
                        .UseLocalhostClustering()
                        .Configure<ClusterOptions>(options =>
                        {
                            options.ClusterId = "dev";
                            options.ServiceId = "Orleans2GettingOrganised";
                        })
                        .ConfigureLogging(logging => logging.AddConsole());

                    var client = clientBuilder.Build();
                    client.Connect().Wait();

                    return client;
                }
                catch (Exception)
                {
                    Thread.Sleep(3000);
                    // log a warning or something
                }
            }
        }

Now it might seem super weird that we’re going through the hassle of recreating the ClientBuilder, building that into a client, and doing the reconnect, every time. And it is. By some strange design decision, these APIs don’t let you call ClientBuilder.Build() more than once, nor do they let you call Connect() on a client that has already failed. This means that you have to recreate everything with each connection attempt, which is tedious and inefficient.

Also, connection failures result in an OrleansException, which doesn’t really distinguish between different kinds of failures. If you want to distinguish between an intermittent connection failure and some catastrophic event… good luck with that.

Update 23rd April 2018: As a couple of people pointed out on the Orleans gitter chat, an easier way to achieve client retries is to pass a retry delegate to the Connect() method. The following is a simple example of how a fixed-interval retry could be implemented, but such a delegate makes it easy to implement more advanced mechanisms such as exponential backoff.

                    client.Connect(async ex =>
                    {  // replace Console with actual logging
                        Console.WriteLine(ex);
                        Console.WriteLine("Retrying...");
                        await Task.Delay(3000);
                        return true;
                    }).Wait();

Server Garbage Collection

The Orleans documentation recommends configuring .NET garbage collection as an optimisation to get better performance from your silos. In a .NET Core project, this means adding the following two settings to the .csproj file (in the full .NET Framework it’s different):

  <ServerGarbageCollection>true</ServerGarbageCollection>
  <ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>

This should in theory fix the following warnings:

Unfortunately, this doesn’t work at the time of writing this article. Hopefully they’ll fix it sometime soon.

Application Parts

In Orleans 1.x, complaints about silo start times were common. Orleans would scan all the assemblies in the executable’s folder looking for grains, leading to long start times for larger projects. It still does this in Orleans 2.0 by default, but now you can be more explicit and tell it where to look if you want.

Orleans 2.0 introduces something called application parts (based on ASP .NET Core naming, apparently), which is just a really bad way of saying “places from where to load grains”. I’ve already expressed concerns over how unintuitive this part of the API is to work with.

Thankfully, it’s not something you’ll need all the time. You can usually ignore the existence of this feature, and use it only when you notice slow startup times and want to optimise them.

Summary

In this article, we’ve seen a number of things that take us closer towards having a production-ready Orleans setup. These include:

  1. A better project structure.
  2. A Web API serving as a client to the Orleans cluster.
  3. Client retries.
  4. Server garbage collection.
  5. Application parts (grain sources).

As part of all this, we’ve also seen how to automate creation of our Orleans 2.0 solution and projects, and how to interact with an Orleans cluster via a REST API.

We haven’t, however, covered everything you’d typically have in a full solution. Some enhancements you may also need (which are beyond the scope of this article) include:

  • Using Topshelf to install the Client/API as a Windows service (if deploying on Windows). This can also be done for the Silo, if it’s not going to be run under IIS.
  • Configuring actual endpoints rather than using localhost.
  • Adding Swagger to the Client/API (the source code for this article does include it, but we haven’t covered it since I have a separate article on that).
  • Setting up dependency injection.
  • Setting up Orleans clustering (and running multiple silos).

Getting Started with Microsoft Orleans 2.0 in .NET Core

Microsoft Orleans 2.0 was released less than two weeks ago. The biggest win here is .NET Core/Standard support, meaning that Orleans is cross-platform. In this article, we’ll see how to quickly get up and running with Orleans 2.0.

The configuration and hosting APIs have changed considerably, so the instructions here won’t work for earlier versions. See my old “Getting Started with Microsoft Orleans” article from November 2016 if you’re running Orleans 1.4. Orleans 1.5 is also different so you’ll need to check the documentation for that.

In order to keep this article practical and concise, it is necessary to limit its scope. We will not be covering what Orleans is or what it is used for. We will also not create a full project structure that is typical in Orleans solutions. Instead, we’ll keep it simple so that in a short time we have a starting point to explore what Orleans has to offer.

Tip: Use Ctrl+. (Control dot) to resolve namespaces in Visual Studio.

The source code for this article is the Orleans2GettingStarted folder in the Gigi Labs BitBucket repository.

Update 30th June 2018: The source code for this article needs a little adjusting, in order to gracefully stop the silo and gracefully close the client. Counterintuitively, directly disposing a silo or client is non-graceful and is generally discouraged.

.NET Core Project

To demonstrate the fact that Orleans 2.0 really supports .NET Core, we’ll create a .NET Core console app and set up everything in it. To keep things simple, we’ll run both the client and the silo (server) from this same application.

Install Packages

There are a few packages we’ll need:

Install-Package Microsoft.Orleans.Server
Install-Package Microsoft.Orleans.Client
Install-Package Microsoft.Orleans.OrleansCodeGenerator.Build
Install-Package Microsoft.Extensions.Logging.Console
Install-Package OrleansDashboard

Here is a summary of what each of these does:

  1. Used by Orleans silo (server).
  2. Used by Orleans client.
  3. Required for build-time code generation. Bad things happen if you don’t include it.
  4. Optional, but allows us to set up logging to console to see what Orleans is doing.
  5. Optional, but allows us to visualise the operation of silos and grains.

Update 28th April 2018: if you’re running on Windows, you can also get CPU and memory telemetry in the Orleans dashboard by installing the Microsoft.Orleans.OrleansTelemetryConsumers.Counters package.

Grain and Grain Interface

We’ll add a simple grain class to the project in order to have a minimal example. Since grains are independent of each other, Internet of Things (IoT) scenarios fit very nicely. Imagine we have a number of temperature sensors deployed in different places. Each one has an ID, and periodically submits temperature readings to a corresponding grain in the Orleans cluster:

    public class TemperatureSensorGrain : Grain, ITemperatureSensorGrain
    {
        public Task SubmitTemperatureAsync(float temperature)
        {
            long grainId = this.GetPrimaryKeyLong();
            Console.WriteLine($"{grainId} received temperature: {temperature}");

            return Task.CompletedTask;
        }
    }

We’re not doing anything special here. We write out the grain ID and the value we received just so we see something going on in the console. It is important that we inherit from the Grain base class, and that all our methods return Task.

We also need a grain interface:

    public interface ITemperatureSensorGrain : IGrainWithIntegerKey
    {
        Task SubmitTemperatureAsync(float temperature);
    }

The interface must inherit from an Orleans-defined interface that tells what type of grain ID (key) it will use. Our grains will have an ID of type long (that’s some misleading naming in the interface), but there are other options including GUID or string.

Silo

Taking a look at the Hello World sample gives an idea of how to set up minimal silo and client. Since this code is going to be async, we’ll need use C# 7.1+ (to support async/await in Main()) or use a workaround. See the last section of “Working with Asynchronous Methods in C#” for how this is done (quick tip: Project Properties -> Build -> Advanced… -> C# latest minor version (latest)).

We can adapt the code from the Hello World sample to run a simple silo:

        static async Task Main(string[] args)
        {
            var siloBuilder = new SiloHostBuilder()
                .UseLocalhostClustering()
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "dev";
                    options.ServiceId = "Orleans2GettingStarted";
                })
                .Configure<EndpointOptions>(options =>
                    options.AdvertisedIPAddress = IPAddress.Loopback)
                .ConfigureLogging(logging => logging.AddConsole());

            using (var host = siloBuilder.Build())
            {
                await host.StartAsync();

                Console.ReadLine();
            }
        }

Here, we are setting up a local cluster for development. Thanks to the console logging package we installed earlier and the ConfigureLogging() call above, we can see what Orleans is up to:

What is being written out is not important at this stage. The important thing is that the Orleans silo is running.

Client

The same Hello World sample also shows us how to set up a client that connects to the silo. This usually serves as a gateway between the outside world and the Orleans cluster. It could be a Web API, Windows service, etc; but here it will just be in the same console app as the silo.

We’ll wait to start the client after the silo has started. This is easy to do in our case because both are in the same application.

            using (var host = siloBuilder.Build())
            {
                await host.StartAsync();

                var clientBuilder = new ClientBuilder()
                    .UseLocalhostClustering()
                    .Configure<ClusterOptions>(options =>
                    {
                        options.ClusterId = "dev";
                        options.ServiceId = "Orleans2GettingStarted";
                    })
                    .ConfigureLogging(logging => logging.AddConsole());

                using (var client = clientBuilder.Build())
                {
                    await client.Connect();

                    var sensor = client.GetGrain<ITemperatureSensorGrain>(123);
                    await sensor.SubmitTemperatureAsync(32.5f);

                    Console.ReadLine();
                }
            }

The setup for the client is very similar to that of the silo, and quite straightforward since we are using the default localhost configurations. One thing you’ll notice is the unfortunate inconsistent naming between host.StartAsync() and client.Connect(); the latter lacks the –Async() suffix, even though it also returns a Task.

If we run this code, we see that the code in the grain is getting executed, and we see the temperature reading in the console at the end:

Dashboard

Although it works, this example is really boring. We essentially have a Hello World here, styled for IoT. Let’s change the client code to generate some load instead:

                using (var client = clientBuilder.Build())
                {
                    await client.Connect();

                    var random = new Random();
                    string sky = "blue";

                    while (sky == "blue") // if run in Ireland, it exits loop immediately
                    {
                        int grainId = random.Next(0, 500);
                        double temperature = random.NextDouble() * 40;
                        var sensor = client.GetGrain<ITemperatureSensorGrain>(grainId);
                        await sensor.SubmitTemperatureAsync((float)temperature);
                    }
                }

Now we can see the silo brimming with activity, but only in the console:

Now, wouldn’t it be nice if we could see some graphs showing what our grains and silos are doing? As it turns out, we can do that by setting up the Orleans Dashboard, a community-contributed admin dashboard for Microsoft Orleans.

We’ve already installed the package for it, so all we need to do is add it to the silo configuration:

            var siloBuilder = new SiloHostBuilder()
                .UseLocalhostClustering()
                .UseDashboard(options => { })
                .Configure<ClusterOptions>(options =>
                {
                    options.ClusterId = "dev";
                    options.ServiceId = "Orleans2GettingStarted";
                })
                .Configure<EndpointOptions>(options =>
                    options.AdvertisedIPAddress = IPAddress.Loopback)
                .ConfigureLogging(logging => logging.AddConsole());

That sets up the dashboard with all default values (port 8080, no username/password) which you can always change if you need to.

So now, if we run the application again and open localhost:8080 in a browser window, we can get some pretty visualisations.

Here’s the high-level view of the cluster:

And here’s a view of the grains that are running. You’ll see we have 500 instances of our TemperatureSensorGrain, which corresponds to the range of grainIds we’re generating at random as we generate load. You’ll also see some internal system-related grains:

Here’s a view of the grain itself, and the methods being called on it:

We can also get a view of the silo:

We haven’t covered everything the dashboard gives you, but you can already see that it gives a lot of visibility into what’s going on. It’s great to track errors, slow requests, throughput, etc.

Linux

So now we have Orleans in a .NET Core project on Windows. That’s great, but the real benefit of Orleans supporting .NET Core is cross-platform deployment. So after installing .NET Core on a Linux machine (I’m using Ubuntu 17.10.1 here), let’s grab the code and run Orleans:

git clone https://bitbucket.org/dandago/gigilabs.git
cd gigilabs
cd Orleans2GettingStarted
cd Orleans2GettingStarted
dotnet run

Orleans has no problem starting up on this Ubuntu VM:

And here we can see Orleans running with the Orleans Dashboard in the background:

Summary

Orleans 2.0 is based on .NET Standard 2.0, so Orleans can now run on .NET Core and the full .NET Framework alike. It can be run on any platform capable of running .NET Core, Linux being just one example.

A big thanks goes to the Microsoft Orleans team for making this happen! (And to Richard Astbury for the awesome Orleans Dashboard, which he still claims is alpha quality.)

To recap: in order to have a minimal Orleans sample running, we need to:

  1. Install the necessary packages.
  2. Add a grain and a grain interface.
  3. Set up the silo and client.
  4. Use the grain from the client.
  5. Optionally, set up logging and the Orleans Dashboard.

This example is meant to get you quickly up and running, and does not delve into any proper project structure or optimisations, which you would normally have when building a serious solution around Orleans. In the next Orleans 2.0 article, we’ll see how to properly organise an Orleans 2.0 solution.

Test Explorer Revamped in Visual Studio 15.7 Preview

This article is based on Visual Studio 2017 Version 15.7 Preview 2. Although it’s nice to get a glimpse of good things to come, keep in mind that it’s not production-ready yet and that things may change.

With the current version of Visual Studio 2017, which is 15.6.5, you have to have really good naming to be able to look at Test Explorer and figure out what your tests actually do:

That’s because the tests are grouped based on their outcome, i.e. whether they’ve passed, failed, or not run at all.

But that’s going to change in Visual Studio 2017, where Test Explorer groups tests based on the more logical hierarchy of project, namespace, class and method:

This is also very handy because you can run a certain group of tests (e.g. tests in a specific class, or tests in a specific namespace), directly from Test Explorer, rather than having to do this by right-clicking the test code and using the context menu (which isn’t very intuitive for newer developers).

This is a small but significant improvement towards making Test Explorer more usable. It is also yet another change that brings Visual Studio closer to ReShaper in terms of functionality.

Log Shipping with Filebeat and Elasticsearch

Introduction

Aside from being a powerful search engine, Elasticsearch has in recent years become very popular as a special-purpose logging storage and analysis solution. Logstash and beats were eventually introduced to help Elasticsearch cope better with the volume of logs being ingested.

In this article, we’ll see how to use Filebeat to ship existing logfiles into Elasticsearch, so that they can be viewed and analysed in Kibana.

Since it’s not possible to cover all scenarios exhaustively and keep this article concise and at a reasonable length, we’ll make a few assumptions here:

  1. We’ll use Filebeat on Windows.
  2. We’ll ship logs directly into Elasticsearch, i.e. no Logstash. This is good if the scale of logging is not so big as to require Logstash, or if it is just not an option (e.g. using Elasticsearch as a managed service in AWS).
  3. We’re running on-premises, and already have log files we want to ship. If we were running managed services within the cloud, then logging to file would often not be an option, and in that case we should use whatever logging mechanism is available from the cloud provider.

Motivation

Logging is ubiquitous. You’ll find it in virtually every application out there. As such, it’s a problem that has been solved to death. There are so many logging frameworks out there, it’s just crazy.

And despite this, it baffles me why so many companies today still opt to write their own logging libraries, either from scratch or as abstractions of other logging libraries. They could just use one of the myriad existing solutions out there, which are probably far more robust and performant than theirs will ever be.

In order to realise just how stupid reinventing the wheel is, let’s take an example scenario. You have your big software monolith that’s writing to one or more log files. You begin to break up the monolith into microservices, and realise that you now have log files everywhere: in multiple applications across different servers. So… you need to write a logging library that all your microservices can use to write the logs to a central data store (could be any kind of relational or NoSQL database).

 

That’s great! Your logs are now in one place and you can search through them effortlessly! And your code is even DRY because you wrote another common library (hey, you only need like 35 of them now to write a new microservice).

But wait, having applications write directly to a log store is wrong on so many levels. Here are a few:

  1. Logs buffered in memory can be permanently lost if the application terminates unexpectedly.
  2. The application must take the performance hit of communicating with the remote endpoint.
  3. Through the logging library, the application must depend on a client library for that logging store. This is a form of coupling that doesn’t work very well with microservices. Even worse, if the logging library isn’t designed properly, it may carry dependencies on mutiple logging stores.

These practical issues don’t even take into consideration the effort and complexity involved in creating a fully-featured logging library.

So what is the alternative? Simply keep writing to log files, and have a separate application (a log shipper) send those logs to a centralised store. Again, you don’t have to write the log shipper yourself. There are more than enough out there that you can just pick up and use.

 

This approach has a number of advantages:

  1. The log shipper is an offline process, and will not directly impact performance of applications.
  2. Files are about as fast as it gets for an application to write logs.
  3. If there is a problem sending logs to the store, the original log files are still there as a single source of truth.
  4. The log shipper can send logs to the store in bulk. There is no need to dangerously buffer them in memory. They are already there on disk.
  5. If the original logger (to file) is configured to flush on each write, then it’s virtually impossible that logs will be lost.
  6. There are no additional dependencies for the application. Just the original logging library.
  7. Developers can leverage their knowledge of existing libraries, and don’t have to learn to use a new one every time they start a new job.
  8. Developers can focus on solving real problems, rather than reinventing the wheel.

“But wait!” I can already hear the skeptics. “Existing logging libraries are not fast enough!” goes one of them. To this chap, I say:

  • Have you really tried all existing logging libraries? (Only Chuck Norris has done that, as far as I can tell. Twice.)
  • Is it possible that you’re simply not using a library correctly? (Maybe tweak some configuration settings?)
  • Even if you really could write something faster, it’s likely that the benefit will be negligible, and that it will only be faster under certain conditions. Surely you have more important performance consideratons than how many logs you can write per second.

“But wait!” goes another skeptic. “We might need to change the logging library later.” This is the same tired old excuse that is very common about data-access-layer code. “We might have to change our database!” Some folks still go on after some forty years.

This is a very common over-engineering scenario in which we create an abstraction of an abstraction. NLog and other logging libraries can already plug into a variety of output destinations, so it’s very unlikely that you’ll ever need to change them. Actually, it’s more likely that you’ll run into limitations by using abstractions such as Common.Logging where you end up with a common denominator and can’t make use of advanced features that a specific logging library might offer.

Changing a logging library should be mostly a matter of changing packages, and updating code via search and replace. So if you need to change it, just change it. That’s way cheaper than the complexity introduced by an extra layer of unnecessary abstraction for no other reason than “just in case”. Especially if you’re doing microservices (properly) – you should be able to change your logging library and redeploy in a matter of minutes.

Beats and Filebeat

beat is a lightweight agent that can siphon data from a source and send it to Logstash or Elasticsearch. There are several beats that can gather network data, Windows event logs, log files and more, but the one we’re concerned with here is the Filebeat.

After you download Filebeat and extract the zip file, you should find a configuration file called filebeat.yml. For a quick start, look for filebeat.prospectors, and under it:

  • Change the value of enabled from false to true.
  • Under paths, comment out the existing entry for /var/log/*.log, and instead put in a path for whatever log you’ll test against.

This part of filebeat.yml should now look something like this:

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*
    - C:\ConsoleApp1\*.log

Also if your Elasticsearch server isn’t the default localhost:9200, be sure to change it further down in the file.

In that ConsoleApp1, I have a file called Debug.log which contains the following log entries:

2018-03-18 15:43:40.7914 - INFO: Tick
2018-03-18 15:43:42.8215 - INFO: Tock
2018-03-18 15:43:42.8683 - ERROR: Error doing TickTock!
EXCEPTION: System.DivideByZeroException: Attempted to divide by zero.
   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18

I’ll be using this simple (silly) example to show how to work with Filebeat.

Next, we can invoke filebeat.exe. When you do this, two folders get created. One is logs, where you can check Filebeat’s own logs and see if it has run into any problems. The other is data, and I believe this is where Filebeat keeps track of its position in each log file it’s tracking. If you delete this folder, it will go through the log files and ship them again from scratch.

Go into Kibana, and then into Management and Index Patterns. If all went well, Kibana will find the index that was created by Filebeat. You can create the index pattern filebeat-* to capture all Filebeat data:

For the time filter field, choose @timestamp, which is created and populated automatically by Filebeat.

In Kibana, you can now go back to Discover and see the log data (you may need to extend the time range):

As you can see, Filebeat successfully shipped the logs into Elasticsearch, but the logs haven’t been meaningfully parsed:

  • The message field contains everything, including timestamp, log level and actual message.
  • The exception stack trace was split into different entries per line.
  • The Time field showing in Kibana is actually the time when the log was shipped, not the timestamp of the log entry itself.

We’ll deal with these issues in the next sections.

Elasticsearch Pipeline

One way to properly parse the logs when they are sent to Elasticsearch is to create an ingest pipeline in Elasticsearch itself. There’s a good article by James Huang showing how to use this to ship logs from Filebeats to managed Elasticsearch in AWS.

By adapting the example in that article, we can create a pipeline for our sample log file. Run the following in Kibana’s Dev Tools:

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      }
    }
  ]
}

Now, getting that pattern right is a pain in the ass. The Grok Debugger is a great help, and there’s also a list of data types you can use.

In filebeat.yml, we now need to configure Filebeat to use this Elasticsearch pipeline:

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: logpipeline

We can now try indexing the logs again. First, let’s delete the Filebeat index:

DELETE filebeat-*

Next, delete the Filebeat’s data folder, and run filebeat.exe again.

In Discover, we now see that we get separate fields for timestamp, log level and message:

If you get warnings on the new fields (as above), just go into Management, then Index Patterns, and refresh the filebeat-* index pattern.

Now, you’ll see that for the error entry, we did not get the full exception stack trace. If we go into the Filebeat logs, we can see something like this:

2018-03-18T23:16:26.614Z	ERROR	pipeline/output.go:92	Failed to publish events: temporary bulk send failure
2018-03-18T23:16:26.616Z	INFO	elasticsearch/client.go:690	Connected to Elasticsearch version 6.1.2
2018-03-18T23:16:26.620Z	INFO	template/load.go:73	Template already exists and will not be overwritten.
2018-03-18T23:16:27.627Z	ERROR	pipeline/output.go:92	Failed to publish events: temporary bulk send failure
2018-03-18T23:16:27.629Z	INFO	elasticsearch/client.go:690	Connected to Elasticsearch version 6.1.2
2018-03-18T23:16:27.635Z	INFO	template/load.go:73	Template already exists and will not be overwritten.

Correspondingly, in Elasticsearch we can see several errors such as the following accumulating:

[2018-03-18T23:16:25,610][DEBUG][o.e.a.b.TransportBulkAction] [8vLF54_] failed to execute pipeline [logpipeline] for document [filebeat-6.2.2-2018.03.18/doc/null]
org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	at org.elasticsearch.ingest.CompoundProcessor.newCompoundProcessorException(CompoundProcessor.java:156) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:107) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:169) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:42) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:94) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.2.jar:6.1.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	... 11 more
Caused by: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	at org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:67) ~[?:?]
	at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100) ~[elasticsearch-6.1.2.jar:6.1.2]
	... 9 more

Elasticsearch is making a fuss because it can’t parse the lines from the exception. This is a problem because if Elasticsearch can’t parse the logs, Filebeat will keep trying to send them and never make progress. We’ll have to deal with that exception stack trace now.

Multiline log entries

In order to log the exception correctly, we have to enable multiline processing in Filebeat. In filebeat.yml, there are some multiline settings that are commented out. We need to enable them and change them a little, such that any line not starting with a date is appended to the previous line:

  ### Multiline options

  # Mutiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  multiline.pattern: '^\d{4}-\d{2}-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{4}'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  multiline.match: after

Configuring the Filebeat to support multiline log entries is not enough though. We also need to update the pipeline in Elasticsearch to apply the grok filter on multiple lines ((?m)) and to separate the exception into a field of its own. I’ve had to split the two cases (with and without exception) into separate patterns in order to make it work.

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: (?<message>.*?)\n(%{GREEDYDATA:exception})?",
            "(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      }
    }
  ]
}

After deleting the index and the Filebeat data folder, and re-running Filebeat, we now get a perfect multiline exception stack trace in its own field!

Fixing the Timestamp

We now have one last issue to fix: the logs being ordered by when they were inserted into the index, rather than the log timestamp. This is actually a pretty serious problem from a usability perspective, because it means people troubleshooting production issues won’t be able to use Kibana’s time filter (e.g. last 15 minutes) to home in on the most relevant logs.

In order to fix this, we need to augment our pipeline with a date processor:

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: (?<message>.*?)\n(%{GREEDYDATA:exception})?",
        "(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      },
      "date" : {
        "field" : "timestamp",
        "target_field" : "@timestamp",
        "formats" : ["yyyy-MM-dd HH:mm:ss.SSSS"]
      }
    }
  ]
}

The names of the fields in the date section are important. We’re basically telling it to take whatever is in the timestamp field (based on one of the earlier patterns) and apply it to @timestamp. As it happens, @timestamp is what is being used as the time-series field, which gives us exactly the result we want after reshipping the logs (be sure to extend the time window in Kibana accordingly to see the logs):

Summary

In this article, we’ve explored log shipping to augment regular file logging with purpose-built tools, rather than reinventing the wheel and writing yet another logging library. The latter approach would not only be a tremendous waste of time, but there are reliability, performance and maintainability implications to consider.

We have specifically looked at using Filebeat to ship logs directly into Elasticsearch, which is a good approach when Logstash is either not necessary or not possible to have. In order to get our log data nicely structured so that we can analyse it in Kibana, we’ve had to set up an ingest pipeline in Elasticsearch.

We progressively refined both our Filebeat configuration and this pipeline in order to split up our logs into separate fields, process multiline exception stack traces, and use the original timestamp in the logs as the time series field.

There is a lot more that Filebeats can do. For instance, a Filebeat may be configured with multiple prospectors, meaning it can read log files from different places and apply different options accordingly. One useful example of this is to add a custom field indicating the origin of the logs – this is useful when the log data itself does not include the application name, for instance.

"You don't learn to walk by following rules. You learn by doing, and by falling over." — Richard Branson