Log Shipping with Filebeat and Elasticsearch

Introduction

Aside from being a powerful search engine, Elasticsearch has in recent years become very popular as a special-purpose logging storage and analysis solution. Logstash and beats were eventually introduced to help Elasticsearch cope better with the volume of logs being ingested.

In this article, we’ll see how to use Filebeat to ship existing logfiles into Elasticsearch, so that they can be viewed and analysed in Kibana.

Since it’s not possible to cover all scenarios exhaustively and keep this article concise and at a reasonable length, we’ll make a few assumptions here:

  1. We’ll use Filebeat on Windows.
  2. We’ll ship logs directly into Elasticsearch, i.e. no Logstash. This is good if the scale of logging is not so big as to require Logstash, or if it is just not an option (e.g. using Elasticsearch as a managed service in AWS).
  3. We’re running on-premises, and already have log files we want to ship. If we were running managed services within the cloud, then logging to file would often not be an option, and in that case we should use whatever logging mechanism is available from the cloud provider.

Motivation

Logging is ubiquitous. You’ll find it in virtually every application out there. As such, it’s a problem that has been solved to death. There are so many logging frameworks out there, it’s just crazy.

And despite this, it baffles me why so many companies today still opt to write their own logging libraries, either from scratch or as abstractions of other logging libraries. They could just use one of the myriad existing solutions out there, which are probably far more robust and performant than theirs will ever be.

In order to realise just how stupid reinventing the wheel is, let’s take an example scenario. You have your big software monolith that’s writing to one or more log files. You begin to break up the monolith into microservices, and realise that you now have log files everywhere: in multiple applications across different servers. So… you need to write a logging library that all your microservices can use to write the logs to a central data store (could be any kind of relational or NoSQL database).

 

That’s great! Your logs are now in one place and you can search through them effortlessly! And your code is even DRY because you wrote another common library (hey, you only need like 35 of them now to write a new microservice).

But wait, having applications write directly to a log store is wrong on so many levels. Here are a few:

  1. Logs buffered in memory can be permanently lost if the application terminates unexpectedly.
  2. The application must take the performance hit of communicating with the remote endpoint.
  3. Through the logging library, the application must depend on a client library for that logging store. This is a form of coupling that doesn’t work very well with microservices. Even worse, if the logging library isn’t designed properly, it may carry dependencies on mutiple logging stores.

These practical issues don’t even take into consideration the effort and complexity involved in creating a fully-featured logging library.

So what is the alternative? Simply keep writing to log files, and have a separate application (a log shipper) send those logs to a centralised store. Again, you don’t have to write the log shipper yourself. There are more than enough out there that you can just pick up and use.

 

This approach has a number of advantages:

  1. The log shipper is an offline process, and will not directly impact performance of applications.
  2. Files are about as fast as it gets for an application to write logs.
  3. If there is a problem sending logs to the store, the original log files are still there as a single source of truth.
  4. The log shipper can send logs to the store in bulk. There is no need to dangerously buffer them in memory. They are already there on disk.
  5. If the original logger (to file) is configured to flush on each write, then it’s virtually impossible that logs will be lost.
  6. There are no additional dependencies for the application. Just the original logging library.
  7. Developers can leverage their knowledge of existing libraries, and don’t have to learn to use a new one every time they start a new job.
  8. Developers can focus on solving real problems, rather than reinventing the wheel.

“But wait!” I can already hear the skeptics. “Existing logging libraries are not fast enough!” goes one of them. To this chap, I say:

  • Have you really tried all existing logging libraries? (Only Chuck Norris has done that, as far as I can tell. Twice.)
  • Is it possible that you’re simply not using a library correctly? (Maybe tweak some configuration settings?)
  • Even if you really could write something faster, it’s likely that the benefit will be negligible, and that it will only be faster under certain conditions. Surely you have more important performance consideratons than how many logs you can write per second.

“But wait!” goes another skeptic. “We might need to change the logging library later.” This is the same tired old excuse that is very common about data-access-layer code. “We might have to change our database!” Some folks still go on after some forty years.

This is a very common over-engineering scenario in which we create an abstraction of an abstraction. NLog and other logging libraries can already plug into a variety of output destinations, so it’s very unlikely that you’ll ever need to change them. Actually, it’s more likely that you’ll run into limitations by using abstractions such as Common.Logging where you end up with a common denominator and can’t make use of advanced features that a specific logging library might offer.

Changing a logging library should be mostly a matter of changing packages, and updating code via search and replace. So if you need to change it, just change it. That’s way cheaper than the complexity introduced by an extra layer of unnecessary abstraction for no other reason than “just in case”. Especially if you’re doing microservices (properly) – you should be able to change your logging library and redeploy in a matter of minutes.

Beats and Filebeat

beat is a lightweight agent that can siphon data from a source and send it to Logstash or Elasticsearch. There are several beats that can gather network data, Windows event logs, log files and more, but the one we’re concerned with here is the Filebeat.

After you download Filebeat and extract the zip file, you should find a configuration file called filebeat.yml. For a quick start, look for filebeat.prospectors, and under it:

  • Change the value of enabled from false to true.
  • Under paths, comment out the existing entry for /var/log/*.log, and instead put in a path for whatever log you’ll test against.

This part of filebeat.yml should now look something like this:

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*
    - C:\ConsoleApp1\*.log

Also if your Elasticsearch server isn’t the default localhost:9200, be sure to change it further down in the file.

In that ConsoleApp1, I have a file called Debug.log which contains the following log entries:

2018-03-18 15:43:40.7914 - INFO: Tick
2018-03-18 15:43:42.8215 - INFO: Tock
2018-03-18 15:43:42.8683 - ERROR: Error doing TickTock!
EXCEPTION: System.DivideByZeroException: Attempted to divide by zero.
   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18

I’ll be using this simple (silly) example to show how to work with Filebeat.

Next, we can invoke filebeat.exe. When you do this, two folders get created. One is logs, where you can check Filebeat’s own logs and see if it has run into any problems. The other is data, and I believe this is where Filebeat keeps track of its position in each log file it’s tracking. If you delete this folder, it will go through the log files and ship them again from scratch.

Go into Kibana, and then into Management and Index Patterns. If all went well, Kibana will find the index that was created by Filebeat. You can create the index pattern filebeat-* to capture all Filebeat data:

For the time filter field, choose @timestamp, which is created and populated automatically by Filebeat.

In Kibana, you can now go back to Discover and see the log data (you may need to extend the time range):

As you can see, Filebeat successfully shipped the logs into Elasticsearch, but the logs haven’t been meaningfully parsed:

  • The message field contains everything, including timestamp, log level and actual message.
  • The exception stack trace was split into different entries per line.
  • The Time field showing in Kibana is actually the time when the log was shipped, not the timestamp of the log entry itself.

We’ll deal with these issues in the next sections.

Elasticsearch Pipeline

One way to properly parse the logs when they are sent to Elasticsearch is to create an ingest pipeline in Elasticsearch itself. There’s a good article by James Huang showing how to use this to ship logs from Filebeats to managed Elasticsearch in AWS.

By adapting the example in that article, we can create a pipeline for our sample log file. Run the following in Kibana’s Dev Tools:

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      }
    }
  ]
}

Now, getting that pattern right is a pain in the ass. The Grok Debugger is a great help, and there’s also a list of data types you can use.

In filebeat.yml, we now need to configure Filebeat to use this Elasticsearch pipeline:

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: logpipeline

We can now try indexing the logs again. First, let’s delete the Filebeat index:

DELETE filebeat-*

Next, delete the Filebeat’s data folder, and run filebeat.exe again.

In Discover, we now see that we get separate fields for timestamp, log level and message:

If you get warnings on the new fields (as above), just go into Management, then Index Patterns, and refresh the filebeat-* index pattern.

Now, you’ll see that for the error entry, we did not get the full exception stack trace. If we go into the Filebeat logs, we can see something like this:

2018-03-18T23:16:26.614Z	ERROR	pipeline/output.go:92	Failed to publish events: temporary bulk send failure
2018-03-18T23:16:26.616Z	INFO	elasticsearch/client.go:690	Connected to Elasticsearch version 6.1.2
2018-03-18T23:16:26.620Z	INFO	template/load.go:73	Template already exists and will not be overwritten.
2018-03-18T23:16:27.627Z	ERROR	pipeline/output.go:92	Failed to publish events: temporary bulk send failure
2018-03-18T23:16:27.629Z	INFO	elasticsearch/client.go:690	Connected to Elasticsearch version 6.1.2
2018-03-18T23:16:27.635Z	INFO	template/load.go:73	Template already exists and will not be overwritten.

Correspondingly, in Elasticsearch we can see several errors such as the following accumulating:

[2018-03-18T23:16:25,610][DEBUG][o.e.a.b.TransportBulkAction] [8vLF54_] failed to execute pipeline [logpipeline] for document [filebeat-6.2.2-2018.03.18/doc/null]
org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	at org.elasticsearch.ingest.CompoundProcessor.newCompoundProcessorException(CompoundProcessor.java:156) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:107) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:169) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:42) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:94) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.2.jar:6.1.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	... 11 more
Caused by: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	at org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:67) ~[?:?]
	at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100) ~[elasticsearch-6.1.2.jar:6.1.2]
	... 9 more

Elasticsearch is making a fuss because it can’t parse the lines from the exception. This is a problem because if Elasticsearch can’t parse the logs, Filebeat will keep trying to send them and never make progress. We’ll have to deal with that exception stack trace now.

Multiline log entries

In order to log the exception correctly, we have to enable multiline processing in Filebeat. In filebeat.yml, there are some multiline settings that are commented out. We need to enable them and change them a little, such that any line not starting with a date is appended to the previous line:

  ### Multiline options

  # Mutiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  multiline.pattern: '^\d{4}-\d{2}-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{4}'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  multiline.match: after

Configuring the Filebeat to support multiline log entries is not enough though. We also need to update the pipeline in Elasticsearch to apply the grok filter on multiple lines ((?m)) and to separate the exception into a field of its own. I’ve had to split the two cases (with and without exception) into separate patterns in order to make it work.

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: (?<message>.*?)\n(%{GREEDYDATA:exception})?",
            "(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      }
    }
  ]
}

After deleting the index and the Filebeat data folder, and re-running Filebeat, we now get a perfect multiline exception stack trace in its own field!

Fixing the Timestamp

We now have one last issue to fix: the logs being ordered by when they were inserted into the index, rather than the log timestamp. This is actually a pretty serious problem from a usability perspective, because it means people troubleshooting production issues won’t be able to use Kibana’s time filter (e.g. last 15 minutes) to home in on the most relevant logs.

In order to fix this, we need to augment our pipeline with a date processor:

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: (?<message>.*?)\n(%{GREEDYDATA:exception})?",
        "(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      },
      "date" : {
        "field" : "timestamp",
        "target_field" : "@timestamp",
        "formats" : ["yyyy-MM-dd HH:mm:ss.SSSS"]
      }
    }
  ]
}

The names of the fields in the date section are important. We’re basically telling it to take whatever is in the timestamp field (based on one of the earlier patterns) and apply it to @timestamp. As it happens, @timestamp is what is being used as the time-series field, which gives us exactly the result we want after reshipping the logs (be sure to extend the time window in Kibana accordingly to see the logs):

Summary

In this article, we’ve explored log shipping to augment regular file logging with purpose-built tools, rather than reinventing the wheel and writing yet another logging library. The latter approach would not only be a tremendous waste of time, but there are reliability, performance and maintainability implications to consider.

We have specifically looked at using Filebeat to ship logs directly into Elasticsearch, which is a good approach when Logstash is either not necessary or not possible to have. In order to get our log data nicely structured so that we can analyse it in Kibana, we’ve had to set up an ingest pipeline in Elasticsearch.

We progressively refined both our Filebeat configuration and this pipeline in order to split up our logs into separate fields, process multiline exception stack traces, and use the original timestamp in the logs as the time series field.

There is a lot more that Filebeats can do. For instance, a Filebeat may be configured with multiple prospectors, meaning it can read log files from different places and apply different options accordingly. One useful example of this is to add a custom field indicating the origin of the logs – this is useful when the log data itself does not include the application name, for instance.

Object Oriented Programming in Python

This article is a practical overview of Object Oriented Programming (OOP) in Python. It explains why OOP is useful, aside from how it’s done. This should be useful to both people who don’t know what OOP is, and experienced developers transitioning from other languages.

I am not a professional Python developer, and I am currently re-learning the language after not having used it for 8 years. So keep that in mind as you read, and feel free to offer feedback that can improve the quality of this article. Just be nice. 🙂

Due to the eternal divide between Python 2 and 3, I have to state that I’m using Python 3.6.4 here. Why Python 3? Because it makes no difference to me. When you are just learning and don’t have any requirements for maintaining backwards compatibility, you can afford to use the latest and greatest.

Introduction

In his hysterical rant on the web and OOP (in which he says the word “bizarre” enough times to qualify as a cover of OMC’s song), Zed Shaw cites OOP being “difficult to teach” as one of its major flaws.

Image credit: taken from here.

That’s a bold claim coming from someone who wrote in his own book:

“Search online for “object-oriented programming” and try to overflow your brain with what you read. Don’t worry if it makes absolutely no sense to you. Half of that stuff makes no sense to me either.” — Learn Python the Hard Way, Third Edition. Zed A. Shaw. 2014.

There are many things in computing that are hard to teach. I don’t think that Object Oriented Programming is one of them.

Motivation

In order to understand why OOP is useful, we’ll start off by not using it, and observe the problems we encounter. To do this, we need a proper example. People often teach OOP in terms of animals or cars, but I think games make more fun and interesting examples.

Screenshot from Dark Sun: Shattered Lands (1993)

A player-controlled character in a game typically has a number of attributes (e.g. name, hit points, etc). In order to group the attributes for our character, we need some kind of record or structure, which in C-style languages would be a struct. We don’t have that in Python, but we can use dictionaries instead.

talorus = {
    'name': 'Talorus',
    'hitpoints': 30,
    'dead': False,
    'inventory': []
}

Once we have a way to hold related data, we’ll want to perform some kind of operations on it.

def rename(character, newName):
    character['name'] = newName

def sufferDamage(character, damage):
    character['hitpoints'] -= damage
    if (character['hitpoints'] <= 0):
        character['dead'] = True

def receiveItem(character, item):
    character['inventory'].append(item)

Here’s some example usage:

You’ll notice a common theme across these functions. In all cases, we’re passing our character as the first parameter, and then using some of its attributes within the body of each function. We’re not using OOP yet, but we can already see a natural progression towards the character object being a first class citizen.

However, our current approach has a number of flaws. One of these is that it is easy for any code, anywhere, to tamper with our dictionary’s state.

Our logic from the sufferDamage() function specifies that characters die only if they run out of hitpoints, so how is our character dead with 26 hitpoints?

Being able to tamper with an object’s state without restriction is a bad thing: it is a violation of encapsulation, which is one of the three pillars of OOP (along with inheritance and polymorphism). We’ll discuss these later.

Classes and Objects

A class is just an abstract template for a type of object. For instance:

class Troll:
    pass

We’re declaring a Troll class, and using the pass keyword to indicate that there’s nothing in it for the time being. Once we have this class, then we can create concrete instances:

tom = Troll()
bert = Troll()
bill = Troll()

In Python, we create instances of a class (i.e. objects) by calling the class name as if it were a function.

An object may have any number of attributes (data members), just like the elements in a dictionary, but accessed using dot notation. Since Python is a dynamic language, it poses no restriction on the attributes that a class must have. We can add and remove attributes on the fly:

A class may define functions (called methods) that operate on an instance of the class:

class Character:

    def setName(self, newName):
        self.name = newName

This might look a bit weird, so let’s see some example usage and then discuss what we’re doing here:

The structure of the method might be familiar from the earlier section where we emulated OOP with dictionaries. In this case, we are similarly passing in the object itself as the first parameter, named self by convention. This extra parameter is required by Python. Through self, we can then access attributes of the class using dot notation.

What might look really strange here is that although setName() takes two parameters, we’re calling it with one. That’s because the self parameter is passed in implicitly when you call a method.

Constructors

A class may define a special method called __init__() which serves as the class’s constructor. It is usually used to initialise the object’s attributes, and may optionally take parameters which must be supplied when the object is instantiated:

class Character:

    def __init__(self, name, hitPoints):
        self.name = name
        self.hitPoints = hitPoints
        self.dead = False
        self.inventory = []

    def setName(self, newName):
        name = newName

Class-Level Variables

Screenshot from Ravenloft: Stone Prophet (1995)

A class may define variables within its scope:

class Monster:

    totalMonsters = 0

    def __init__(self, name, immortal):
        self.immortal = immortal
        Monster.totalMonsters += 1

Such class-level variables are not attributes of individual objects. They are shared across all instances of the class, just like static member variables in other languages. The distinction should be clear when you see that you access object attributes using self and class attributes using the name of the class itself. In this example, the shared totalMonsters counter is incremented every time a new monster is created:

Composition

Screenshot from Dark Sun: Shattered Lands (1993)

In the real world, complex objects are made up (composed) of other objects. The classic example is that a car has an engine (among other parts), but I prefer to stick to the game example. So let’s say we develop our inventory beyond a simple list, and make it into its own class:

class Inventory:

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def has(self, item):
        return item in self.items

While this is a trivial implementation, it can be extended to support more complex operations.

We can now change our Character class to contain the new Inventory class:

class Character:

    def __init__(self, name, hitPoints):
        self.name = name
        self.hitPoints = hitPoints
        self.dead = False
        self.inventory = Inventory()

    def setName(self, newName):
        name = newName

Composition is used to model a has-a relationship (e.g. Character has an Inventory). As you can see, it’s nothing special. It’s merely a case of a class (e.g. Character) having an attribute whose type is also a class (e.g. Inventory).

Inheritance

Screenshot from Ultima 9: Ascension (1999)

A sword is a very common weapon in games. We can represent a simple sword by the following class:

class Sword:

    def __init__(self):
        self.damage = 10

    def attack(self, target):
        print('%d damage done to %s' % (self.damage, target))

Here’s an example usage:

However, there isn’t just one type of sword across all games in existence. Many games have magical swords with all sorts of positive (and negative) effects. One example is a fire sword. It does extra fire damage.

class FireSword:

    def __init__(self):
        self.damage = 10
        self.fireDamage = 5

    def attack(self, target):
        print('%d damage done to %s' % (self.damage, target))
        print('%d extra fire damage done to %s' % (self.fireDamage, target))

As you can see, there’s a lot of repetition here. If we also add classes for lightning swords, poison daggers etc, do we really want to duplicate this code and have to maintain it in several different places?

Fortunately, OOP allows us to create classes that inherit from others.

class FireSword (Sword):
    pass

The above code states that FireSword is-a Sword, and as a result, it inherits all of Sword‘s attributes and methods:

However, while we are reusing Sword‘s implementation for FireSword, we don’t yet have the extra functionality (i.e. extra fire damage) that makes it a fire sword, as we had in the original example. In order to do that, we must override Sword‘s methods to provide the extra functionality.

class FireSword (Sword):

    def __init__(self):
        super().__init__()
        self.fireDamage = 5

    def attack(self, target):
        super().attack(target)
        print('%d extra fire damage done to %s' % (self.fireDamage, target))

Here’s an example usage:

By calling super(), we’re calling the Sword class’s implementation before doing the extra logic specific to FireSword. In OOP terminology, Sword is the base class, parent class or superclass, and FireSword is the derived class or child class.

When you request an attribute or call a method on a derived class, Python will first look for an implementation in the derived class, and if it’s not there, it will look it up in the base class. This mechanism is what enables inheritance. However, it is also possible to have a method in the derived class to replace or extend the equivalent method in the base class, as we have seen above.

In other OOP languages, methods must usually be marked as virtual to allow them to be overridden. This is not necessary in Python.

“For C++ programmers: all methods in Python are effectively virtual.” — The Python Tutorial – Classes

Python allows a class to inherit from more than one base class. This is known as multiple inheritance, and is strongly discouraged because it makes classes extremely hard to work with. More modern OOP languages such as Java and C# expressly forbid multiple inheritance.

As a humorous aside, if you have a copy of Zed Shaw’s “Learn Python the Hard Way” book, you might want to read his section on “Inheritance vs Composition” for laughs. Shaw wastes almost a whole page with a silly story about a forest and an evil queen, which are supposed to be analogies for inheritance and multiple inheritance. His argument is that inheritance is bad because multiple inheritance is troublesome. That’s a bit like saying we should ban fire because some idiot got burned.

“In object-oriented programming, inheritance is the evil forest. Experienced programmers know to avoid this evil because they know that deep inside the dark forest of inheritance is the evil queen, multiple inheritance. She likes to eat software and programmers with her massive complexity teeth, chewing on the flesh of the fallen. But the forest is so powerful and so tempting that nearly every programmer has to go into it and try to make it out alive with the evil queen’s head before they can call themselves real programmers. You just can’t resist the inheritance forest’s pull, so you go in. After the adventure, you learn to just stay out of that stupid forest and bring an army if you are ever forced to go in again.” — Learn Python the Hard Way, Third Edition. Zed A. Shaw. 2014.

Shaw suggests that inheritance should be avoided, and composition should be used instead. For him, the choice between “inheritance versus composition comes down to an attempt to solve the problem of reusable code”. Unfortunately, he misses the point entirely. The main benefit of OOP is to model objects and their relationships. Inheritance models an is-a relationship, whereas composition models a has-a relationship. Code reuse is a practical benefit of both, but does not make them interchangeable.

Encapsulation

In the Motivation section towards the beginning of this article, we saw how emulating OOP with dictionaries results in a situation where the internal state of our classes can be tampered with. Let’s revisit that example, but with OOP:

class Character:

    def __init__(self, name, hitPoints):
        self.name = name
        self.hitPoints = hitPoints
        self.dead = False

    def sufferDamage(self, damage):
        self.hitPoints -= damage
        if (self.hitPoints <= 0):
            self.dead = True

Unfortunately, OOP in Python doesn’t do much to protect our internal state, and we can still tamper with it without restriction:

Other OOP languages usually have private, protected and public access modifiers to control access to internal data members of the class; these are enforced by the language. There is none of this in Python. The only thing you can do is follow a convention where private attributes are prefixed by an underscore, and hope that people play fair. It doesn’t stop people from accessing internal state.

Hiding the internal state of a class is called encapsulation. One strong reason why it is important is, as we’ve just seen, to ensure the consistency of that internal state (dead with 255 hit points? huh?). Another reason is to be able to modify the way that state works, without external code being affected.

So right now, we have an attribute called dead (or _dead, if we’re making it private by convention). Let’s add a method that exposes it:

class Character:

    def __init__(self, name, hitPoints):
        self._name = name
        self._hitPoints = hitPoints
        self._dead = False

    def sufferDamage(self, damage):
        self._hitPoints -= damage
        if (self._hitPoints <= 0):
            self._dead = True

    def isDead(self):
        return self._dead

Code external to this class may now check whether the character is dead by calling the isDead() method, and should not access _dead directly:

xanathar.isDead()

This extra method gives us a lot of flexibility because external code does not get to see how we store our internal state. We could, for instance, replace our _dead attribute with a computation based on _hitPoints, and the external code would never know the difference:

    def isDead(self):
        return self._hitPoints <= 0

So while in Python you can’t force external code not to touch a class’s internal state (as other OOP languages usually do), it is good practice to hide internal state using the available conventions, and expose only what needs to be exposed.

Polymorphism

Image credit: screenshot of Ultima 7: The Black Gate (1992) using Exult, taken from Let’s Play Archive entry

Typically, a person in a game can talk:

class Person:

    def Talk(self):
        print('Hello!')

Sometimes, though, an item can also talk.

class BlackSword:

    def Talk(self):
        print('Which of my powers dost thou seek to use?')

Animals, too, may surprise you with their gift of speech.

class SherryTheMouse:

    def Talk(self):
        print('Do you have any cheese?')

So here we have three completely unrelated classes, but they all have the same ability: we can call the Talk() method. When different objects exhibit similar behaviour, and thus we can work with them in a consistent manner, it’s called Polymorphism.

This is useful, for instance, when iterating over different kinds of objects in a loop:

This is unusual in the world of OOP, but since Python uses duck typing, it’s enough that two classes have the same method signature so that you can use them in the same way. In more strongly-typed OOP languages such as C# or Java, the classes would need to have something in common for you to do this (e.g. they implement the same interface, or they share a common base class).

Generics

This section is for developers coming from OOP in other languages. If you’re new to OOP, you may skip it.

Sometimes, you want to make a class have the same behaviour with different data types. For instance, you create a class representing a stack, and it should work the same regardless of whether it’s a stack of integers or of strings.

C++ provides this through templates, and C# and Java provide generics. These are a way to generalise the class implementation across dependent types, while still enforcing type safety.

Since Python is a dynamic language and it does not care what types you use, generics are not necessary. Your stack (or whatever) class will work just as will with integers, strings, or Animals (although I don’t recommend putting elephants at the top of the stack).

Summary

In this article, we’ve covered the basics of OOP in Python.

  • Even if you’re not currently doing OOP, you’ll notice that groups of variables and functions will tend to relate to the same entity. There is a natural tendency towards OOP.
  • Classes are groups of attributes and functions (methods). They provide a template but are not concrete.
  • Objects are concrete instances of classes. Person is a class. Joe is an object.
  • A constructor allows you to initialise attributes and pass in any parameters at instantiation time.
  • Class-level variables are shared across all instances of that class.
  • Composition is when a class contains other classes. It expresses a has-a relationship.
  • Inheritance expresses an is-a relationship. If FireSword is-a Sword, then FireSword inherits all of Sword’s attributes and methods, and may override those methods to provide more specialised variants.
  • Encapsulation is hiding internal attributes of a class so that external code can’t change them, and so that internal code can be changed without affecting external code. This is not enforced by the language but is upheld by convention.
  • Polymorphism is when different objects behave in a similar way. In Python, it works as a result of duck typing.
  • Generics aren’t necessary in a language without type safety.

This material includes basic concepts and language syntax, but is merely a starting point.

Mastering OOP is a matter of understanding that it is all about abstraction, and learning to work with abstractions in a way that is beneficial to software (e.g. models problem domains, maximises code reuse, reduces coupling, increases maintainability etc). The three pillars of OOP (inheritance, encapsulation and polymorphism) are basic building blocks of such abstraction. Various design patterns have emerged which demonstrate OOP abstractions put to good use.

Avoid await in Foreach

Five months ago, I wrote my C# Asynchronous Programming series, and part of that was an article about Common Mistakes in Asynchronous Programming with .NET. As it turns out, I missed a really common mistake in that article.

        public async Task RunAsync()
        {
            foreach (var x in new[] { 1, 2, 3 })
            {
                await DoSomethingAsync(x);
            }
        }

Whenever I see an await within a foreach (or other looping construct), I tend to get suspicious. That’s because a lot of the time, an external service (Web API, Redis, or whatever) is called repeatedly with different parameters. Imagine that DoSomethingAsync() in the above code performs an HTTP GET request, passing in x as a querystring parameter.

The above could potentially be optimised to run the requests in parallel, as described in Patterns for Asynchronous Composite Tasks in C#. But since each asynchronous call is being awaited, this has the effect of waiting for each request-response cycle to complete before starting the next one.

To illustrate the point, we can implement DoSomethingAsync() as a simple delay:

        private async Task DoSomethingAsync(int x)
        {
            Console.WriteLine($"Doing {x}... ({DateTime.Now :hh:mm:ss})");
            await Task.Delay(2000);
            Console.WriteLine($"{x} done.    ({DateTime.Now :hh:mm:ss})");
        }

Let’s run that:

That’s six seconds just to run three two-second delays, which did not depend on each other and which thus could have been run in parallel. In fact, let’s now change the code to do that:

        public async Task RunAsync()
        {
            var tasks = new List<Task>();

            foreach (var x in new[] { 1, 2, 3 })
            {
                var task = DoSomethingAsync(x);
                tasks.Add(task);
            }

            await Task.WhenAll();
        }

…and run it again:

That’s 2-3 seconds, which is much better. Note though that the operations have completed in a different order from that in which they were started; this is why it’s important that they don’t depend on each other.

Do you think that’s a lot of code? No problem. We can make it more concise, with some help from LINQ.

        public async Task RunAsync()
        {
            var tasks = new[] { 1, 2, 3 }.Select(DoSomethingAsync);
            await Task.WhenAll(tasks);
        }

Having said all this, it is not good to be hasty and declare war against all awaits in foreaches, because there are indeed legitimate cases for that. One example is when you have a list of commands which conform to the same interface, and they must be executed in sequence. This is perfectly fine:

        public interface ICommand
        {
            Task ExecuteAsync();
        }

        public async Task ExecuteAsync(List<ICommand> commands)
        {
            foreach (var command in commands)
            {
                await command.ExecuteAsync();
            }
        }

When order of operations is important, running this sort of scenario in parallel can yield unexpected results.

My hope is that this will at least help to quickly identify potential performance bottlenecks due to an improper use of asynchrony. await in foreach should be eyed suspiciously as a potential code smell, but as with everything else, there is no hard and fast rule and it is perfectly fine if used correctly.