Calculating Circle Area the Monte Carlo Way

Ten years ago, I was in the middle of my MSc, working on a dissertation titled “Distributed High-Fidelity Graphics Using P2P”. My research revolved around ray tracing – a family of techniques used to produce highly photorealistic images by simulating various aspects of physics, especially the way light interacts with the surfaces and materials it comes in contact with.

Ray tracing is very complex and computationally intensive, and it would take an entire book to write anything substantial about it. But, at the heart of modern ray tracing is an approach called Monte Carlo simulation, or stochastic simulation. It’s a fancy way of saying: this problem is too complex and I don’t have an easy way to calculate a solution, so I’ll take a number of random samples and approximate it instead.

In this article, I’ll illustrate the concept by calculating the area of a circle. Since we have an easy formula to calculate the area of a circle (πr²), there’s normally no need to do this via Monte Carlo simulation. However, it’s an easy way to explain the concept, which you can then use to solve more complex problems.

Approximating the Area of a Circle in Theory

Let’s assume that we don’t know how to calculate the area of a circle – either because a formula does not exist, or because it is too complex to derive.

However, we do have a way to determine whether a point lies inside a circle. If the point is represented by the coordinates (x, y), then the point lies inside the circle of radius r if (x² + y²) <= r², assuming that the circle’s centre is located at (0, 0).

A visualisation of a Monte Carlo approximation of a circle.

In this case, we can take a number of random points in space and compare how many fall inside and outside of the circle. You can see a simple visualisation of this above. Imagine we were throwing darts at a dartboard while blindfolded. The more darts we throw, the more we are filling in the circle, and the ratio of darts that land inside the circle to the total number thrown tells us what fraction of the board the circle covers.

It’s a little counterintuitive to imagine how a random process leads towards a deterministic solution. In fact, it’s not really deterministic, and the actual result changes with every run; but the approximation does work very well, and the more samples you take, the closer you get to the real solution. Let’s test this out in practice.

Approximating the Area of a Circle in Python 3.8

The code to implement the Monte Carlo approximation of the area of a circle is quite straightforward, as you can see below.

import random

def get_random_coord(radius):
    # A random coordinate within the square that bounds the circle.
    return random.uniform(-radius, radius)

def calculate_area_monte_carlo(radius):

    total_in_circle = 0

    for i in range(0, 10000000):
        (x, y) = get_random_coord(radius), get_random_coord(radius)
        in_circle = x ** 2 + y ** 2 <= radius ** 2

        if in_circle:
            total_in_circle += 1

        if i in [9999, 99999, 150000, 999999, 1500000, 9999999]:
            # The fraction of samples inside the circle, scaled by the
            # area of the bounding square, (2r)^2.
            approximate_area = (2 * radius) ** 2 * total_in_circle / i
            print(f"Area ({i} samples): {approximate_area}")

Essentially, all I’m doing is:

  • Defining a helper function get_random_coord() to generate a random coordinate, just so that I don’t have to repeat the same code twice.
  • Generating an (x, y) coordinate pair within the square containing the circle.
  • Determining whether this (x, y) falls within the circle, using the aforementioned formula (x² + y²) <= r², in which case I increment total_in_circle.
  • At specific (arbitrary) intervals (numbers of samples), I calculate the approximate area of the circle and print it out to show how the calculated area gets more accurate, the more samples you take.

The approximate area is calculated as a ratio of samples in circle (total_in_circle) vs total samples both inside and outside of the circle (i). However, we have to scale that ratio by the area of the bounding square, which is (2r)², as explained in this Stack Overflow answer.
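As a quick sanity check on that formula: with r = 400 (the radius used later in this article), the bounding square has area (2 × 400)² = 640,000. A circle covers π/4 ≈ 0.7854 of its bounding square, so roughly 78.5% of the samples should land inside, giving an estimated area of about 0.7854 × 640,000 ≈ 502,655, which matches the reference value we’ll compute shortly.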

In reality, we do know a formula to accurately calculate the area of a circle (πr²), so let’s toss that in as well in order to compare how accurate the results are:

import random
import math

def get_random_coord(radius):
    return random.uniform(-radius, radius)

def calculate_area_monte_carlo(radius):

    total_in_circle = 0

    for i in range(0, 10000000):
        (x, y) = get_random_coord(radius), get_random_coord(radius)
        in_circle = x ** 2 + y ** 2 <= radius ** 2
        
        if in_circle:
            total_in_circle += 1

        if i in [9999, 99999, 150000, 999999, 1500000, 9999999]:
            approximate_area = (2 * radius) ** 2 * total_in_circle / i
            print(f"Area ({i} samples): {approximate_area}")

def calculate_area_formula(radius):
    area = math.pi * radius ** 2
    print("Area (reference):", area)

def main():
    radius = 400
    calculate_area_monte_carlo(radius)
    calculate_area_formula(radius)

if __name__ == "__main__":
    main()

We can now run this program, and after a little patience, we get some results:

$ python3 main2.py
Area (9999 samples): 501874.1874187419
Area (99999 samples): 503793.8379383794
Area (150000 samples): 503940.26666666666
Area (999999 samples): 502511.2225112225
Area (1500000 samples): 502637.2266666667
Area (9999999 samples): 502743.410274341
Area (reference): 502654.8245743669

If we run it again, the results are a little different, but the trend is still there:

$ python3 main2.py
Area (9999 samples): 506930.69306930696
Area (99999 samples): 501861.0186101861
Area (150000 samples): 502101.3333333333
Area (999999 samples): 502941.30294130294
Area (1500000 samples): 502856.1066666667
Area (9999999 samples): 502779.44227794424
Area (reference): 502654.8245743669

This is the essence of the Monte Carlo approach: the more random samples you take, the more you’re filling in the problem space. However, this comes at a cost: the more samples you take, the more time it takes to compute. With enough samples, the error eventually becomes small enough that the approximation is indistinguishable from an accurate solution. Let’s visualise this.

Visualising the Area of a Circle Approximation

We can use the PIL library (installable nowadays as the Pillow package) to quickly generate progressive images of the circle coverage. All we need is to add a few lines to our existing code:

import random
import math
from PIL import Image

def get_random_coord(radius):
    return random.uniform(-radius, radius)

def calculate_area_monte_carlo(radius):
    diameter = radius * 2
    image = Image.new("RGB", (diameter, diameter), (255, 255, 255))
    in_colour = (255, 0, 0) # red
    out_colour = (0, 0, 255) # blue

    total_in_circle = 0

    for i in range(0, 10000000):
        (x, y) = get_random_coord(radius), get_random_coord(radius)
        in_circle = x ** 2 + y ** 2 <= radius ** 2
        
        if in_circle:
            total_in_circle += 1
            colour = in_colour
        else:
            colour = out_colour

        # Clamp to the image bounds, since uniform() can return exactly +radius.
        px = min(math.floor(x + radius), diameter - 1)
        py = min(math.floor(y + radius), diameter - 1)
        image.putpixel((px, py), colour)

        if i in [9999, 99999, 150000, 999999, 1500000, 9999999]:
            approximate_area = (2 * radius) ** 2 * total_in_circle / i
            print(f"Area ({i} samples): {approximate_area}")
            image.save(f'circle_{i}.png')

def calculate_area_formula(radius):
    area = math.pi * radius ** 2
    print("Area (reference):", area)

def main():
    radius = 400
    calculate_area_monte_carlo(radius)
    calculate_area_formula(radius)

if __name__ == "__main__":
    main()

What we’re doing here is:

  • Creating an image at the beginning.
  • For each sample, we colour the corresponding pixel either red (if it’s in the circle) or blue (if it’s outside the circle). Note that we have to add the radius to x and y because in the image, (0, 0) is not the centre of the circle, but the upper-left corner of the image.
  • Saving an image whenever we print an approximate area.

Earlier, we saw how the numbers seemed to be converging towards the accurate solution. If we now run the updated code above, we can verify the progressive coverage of the circle from the resulting images:

Applications

As I said earlier, the Monte Carlo approach is not necessary for problems where we have a simple formula, like calculating the area of a circle. However, not all problems have a simple formula.

If you’ve studied calculus, you’ll know that you can integrate a function to calculate the area under its graph. You can do that for a simple function like 5x2, but there are more complex functions for which it is impossible to derive an analytical solution. Even if you never studied calculus, you can imagine how hard it must be to calculate the area of an irregular shape, such as your favourite politician. Randomly throwing darts at them and counting how many hit might be one way of working that out.

And if you ever work with ray tracing, you’ll eventually realise that ray tracing is really just another integration problem – one where you have to calculate the area (or contribution of light towards a surface) of something very irregular. A Monte Carlo approximation is a very effective – if computationally expensive – way of solving this otherwise intractable problem.

Filebeat, Elasticsearch and Kibana with Docker Compose

Docker is one of those tools I wish I had learned to use a long time ago. I still remember how painful it always was to set up Elasticsearch on Linux, or to set up both Elasticsearch and Kibana on Windows, and having to repeat this process occasionally to upgrade or recreate the Elastic stack.

Fortunately, Docker images now exist for all Elastic stack components including Elasticsearch, Kibana and Filebeat, so it’s easy to spin up a container, or to recreate the stack entirely in a matter of seconds.

Getting them to work together, however, is not trivial. Security is enabled by default from Elasticsearch 8.0 onwards, so you’ll need SSL certificates, and the examples you’ll find on the internet using docker-compose from the Elasticsearch 7.x era won’t work. Although the Elasticsearch docs provide an example docker-compose.yml that includes Elasticsearch and Kibana with certificates, this doesn’t include Filebeat.

In this article, I’ll show you how to tweak this docker-compose.yml to run Filebeat alongside Elasticsearch and Kibana.

  • I’ll be doing this with Elastic stack 8.4 on Linux. If you’re on Windows or Mac, drop the sudo from in front of the commands.
  • You can find the relevant files for this article in the FekDockerCompose folder at the Gigi Labs BitBucket Repository.
  • This is merely a starting point and by no means production-ready.
  • A lot of things can go wrong along the way, so I’ve included a lot of troubleshooting steps.

The Doc Samples

The “Install Elasticsearch with Docker” page at the official Elasticsearch documentation is a great starting point to run Elasticsearch with Docker. The section “Start a multi-node cluster with Docker Compose” provides what you need to run a three-node Elasticsearch cluster with Kibana in Docker using docker-compose.

The first step is to copy the sample .env file and fill in any values you like for the ELASTIC_PASSWORD and KIBANA_PASSWORD settings, such as the following (don’t use these values in production):

# Password for the 'elastic' user (at least 6 characters)
ELASTIC_PASSWORD=elastic

# Password for the 'kibana_system' user (at least 6 characters)
KIBANA_PASSWORD=kibana

# Version of Elastic products
STACK_VERSION=8.4.0

# Set the cluster name
CLUSTER_NAME=docker-cluster

# Set to 'basic' or 'trial' to automatically start the 30-day trial
LICENSE=basic
#LICENSE=trial

# Port to expose Elasticsearch HTTP API to the host
ES_PORT=9200
#ES_PORT=127.0.0.1:9200

# Port to expose Kibana to the host
KIBANA_PORT=5601
#KIBANA_PORT=80

# Increase or decrease based on the available host memory (in bytes)
MEM_LIMIT=1073741824

# Project namespace (defaults to the current folder name if not set)
#COMPOSE_PROJECT_NAME=myproject

Next, copy the sample docker-compose.yml. This is a large file so I won’t include it here, but in case the documentation changes, you can find an exact copy at the time of writing as docker-compose-original.yml in the aforementioned BitBucket repo.

Once you have both the .env and docker-compose.yml files, you can run the following command to spin up a three-node Elasticsearch cluster and Kibana:

sudo docker-compose up

You’ll see a lot of output and, after a while, if you access http://localhost:5601/, you should be able to see the Kibana login screen:

The Kibana login screen.

Troubleshooting tip: Unhealthy containers

It can happen that some of the containers fail to start up and claim to be “unhealthy”, without offering a reason. You can find out more by taking the container ID (provided in the error in the output) and running:

sudo docker logs <containerId>

Chances are that the error you’ll see in the logs will be this:

bootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

This is in fact explained in the same documentation page and elaborated in another one. Run the following command to fix it on Linux, or refer to the documentation for other OSes:

sudo sysctl -w vm.max_map_count=262144
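Note that sysctl -w only applies until the next reboot. To make the setting permanent on most Linux distributions, you can append it to /etc/sysctl.conf (or a file under /etc/sysctl.d/), for example:

echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf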

Adding Filebeat to docker-compose.yml

The sample docker-compose.yml consists of five services: setup, es01, es02, es03 and kibana. While the documentation already explains how to Run Filebeat on Docker, what we need here is to run it alongside Elasticsearch and Kibana. The first step to do that is to add a service for it in the docker-compose.yml, after kibana:

  filebeat:
    depends_on:
      es01:
        condition: service_healthy
      es02:
        condition: service_healthy
      es03:
        condition: service_healthy
    image: docker.elastic.co/beats/filebeat:${STACK_VERSION}
    container_name: filebeat
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./test.log:/var/log/app_logs/test.log
      - certs:/usr/share/elasticsearch/config/certs
    environment:
      - ELASTICSEARCH_HOSTS=https://es01:9200
      - ELASTICSEARCH_USERNAME=elastic
      - ELASTICSEARCH_PASSWORD=${ELASTIC_PASSWORD}
      - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt

The most interesting part of this is the volumes:

  • filebeat.yml: this is how we’ll soon be passing Filebeat its configuration.
  • test.log: we’re including this example file just to see that Filebeat actually works.
  • certs: this is the same as in all the other services and is part of what allows them to communicate securely using SSL certificates.

Generating a Certificate for Filebeat

The setup service in docker-compose.yml has a script that generates the certificates used by all the Elastic stack services defined there. It creates a file at config/certs/instances.yml specifying what certificates are needed, and passes that to the bin/elasticsearch-certutil command to create them. We can follow the same pattern as the other services in instances.yml to create a certificate for Filebeat:

          "  - name: es03\n"\
          "    dns:\n"\
          "      - es03\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: filebeat\n"\
          "    dns:\n"\
          "      - es03\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          > config/certs/instances.yml;

Configuring Filebeat

Create a file called filebeat.yml, and configure the input section as follows:

filebeat.inputs:
- type: filestream
  id: my-application-logs
  enabled: true
  paths:
    - /var/log/app_logs/*.log

Here, we’re using a filestream input to pick up any files ending in .log from the /var/log/app_logs/ folder. This path is arbitrary (as is the id), but it’s important that it corresponds to the location where we’re mounting the test.log file in docker-compose.yml:

    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./test.log:/var/log/app_logs/test.log
      - certs:/usr/share/elasticsearch/config/certs

While you’re at it, create the test.log file with any lines of text, such as the following:

Log line 1
Air Malta sucks
Log line 3

Back to filebeat.yml, we also need to configure it to connect to Elasticsearch using not only the Elasticsearch username and password, but also the certificates that we are generating thanks to what we did in the previous section:

output.elasticsearch:
  hosts: '${ELASTICSEARCH_HOSTS:elasticsearch:9200}'
  username: '${ELASTICSEARCH_USERNAME:}'
  password: '${ELASTICSEARCH_PASSWORD:}'
  ssl:
    certificate_authorities: "/usr/share/elasticsearch/config/certs/ca/ca.crt"
    certificate: "/usr/share/elasticsearch/config/certs/filebeat/filebeat.crt"
    key: "/usr/share/elasticsearch/config/certs/filebeat/filebeat.key"

Troubleshooting tip: Peeking inside a container

In case you’re wondering where I got those certificate paths from, I originally looked inside the container to see where the certificates were being generated for the other services. You can get a container ID with docker ps, and then access the container as follows:

sudo docker exec -it <containerId> /bin/bash

More Advanced Filebeat Configurations

Although we’re using a plain filestream input in this example to keep things simple, Filebeat can be configured to gather logs from a large variety of data sources, ranging from web servers to cloud providers, thanks to its modules.

A good way to explore the possibilities is to download a copy of Filebeat and sift through all the different YAML configuration files that are provided as reference material.
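For instance, if you want to experiment with one of the bundled modules, you can enable it from inside the running container with something along these lines (the nginx module is just an arbitrary example here):

sudo docker-compose exec filebeat filebeat modules enable nginx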

Running It All

It’s now time to run docker-compose with Filebeat running alongside Kibana and the three-node Elasticsearch cluster:

sudo docker-compose up

Troubleshooting tip: Recreating certificates

The setup script has a check that won’t create certificates again if it has already been run (by looking for the config/certs/certs.zip file). So if you’ve already run docker-compose up before, you’ll need to recreate these certificates in order to get the one for Filebeat. The easiest way to do it is by just clearing out the volumes associated with this docker-compose:

sudo docker-compose down --volumes

Troubleshooting tip: filebeat.yml permissions

It’s also possible to get the following error:

filebeat | Exiting: error loading config file: config file ("filebeat.yml") can only be writable by the owner but the permissions are "-rw-rw-r--" (to fix the permissions use: 'chmod go-w /usr/share/filebeat/filebeat.yml')

The solution is, of course, to heed the error’s advice and run the following command (on your host machine, not in the container):

chmod go-w filebeat.yml

Troubleshooting tip: Checking individual container logs

The logs coming from all the different services can be overwhelming, and the verbose JSON structure doesn’t help. If you suspect there’s a problem with a specific container (e.g. Filebeat), you can see the logs for that specific service as follows:

sudo docker-compose logs -f filebeat

You can of course still use sudo docker logs <containerId> if you want, but this alternative puts the name of the service before each log line, and some terminals colour it. This at least helps to visually distinguish one line from another.

Output of sudo docker-compose logs -f filebeat.

Verifying Log Data in Kibana

You only know Filebeat really worked if you see the data in Kibana. Fire up http://localhost:5601/ in a browser and log in using “elastic” as the username, and whatever password you set up in the .env file (in this example it’s also “elastic” for simplicity).

The first test I usually do is to check whether an index has actually been created at all, because if it hasn’t, you can search all you want in Discover and you’re not going to find anything.

Click the hamburger menu in the top-left, scroll down a bit, and click on “Dev Tools”. There, enter the following query and run it (by clicking the Play button or hitting Ctrl+Enter):

GET _cat/indices

If you see an index whose name contains “filebeat” in the results panel on the right, then that’s encouraging.

GET _cat/indices shows that we have a Filebeat index.
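Another quick sanity check you can run from Dev Tools is counting the documents Filebeat has indexed; with our three-line test.log, you’d expect a small handful of documents:

GET filebeat-*/_count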

Now that we know that some data exists, click the hamburger menu at the top-left corner again and go to “Discover” (the first item). There, you’ll be prompted to create a “data view” (if you don’t have any data, you’ll be shown a different prompt offering integrations instead). If I understand correctly, this “data view” is what used to be called an “index pattern”.

At Discover, you’re asked to create a data view.

Click on the “Create data view” button.

Creating the data view, whatever it is.

You can give the data view a name and an index pattern. I suppose the name is arbitrary. For the index pattern, I still use filebeat-* (you’ll see the index name on the right turn bold as you type, indicating that it’s matching), although I’m not sure whether the wildcard actually makes a difference now that the index is some new thing called a data stream.

The timestamp field gets chosen automatically for you, so no need to change anything there. Just click on the “Save data view to Kibana” button. You should now be able to enjoy your lovely data.

Viewing data ingested via Filebeat in the Discover section of Kibana.

Troubleshooting tip: Time range

If you don’t see any data in Discover, it doesn’t necessarily mean something went wrong. The default time range of “last 15 minutes” means you might not see any data if there wasn’t any indexed recently. Simply adjust it to a longer period (e.g. last 2 hours).

Conclusion

The Elastic stack is a wonderful set of tools, but its power comes with a lot of complexity. Docker makes it easier to run the stack, but it’s often difficult to find guidance even on simple scenarios like this. I’m hoping that this article makes things a little easier for other people wanting to run Filebeat alongside Elasticsearch and Kibana in Docker.

GoLang Set Data Structure

One of my complaints about Go in my recent article, “From .NET to GoLang: Where Did Everything Go?”, was the lack of built-in data structures, such as a set.

Many people suggest writing your own, and there are websites that show you how to do this. Of course, you can use a map to represent a set, but you’d also have to implement common operations such as union, intersection and difference.
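To illustrate what that involves, here’s a minimal sketch of the DIY approach, using a map with empty struct values (the conventional trick) and a hand-written union:

	// A map with empty struct values is the usual DIY set in Go.
	set := map[string]struct{}{}
	set["apple"] = struct{}{}
	set["apple"] = struct{}{} // adding a duplicate is a no-op
	set["banana"] = struct{}{}

	_, exists := set["apple"] // membership test
	fmt.Println(exists)       // prints "true"

	// Even a basic operation like union needs a hand-written loop.
	other := map[string]struct{}{"cherry": {}}
	union := map[string]struct{}{}
	for k := range set {
		union[k] = struct{}{}
	}
	for k := range other {
		union[k] = struct{}{}
	}
	fmt.Println(len(union)) // prints "3"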

Fortunately, there’s no need to reinvent the wheel like this, because the community has stepped in to provide a good set implementation. The golang-set package provides all the functionality expected of a set data structure, and its v2 release makes full use of the (recently released) Go generics functionality, allowing you to use it with virtually any data type.

I’m expecting the reader to know what a set is (i.e. an unordered, deduplicated collection) and what the common operations are. I’ll be focusing on how to work with sets in Go using the aforementioned package.

Installing golang-set

First, as usual, set up a Go module:

go mod init main

Then, to install golang-set in your Go project, make sure to get the v2 release, as follows:

go get github.com/deckarep/golang-set/v2

Then in your Go file (e.g. main.go) be sure to import the package as in the following example:

package main

import (
    mapset "github.com/deckarep/golang-set/v2"
)

func main() {
	// test code will go here
}

Initialising a Set

You create a new Set by calling the NewSet() function. For instance, for a set of strings:

fruit := mapset.NewSet[string]()

Since the Set type supports generics, you can use a different type instead of a string; for instance, we could have a set of integers:

oddNumbers := mapset.NewSet[int]()

While we’ll soon see how to add values to our set, it’s possible to initialise a set with values when you create it. For instance, you can pass integer values directly:

oddNumbers := mapset.NewSet(1, 3, 5, 7, 9)

Or else unpack a slice of integer values using ... (Go’s variadic expansion, akin to a spread operator):

	oddNumbersArray := []int{1, 3, 5, 7, 9}
	oddNumbers := mapset.NewSet(oddNumbersArray...)

Note that in these cases, it’s not necessary to specify the generic type parameter when calling NewSet() because Go is smart enough to figure out the type from the values you pass in.

String Representation of a Set

The Set type has a convenient string representation. You can see this by printing the Set directly, e.g.:

fmt.Println(oddNumbers)

…gives…

Set{1, 3, 5, 7, 9}

Alternatively, you can obtain the same string representation programmatically. The following should provide the same output:

	oddNumbersStr := oddNumbers.String()
	fmt.Println(oddNumbersStr)

Actually, it won’t necessarily give exactly the same output. Because the Set is unordered, the order of the values in the string representation may change. For instance, running it again, you might get this:

Set{9, 1, 3, 5, 7}

Converting a Set to a Slice

You can use the ToSlice() function to convert a Set to a slice:

	oddNumbersSlice := oddNumbers.ToSlice()
	fmt.Println(oddNumbersSlice) // prints "[5 7 9 1 3]"

Update 9th August 2022: Like String(), the result of ToSlice() is non-deterministic and can give different results each time it’s run. This is particularly annoying when you want to compare sets in a test (the set data might be coming from an API, so it’s not just a matter of calling Equal()).
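One workaround is to sort the slice before comparing, which at least makes the result deterministic. A small sketch (this needs the standard library’s sort package):

	oddNumbersSlice := oddNumbers.ToSlice()
	sort.Ints(oddNumbersSlice)   // impose a deterministic order
	fmt.Println(oddNumbersSlice) // prints "[1 3 5 7 9]"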

Adding and Removing Items

Use the Add() function to add items to a set. For example:

	evenNumbers := mapset.NewSet[int]()
	evenNumbers.Add(2)
	evenNumbers.Add(4)
	evenNumbers.Add(6)
	evenNumbers.Add(8)
	evenNumbers.Add(10)

	fmt.Println(evenNumbers)

It seems like this needs to be called for each item you want to add. It would be nice to have something like C#’s AddRange() method, by which you could append an entire array, list, or other collection of items.

You can remove a specific item from a Set by calling Remove():

	evenNumbers.Remove(10)

	fmt.Println(evenNumbers)

The combined output of these two examples would be something like:

Set{2, 4, 6, 8, 10}
Set{2, 4, 6, 8}

Another way to remove items from a set is to call Pop(), which “removes and returns an arbitrary item from the set.” I suppose this could be useful when you want to process all items in the set until it’s empty.

You can also call Clear() to remove all items from the set.

Counting Items

Use the Cardinality() function to get the number of items in the set. The following example outputs a value of 4.

fmt.Println(evenNumbers.Cardinality())

Set Membership

You can test basic set membership using Contains():

	primeNumbers := mapset.NewSet(2, 3, 5, 7, 11)

	fmt.Println(primeNumbers.Contains(3)) // prints "true"
	fmt.Println(primeNumbers.Contains(4)) // prints "false"

Set Comparisons

The following functions are available to compare sets:

  • Equal() returns true if the two sets have the same elements
  • IsSubset() returns true if the first set is a subset of the second set, or if they’re equal
  • IsProperSubset() returns true if the first set is a subset of the second set and they aren’t equal
  • IsSuperset() returns true if the first set is a superset of the second set (i.e. the second set is a subset of the first), or if they’re equal
  • IsProperSuperset() returns true if the first set is a superset of the second set (i.e. the second set is a subset of the first), and they aren’t equal

The following example shows what to expect from these operations:

	primeNumbers := mapset.NewSet(2, 3, 5, 7, 11)
	primeNumbers2 := mapset.NewSet(2, 3, 5, 7, 11)
	primeNumbersSubset := mapset.NewSet(2, 3, 5)

	// set equality

	fmt.Println(primeNumbers.Equal(primeNumbers2))      // prints "true"
	fmt.Println(primeNumbers.Equal(primeNumbersSubset)) // prints "false"

	// subset

	fmt.Println(primeNumbersSubset.IsSubset(primeNumbers)) // prints "true"
	fmt.Println(primeNumbers.IsSubset(primeNumbersSubset)) // prints "false"
	fmt.Println(primeNumbers2.IsSubset(primeNumbers))      // prints "true"

	// proper subset

	fmt.Println(primeNumbersSubset.IsProperSubset(primeNumbers)) // prints "true"
	fmt.Println(primeNumbers.IsProperSubset(primeNumbersSubset)) // prints "false"
	fmt.Println(primeNumbers2.IsProperSubset(primeNumbers))      // prints "false"

	// superset

	fmt.Println(primeNumbersSubset.IsSuperset(primeNumbers)) // prints "false"
	fmt.Println(primeNumbers.IsSuperset(primeNumbersSubset)) // prints "true"
	fmt.Println(primeNumbers2.IsSuperset(primeNumbers))      // prints "true"

	// proper superset

	fmt.Println(primeNumbers.IsProperSuperset(primeNumbersSubset)) // prints "true"
	fmt.Println(primeNumbersSubset.IsProperSuperset(primeNumbers)) // prints "false"
	fmt.Println(primeNumbers.IsProperSuperset(primeNumbers2))      // prints "false"

Set Operations

Naturally, the golang-set package provides the typical set operations you would expect, including:

  • Union() – combines all elements from both sets, eliminating duplicates
  • Intersect() – obtains only those elements that exist in both sets
  • Difference() – gets those elements in the first set that aren’t in the second set
  • SymmetricDifference() – the union minus the intersection

Here are a few examples showing these operations in action:

	fibonacciNumbers := mapset.NewSet(0, 1, 2, 3, 5)
	triangularNumbers := mapset.NewSet(1, 3, 6, 10, 15)

	fmt.Println(fibonacciNumbers.Union(triangularNumbers))     // prints "Set{0, 1, 2, 3, 5, 6, 10, 15}"
	fmt.Println(fibonacciNumbers.Intersect(triangularNumbers)) // prints "Set{1, 3}"

	fmt.Println(fibonacciNumbers.Difference(triangularNumbers)) // prints "Set{0, 2, 5}"
	fmt.Println(triangularNumbers.Difference(fibonacciNumbers)) // prints "Set{6, 10, 15}"

	fmt.Println(fibonacciNumbers.SymmetricDifference(triangularNumbers)) // prints "Set{5, 6, 10, 15, 0, 2}"

JSON Functions

golang-set seems to have functions to serialise and deserialise a set to/from JSON. I’m not sure where these would be useful, but I decided to give them a try anyway.

Use MarshalJSON() to serialise a set to JSON, which ends up looking just like the slice representation:

	evenNumbers := mapset.NewSet(2, 4, 6, 8, 10)

	jsonBytes, err := evenNumbers.MarshalJSON()
	if err == nil {
		fmt.Println(string(jsonBytes)) // prints "[2,4,6,8,10]"
	}

UnmarshalJSON() is supposed to deserialise JSON back to a set, but it doesn’t seem to work:

	evenNumbers2 := mapset.NewSet[int]()
	err = evenNumbers2.UnmarshalJSON(jsonBytes)
	fmt.Println(evenNumbers2) // prints "Set{}"

I have no idea what the problem is with this. The JSON functions are neither documented in the readme nor covered by tests, but they were easy enough to discover via IntelliSense in Visual Studio Code.
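Until this is fixed, one workaround is to sidestep UnmarshalJSON() entirely: deserialise with the standard library’s encoding/json into a slice, then build the set from that. A sketch, assuming the JSON is a plain array of numbers like the one MarshalJSON() produced above:

	// Workaround: unmarshal into a slice with encoding/json,
	// then build the set from the slice.
	var values []int
	if err := json.Unmarshal(jsonBytes, &values); err == nil {
		evenNumbers3 := mapset.NewSet(values...)
		fmt.Println(evenNumbers3) // prints "Set{2, 4, 6, 8, 10}" (in some order)
	}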

Conclusion

Hopefully this tour of the golang-set package has shown you enough existing Set functionality that you won’t have to write your own set data structure in Go ever again.

From .NET to GoLang: Where Did Everything Go?

Today marks six months since I started working with Go (also known as GoLang). Before that, I worked for about a decade using C#, with which I became quite comfortable over the years. It’s been fun to learn a new programming language professionally, but it does take some adjustment. After six months, I don’t expect to be an expert, or even know the language well, but I’d like to share the candid experience of a newcomer to programming in Go.

The general feeling I have about Go is that it is somewhat tedious to work with (see also my Twitter thread from 3 months ago). This is down to a lack of (a) language features that make development more productive, and (b) standard library functionality that provides common things that everybody uses. Perhaps some of this might be down to my own inexperience, and I welcome feedback as long as it’s constructive. However, my understanding (e.g. based on common Stack Overflow answers) is that this is down to Go’s nature as a “simple language”.

I don’t buy the “simple language” argument (we’ll see more of this later). Why would Google bother to create a new language that offered fewer features than existing languages? Thinking about it, it’s probably down to historical reasons. Go seems to have appeared late in 2009. At the time, there weren’t a lot of options in terms of robust, productive, general-purpose programming languages that were both open-source and cross-platform. Node.js was still in its infancy; Rust would only be released the following year; and .NET Core was still several years away, which means C# was still restricted to Microsoft platforms. Python had been around much longer, but it has some very well-known limitations (e.g. in terms of performance).

In the rest of this article, we’ll take a tour of some of the things that stood out as I’ve been learning and using Go. I’m just hoping this will help illustrate why I think Go is tedious, and perhaps help other people thinking about picking up the language.

Update 16th November 2022: see also the followup article: From .NET to GoLang: Here We Go Again.

OOP and Classes

Let’s get this out of the way first: Go is a procedural language, not unlike C or Pascal. You write a bunch of statements, control the flow with loops and conditional statements, organise them into functions, and that’s about it. There are no classes, objects, methods and all that (although there is some concept of interfaces, and receiver functions seem very much like extension methods).

To be honest, this is the one thing that doesn’t bother me at all. I’ve seen countless developers overcomplicate life unnecessarily with OOP (e.g. layers upon layers of inheritance) in C#. I’m not all-out against OOP as Zed Shaw is (see “Object Oriented Programming in Python”). However, the vast majority of work I’ve done with C# was working with data (whether that’s building an API, working with a cache, or using queues, it’s almost always a matter of getting data, transforming it, and passing it somewhere else), which doesn’t seem to need abstraction, so a procedural approach fits. OOP is better suited for modelling more complex things like GUI elements, games, etc.

It’s interesting to note that while Go doesn’t have OOP, this didn’t quite spare it from the horror of ORMs (e.g. GORM). (See ADO .NET Part 1: Introduction for why I’m not a fan of ORMs.)

Generics

One of the things developers missed most since Go’s inception was generics. While Go does have some generic data structures (e.g. the map), it didn’t allow developers to create their own generic data structures until support for generics was added to the language in Go 1.18 in March this year. This meant that, for instance, if you wanted to create your own stack data structure, you’d have to create one stack for integers, another for strings, etc.

Even now that generics are available, the fact that they haven’t been around long means there are limitations, and I’ve run into a couple of them myself.

If you’re familiar with the history of .NET, you might recognise that even C# initially shipped without generics in 2001, and they only made it into the language in C# 2.0 (2005). However, it’s taken Go 12 years to get generics, and we’re now in 2022.

Standard Library

Sets

One thing I use a lot is a set data structure. In C#, Python or JavaScript, you get this out of the box. But Go doesn’t have it. Why not?

Well, someone asked this on Stack Overflow back in 2015. As is typical for Stack Overflow (see “On Stack Overflow”), the question got closed. The top answers are variants of “it doesn’t have a set data structure because you can write it yourself”.

This attitude infuriates me. Software development is complex enough, and I’d like to focus on whatever problem I need to solve, instead of reinventing the wheel and going on a yak shaving spree every time I need some common dependency that the standard library doesn’t provide.

Besides, it’s not as simple as writing a set data structure. You also need to write implementations for the operations you need (e.g. intersection, union, difference, symmetric difference, etc), test them thoroughly, make sure they’re efficient (from a performance perspective), etc. This is something that takes time to get right, but at the same time it’s also something basic that’s already been solved to death, and there’s no reason why every Go developer should have to reimplement it, when other languages provide battle-tested implementations out of the box.

In fact, there’s a comment on one of the answers that echoes my frustration:

“The usual answer for golang question: “Why provide a feature when you can rewrite it in just a few lines?”. This is why something that can be done in 3 explicit lines in python (or many other languages) takes 50+ obscure lines in go. This is one of the reasons (along with single letter variables) why I hate reading go code. It’s uselessly long, just doing with for loops what should be done by a clear, efficient and well tested properly named function. Go “spirit” is just throwing away 50 years of good software engineering practice with dubious justifications.”

Colin Pitrat, Jul 8, 2021 at 11:34

Go doesn’t come with much other than arrays, slices and maps. However, in 2017, with the release of Go 1.9, it did get sync.Map, which I understand is similar to the ConcurrentDictionary in .NET. For anything else, you’ll likely have to find an implementation on GitHub or write it yourself.

LINQ

I already said Go is a procedural language. You’ll feel it a lot. For everything you need to do, you’ll have to write lots and lots of loops, making the code a lot more verbose and error-prone compared to other languages where you can use a more functional approach (e.g. C# LINQ Select(), map() in JavaScript or Python, or list comprehensions in Python).
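As a simple illustration, filtering the even numbers out of a slice, which is a one-liner with LINQ’s Where() in C#, looks like this in Go:

	// C# equivalent: numbers.Where(n => n % 2 == 0).ToList()
	numbers := []int{1, 2, 3, 4, 5, 6}
	evens := []int{}
	for _, n := range numbers {
		if n%2 == 0 {
			evens = append(evens, n)
		}
	}
	fmt.Println(evens) // prints "[2 4 6]"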

Mathematical Functions

If I want to find the smallest number in an array in C#, I just call the LINQ Min() extension method.

Does Go have a built-in function to get the smallest number in an array? No, you have to write it yourself. Here we go again.

Update 30th October 2023: min() and max() built-in functions were finally added recently in Go 1.21.
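For what it’s worth, the new builtins take individual values rather than a collection; for slices, Go 1.21 also added a slices package to the standard library with Min() and Max() functions:

	smallest := min(3, 1, 2) // builtin, Go 1.21+
	nums := []int{5, 2, 8}
	smallestInSlice := slices.Min(nums) // import "slices"; panics on an empty slice
	fmt.Println(smallest, smallestInSlice)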

Exception Handling

I never really liked exception handling in OOP languages. I felt that checking function return values, as in C, was a lot clearer than what seemed to be a wrapper for a goto-like construct where code could suddenly jump elsewhere unpredictably on a whim.

Go doesn’t have exception handling, and so most logic looks something like this:

func doSomething() error {
	foo, err := doSomethingElse(1)
	if err != nil {
		logrus.Error("Step 1 failed", err)
		return err
	}

	bar, err := doSomethingElse(5)
	if err != nil {
		logrus.Error("Step 2 failed", err)
		return err
	}

	chicken, err := doSomethingElse(10)
	if err != nil {
		logrus.Error("Step 3 failed", err)
		return err
	}

	// ...

	return nil
}

I changed my mind. I want exception handling back.

Unused Variables

The code in the previous section doesn’t actually compile. Why not?

$ go run main.go
# command-line-arguments
./main.go:14:2: foo declared but not used
./main.go:20:2: bar declared but not used
./main.go:26:2: chicken declared but not used

Go actually fails to build if you have unused variables.

While I totally understand the benefit of keeping code clean, this is simply extreme, and very irritating. It’s very common for me to need to add a temporary variable to capture the output of some computation (or an HTTP request) and see what’s in the data, but in Go, I have to resort to a redundant fmt.Println() just to mark the variable as in-use and keep the compiler happy. It’s much more suitable to issue a warning than to fail the build.
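To be fair, there is a slightly less ugly workaround than fmt.Println(): assigning the variable to the blank identifier, which explicitly discards the value and silences the compiler:

	chicken, err := doSomethingElse(10)
	if err != nil {
		return err
	}
	_ = chicken // explicitly discarded; silences "declared but not used"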

Syntax

Function Overloading

Go doesn’t have function overloading, so you can’t have different functions with the same name and different parameters. You’ll instead have to come up with silly variants of functions that do the same thing, e.g. doSomething() and doSomething2(). Feels like going back in time, doesn’t it?

if Statements, Braces and Semicolons

if statements in Go don’t need brackets around the condition, but do mandate braces around the statements, even if there is only one statement:

	if chicken < 5 {
		logrus.Info("Chicken is less than 5")
	}

I’ve already written at length in “To Always Use Braces for Conditionals and Loops… or not” why I’m in favour of omitting braces for single-line statements. And since that’s not possible in Go, it only serves to add further verbosity to the language.

Braces must also be “Egyptian-style” (as shown above). The reason for this is explained in the FAQ and is down to semicolon insertion, just like JavaScript (which is very sad):

Why are there braces but no semicolons? And why can’t I put the opening brace on the next line?

“Go uses brace brackets for statement grouping, a syntax familiar to programmers who have worked with any language in the C family. Semicolons, however, are for parsers, not for people, and we wanted to eliminate them as much as possible. To achieve this goal, Go borrows a trick from BCPL: the semicolons that separate statements are in the formal grammar but are injected automatically, without lookahead, by the lexer at the end of any line that could be the end of a statement. This works very well in practice but has the effect that it forces a brace style. For instance, the opening brace of a function cannot appear on a line by itself.

“Some have argued that the lexer should do lookahead to permit the brace to live on the next line. We disagree. Since Go code is meant to be formatted automatically by gofmt, some style must be chosen. That style may differ from what you’ve used in C or Java, but Go is a different language and gofmt’s style is as good as any other. More important—much more important—the advantages of a single, programmatically mandated format for all Go programs greatly outweigh any perceived disadvantages of the particular style. Note too that Go’s style means that an interactive implementation of Go can use the standard syntax one line at a time without special rules.”

Go FAQ

Ternary Operator

Go doesn’t have the ?: ternary operator, often used in other languages as a concise replacement for an if statement. Why not? Once again, the question that asks this on Stack Overflow has been closed, but the answer quotes the Go FAQ to shed some light. The reason is a variant of “some developers have made messes with the ternary operator, and that’s why you can’t have nice things”. Come. On.
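The sanctioned substitute is a plain if statement with a variable declared up front, which turns a one-liner into several lines:

	// C# equivalent: var label = count == 1 ? "item" : "items";
	count := 1
	label := "items"
	if count == 1 {
		label = "item"
	}
	fmt.Println(label) // prints "item"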

Loops

When it comes to loops, Go has just the for loop, so it doesn’t have the usual while and do..while loops you’d normally find in C-style programming languages. This doesn’t bother me, as I almost always use just for loops anyway.

Go’s for loop does, however, support a foreach-style way of iterating over objects in an array. Let’s try a simple iteration over an array of odd numbers:

	odds := []int{1, 3, 5, 7, 9}
	for n := range odds {
		fmt.Println(n)
	}

There are a few things that bother me here.

  1. The syntax of the array declaration. We’ll get to this later.
  2. The for n := range odds part looks like it’s assigning an entire range to the variable, whereas what it’s really doing is something like foreach (n in odds) in C#.
  3. It doesn’t print what you think it does! The first variable from a range assignment is the index, not the element, so the above code gives the output:
0
1
2
3
4

In order to print the elements, you have to introduce a second variable:

	odds := []int{1, 3, 5, 7, 9}
	for _, n := range odds {
		fmt.Println(n)
	}

Since the point of a foreach is typically to work with an element in a collection, I much prefer C#’s orthogonal way of giving the element by default and having the index as optional.

Variable Declarations

Go’s syntax seems to be loosely based on C-style languages. It uses braces and a lot of syntax and operators are familiar, but it does make some very strange deviations. We’ve already mentioned earlier the lack of semicolons, but there are a couple of other differences that make the language more reminiscent of Pascal than anything else.

The first of these is the fact that type declarations go after the variable name in a variable declaration, e.g.:

var age int

This is strange both because of the redundant var keyword and because it gets very confusing when you switch between Go and another language.

As with many languages today (including older ones like C++), Go can infer the type if you initialise it to a value, e.g.:

age := 5

This := syntax is the other thing that reminds me of Pascal. I don’t really get why it’s beneficial (I assume the reason is mostly academic), but on the other hand I have found it very annoying, as I often have to change between := and = while I’m moving code around. It’s also quite tricky given the fact that many functions return multiple values, and you’ll typically assign the results to a combination of new and reused variables.

Where variable declarations get really confusing is when the data types are data structures. We’ve already seen the initialisation of an array… where the square brackets come before the type:

odds := []int{1, 3, 5, 7, 9}

What about a map? This is a map of string to int:

mapping := map[string]int{}

It’s really, really strange that the type of the value is not delimited by any operator. So this gets weird when the value can be of a complex type itself. For instance, how would you make a map of string to slice of string? I suppose it would be something like this:

mapping := map[string][]string{}

What about a map of string to another map of string to string? I’m guessing that would be:

mapping := map[string]map[string]string{}

This is really weird. I think C#’s syntax is much more readable, even for complex generic data structures.

Race Conditions

Most of the things I’ve talked about are things I find annoying in Go, but I want to wrap up with one feature I find really great.

Go has a way to automatically detect race conditions when you run a program with the -race parameter. This is particularly nice because multithreaded programming is very tricky to get right precisely because of race conditions. Go does provide goroutines and channels as an alternative way of inter-thread communication, but they don’t fit every situation. And since Go does provide basic locking mechanisms but not much in the way of concurrent collections (as opposed to C#), synchronising access to critical sections of code is often necessary. When that happens, having -race handy is a nice feature.
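For reference, the race detector is just a flag on the usual toolchain commands:

go run -race main.go
go test -race ./...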

Conclusion

Coming from C#, learning my way around Go has been a fun diversion, but also something of a disappointment. Given the simplistic, verbose and sometimes confusing language, and the limited standard library, there’s no reason I can think of to choose Go for a new project over something richer and more mature like C#, which nowadays runs on any platform.

Getting Started with Cartography for AWS

I have recently been working with Cartography. This tool is great for taking stock of your infrastructural and security assets, visualising them, and running security audits. However, getting it to work the first time is more painful than it needs to be. Through this article, I hope to make it less painful for other people checking out Cartography for the first time.

What is Cartography?

Cartography is a tool that can explore cloud and Software as a Service (SaaS) providers (such as AWS, Azure, GCP, GitHub, Okta and others), gather metadata about them, and store it in a Neo4j graph database. Once in Neo4j, the data can be queried using the Cypher language and the results can be visualised. This is extremely useful to understand the relationship between different infrastructural and security assets, which can sometimes reveal security flaws that need to be addressed.

Cartography is written in Python and maintained by Lyft. Sacha Faust’s “Automating Security Visibility and Democratization” 30-minute talk at BSidesSF 2019 serves as a great intro to Cartography, and also illustrates several of the early data relationships it collected.

Good to Know

Before we dive into setting up Cartography and its dependencies, I want to point out some issues I ran into, in order to minimise frustration.

[Update 8th July 2023: all issues in this section have by now been fixed, so you can skip this section. You can use a newer version of Neo4j now, although the rest of the article still uses Neo4j 3.5 for historical reasons.]

The biggest of these is that Cartography still requires the outdated Neo4j 3.5, which was planned to reach its end-of-life on 28th November 2021. Although a pull request for migration to Neo4j 4.4 was contributed on 30th January 2021, the Lyft team completely missed this deadline. Fortunately, support for Neo4j 3.5 was extended to 27th May 2022. Although the maintainers are planning to migrate to a newer Neo4j version by then, I’m not holding my breath.

This worries me for a number of reasons:

  1. If Neo4j 3.5 reaches end of life before Cartography has migrated to a more recent version, it means people using Cartography would need to run an unsupported version of Neo4j. This could be a security risk, which is ironic given that Cartography is a tool used for security.
  2. It gives the feeling that Cartography is not very well-maintained, if issues as important as this take well over a year to resolve.
  3. It makes it virtually impossible to run Cartography on a Mac with one of the newer Apple M1 CPUs. That’s because Neo4j 3.5 won’t run on an arm64 processor (e.g. Neo4j Docker images for this architecture started to appear only since 4.4), but also because a Python cryptography dependency needs to be upgraded.

So if you feel you need to depend on Cartography, it might make sense to fork it and maintain it yourself. Upgrading it to support Neo4j 4.4 is tedious but not extremely complicated, and is mostly a matter of updating Cypher queries to use the new parameter syntax as explained in the aforementioned pull request.

Another problem I ran into (and reported) is that Cartography gets much more EBS snapshot data than necessary. This bloats the Neo4j database with orders of magnitude of unnecessary data, and makes the already slow process of data collection take several minutes longer than it needs to.

Setting Up Neo4j

For now, we’ll have to stick with Neo4j 3.5. You can follow the Cartography Installation documentation to set up a local Neo4j instance, but it’s actually much easier to just run a Docker container. In fact, all you need is to run the following command:

sudo docker run -p 7474:7474 -p 7473:7473 -p 7687:7687 neo4j:3.5

Like this, you can avoid bloating your system with dependencies like Java, and just manage the container instead. Depending on the operating system you use, you may need to keep or drop the sudo command. You’ll also need to mount a volume if you want the data to survive container restarts, as shown below.

Running a Neo4j 3.5 Docker container.
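If you do want the data to persist, a named volume mounted at Neo4j’s /data directory does the trick:

sudo docker run -p 7474:7474 -p 7473:7473 -p 7687:7687 -v neo4j-data:/data neo4j:3.5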

Once Neo4j 3.5 is running, you can access the Neo4j Browser at localhost:7474:

The Neo4j Browser’s login screen.

Log in with the default credentials, i.e. with “neo4j” as both username and password. You will then be prompted to change your password:

Changing password in the Neo4j Browser.

Go ahead and change the password. This is necessary because Cartography would not otherwise be able to connect to Neo4j using the default credentials.

The Neo4j Browser’s dashboard after changing password.

Setting Up a SecurityAudit User in AWS

Cartography can be used to map out several different services, but here we’ll use AWS. To retrieve AWS data, we’ll need to set up a user with a SecurityAudit policy.

Log into the AWS Console, then go into the IAM service, and finally select “Users” on the left. Click the “Add users” button on the right.

Once in IAM, select “Users” on the left, and then click “Add users” on the right.

In the next screen, enter a name for the user, and choose “Access key – Programmatic access” as the AWS credential type, then click the “Next: Permissions” button at the bottom-right.

Enter a username, then choose Programmatic access before proceeding.

In the Permissions screen, select “Attach existing policies directly” (an arguable practice, but for now it will suffice). Use the search input to quickly filter the list of policies until you can see “SecurityAudit”, then click the checkbox next to it, and finally click the “Next: Tags” button at the bottom-right to proceed.

Attach the “SecurityAudit” policy directly to the new user.

There is nothing more to do, so just click on the remaining “Next” buttons and create the user. At this point you are given the new user’s Access key ID and Secret access key. Grab hold of them and keep them in a safe place. We’ll use them shortly.

Now that we have a user with the right permissions, all we need to do is set up the necessary AWS configuration locally, so that Cartography can use that user to inspect the AWS account. This is quite simple and is covered in the AWS Configuration and credential file settings documentation.

First, create a file at ~/.aws/credentials, and then add the Access key ID and Secret access key you just obtained, as follows (replacing the placeholder values):

[default]
aws_access_key_id=ACCESSKEYIDVALUE
aws_secret_access_key=SECRETACCESSKEYIDVALUE

Then, create another file at ~/.aws/config, and add the basic configuration as follows. I’m not sure whether the region actually makes a difference, since Cartography will in fact inspect all regions for many services that can be deployed in multiple regions.

[default]
region=us-west-2
output=json

That’s it! Let’s run Cartography.

Running Cartography

Run the following command to install Cartography:

pip3 install cartography

Then, run Cartography itself:

cartography --neo4j-uri bolt://localhost:7687 --neo4j-password-prompt --neo4j-user neo4j

Enter the Neo4j password you set earlier (i.e. not the default one) when prompted.

Cartography should now run, collecting data from AWS, adding it to Neo4j, and writing output as it works. It takes a while, even for a brand new AWS account.

Querying the Graph

Once Cartography finishes running, go back to the Neo4j Browser at http://localhost:7474/browser/. You can now write Cypher queries to analyse the data collected by Cartography.

If you haven’t used Cypher before, check out my articles “First Steps with RedisGraph” and “Family Tree with RedisGraph”, as well as my RedisConf 2020 talk “A Practical Introduction to RedisGraph”. RedisGraph is another graph database that uses the same Cypher query language, and these resources should allow you to ramp up quickly.

You might not know what Cartography data to look for initially, but you can always start with a simple MATCH query, and as you type “AWS” as a node type in a partial query (e.g. “MATCH (x:AWS”), Neo4j will suggest types from the ones it knows about. You can also consult the AWS Schema documentation, as well as the aforementioned “Automating Security Visibility and Democratization” talk, which illustrates some of these types and their relationships in handy diagrams.

Let’s take a look at a few simple examples around IAM to ease you in.

Example 1: Get All Principals

MATCH (u:AWSPrincipal)
RETURN u

In AWS, a “principal” is an umbrella term for anything that can make a request, including users, groups, roles, and the special root user. Although this is a very basic query, you’ll be surprised by what it returns, including some special internal AWS roles.

Example 2: Get Users with Policies

MATCH (u:AWSUser)-[:POLICY]->(p:AWSPolicy)
RETURN u, p

This query gets users and their policies via the POLICY relationship. Due to the nature of the query, it won’t return users that don’t have any directly attached policies. In this case all I’ve got is the cartography user I created earlier, but you can see the connection to the SecurityAudit policy.

The cartography user is linked to the SecurityAudit policy.

Example 3: Get Policy Statements for Principals

MATCH (a:AWSPrincipal)-->(p:AWSPolicy)-[:STATEMENT]->(s)
RETURN a, p, s

Cartography parses the statements in AWS policies, so if you inspect a node of type AWSPolicy, you can actually see what resources it provides access to. This query shows the relationship between principals (again, this means users, groups, etc) and the details of the policies attached directly to them.

It is possible to refine this query further to include indirectly assigned policies (e.g. to see what permissions a user has via a group it belongs to), or to look for specific permissions (e.g. whether a principal has access to iam:*).

Results of a Cypher query linking AWS principals to the policy statements that apply to them, via AWS policies.

Wrapping Up

As you can see, Cartography takes a bit of effort to set up and has some caveats, but it’s otherwise a fantastic tool to gather data about your resources into Neo4j for further analysis.

"You don't learn to walk by following rules. You learn by doing, and by falling over." — Richard Branson