Tag Archives: Elasticsearch

Filebeat, Elasticsearch and Kibana with Docker Compose

Docker is one of those tools I wish I had learned to use a long time ago. I still remember how painful it always was to set up Elasticsearch on Linux, or to set up both Elasticsearch and Kibana on Windows, and then having to repeat the process every so often to upgrade or recreate the Elastic stack.

Fortunately, Docker images now exist for all Elastic stack components including Elasticsearch, Kibana and Filebeat, so it’s easy to spin up a container, or to recreate the stack entirely in a matter of seconds.

Getting them to work together, however, is not trivial. Security is enabled by default from Elasticsearch 8.0 onwards, so you’ll need SSL certificates, and the examples you’ll find on the internet using docker-compose from the Elasticsearch 7.x era won’t work. Although the Elasticsearch docs provide an example docker-compose.yml that includes Elasticsearch and Kibana with certificates, this doesn’t include Filebeat.

In this article, I’ll show you how to tweak this docker-compose.yml to run Filebeat alongside Elasticsearch and Kibana.

  • I’ll be doing this with Elastic stack 8.4 on Linux, so if you’re on Windows or Mac, just drop the sudo from the commands.
  • You can find the relevant files for this article in the FekDockerCompose folder at the Gigi Labs BitBucket Repository.
  • This is merely a starting point and by no means production-ready.
  • A lot of things can go wrong along the way, so I’ve included a lot of troubleshooting steps.

The Doc Samples

The “Install Elasticsearch with Docker” page at the official Elasticsearch documentation is a great starting point to run Elasticsearch with Docker. The section “Start a multi-node cluster with Docker Compose” provides what you need to run a three-node Elasticsearch cluster with Kibana in Docker using docker-compose.

The first step is to copy the sample .env file and fill in any values you like for the ELASTIC_PASSWORD and KIBANA_PASSWORD settings, such as the following (don’t use these values in production):

# Password for the 'elastic' user (at least 6 characters)
ELASTIC_PASSWORD=elastic

# Password for the 'kibana_system' user (at least 6 characters)
KIBANA_PASSWORD=kibana

# Version of Elastic products
STACK_VERSION=8.4.0

# Set the cluster name
CLUSTER_NAME=docker-cluster

# Set to 'basic' or 'trial' to automatically start the 30-day trial
LICENSE=basic
#LICENSE=trial

# Port to expose Elasticsearch HTTP API to the host
ES_PORT=9200
#ES_PORT=127.0.0.1:9200

# Port to expose Kibana to the host
KIBANA_PORT=5601
#KIBANA_PORT=80

# Increase or decrease based on the available host memory (in bytes)
MEM_LIMIT=1073741824

# Project namespace (defaults to the current folder name if not set)
#COMPOSE_PROJECT_NAME=myproject

Next, copy the sample docker-compose.yml. This is a large file so I won’t include it here, but in case the documentation changes, you can find an exact copy at the time of writing as docker-compose-original.yml in the aforementioned BitBucket repo.

Once you have both the .env and docker-compose.yml files, you can run the following command to spin up a three-node Elasticsearch cluster and Kibana:

sudo docker-compose up

You’ll see a lot of output and, after a while, if you access http://localhost:5601/, you should be able to see the Kibana login screen:

The Kibana login screen.
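
If you’d also like to confirm that Elasticsearch itself is responding, a quick cluster health check from the terminal works too. This is just a smoke test using the ELASTIC_PASSWORD value from the .env file above; the -k flag skips certificate verification, which is acceptable only for a quick local check:

curl -k -u elastic:elastic "https://localhost:9200/_cluster/health?pretty"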

Troubleshooting tip: Unhealthy containers

It can happen that some of the containers fail to start up, claiming to be “unhealthy” without offering a reason. You can find out more by taking the container ID (provided in the error output) and running:

sudo docker logs <containerId>

Chances are that the error you’ll see in the logs will be this:

bootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

This is in fact explained in the same documentation page and elaborated in another one. Run the following command to fix it on Linux, or refer to the documentation for other OSes:

sudo sysctl -w vm.max_map_count=262144
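
Note that this setting does not survive a reboot. If you want it to persist, you can also add it to /etc/sysctl.conf (or a file under /etc/sysctl.d/) and reload, for example:

echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p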

Adding Filebeat to docker-compose.yml

The sample docker-compose.yml consists of five services: setup, es01, es02, es03 and kibana. While the documentation already explains how to Run Filebeat on Docker, what we need here is to run it alongside Elasticsearch and Kibana. The first step to do that is to add a service for it in the docker-compose.yml, after kibana:

  filebeat:
    depends_on:
      es01:
        condition: service_healthy
      es02:
        condition: service_healthy
      es03:
        condition: service_healthy
    image: docker.elastic.co/beats/filebeat:${STACK_VERSION}
    container_name: filebeat
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./test.log:/var/log/app_logs/test.log
      - certs:/usr/share/elasticsearch/config/certs
    environment:
      - ELASTICSEARCH_HOSTS=https://es01:9200
      - ELASTICSEARCH_USERNAME=elastic
      - ELASTICSEARCH_PASSWORD=${ELASTIC_PASSWORD}
      - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt

The most interesting part of this is the volumes:

  • filebeat.yml: this is how we’ll soon be passing Filebeat its configuration.
  • test.log: we’re including this example file just to see that Filebeat actually works.
  • certs: this is the same as in all the other services and is part of what allows them to communicate securely using SSL certificates.

Generating a Certificate for Filebeat

The setup service in docker-compose.yml has a script that generates the certificates used by all the Elastic stack services defined there. It creates a file at config/certs/instances.yml specifying what certificates are needed, and passes that to the bin/elasticsearch-certutil command to create them. We can follow the same pattern as the other services in instances.yml to create a certificate for Filebeat:

          "  - name: es03\n"\
          "    dns:\n"\
          "      - es03\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: filebeat\n"\
          "    dns:\n"\
          "      - es03\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          > config/certs/instances.yml;

Configuring Filebeat

Create a file called filebeat.yml, and configure the input section as follows:

filebeat.inputs:
- type: filestream
  id: my-application-logs
  enabled: true
  paths:
    - /var/log/app_logs/*.log

Here, we’re using a filestream input to pick up any files ending in .log from the /var/log/app_logs/ folder. This path is arbitrary (as is the id), but it’s important that it corresponds to the location where we’re mounting the test.log file in docker-compose.yml:

    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
      - ./test.log:/var/log/app_logs/test.log
      - certs:/usr/share/elasticsearch/config/certs

While you’re at it, create the test.log file with any lines of text, such as the following:

Log line 1
Air Malta sucks
Log line 3

Back in filebeat.yml, we also need to configure the connection to Elasticsearch, using not only the Elasticsearch username and password but also the certificates we set up in the previous section:

output.elasticsearch:
  hosts: '${ELASTICSEARCH_HOSTS:elasticsearch:9200}'
  username: '${ELASTICSEARCH_USERNAME:}'
  password: '${ELASTICSEARCH_PASSWORD:}'
  ssl:
    certificate_authorities: "/usr/share/elasticsearch/config/certs/ca/ca.crt"
    certificate: "/usr/share/elasticsearch/config/certs/filebeat/filebeat.crt"
    key: "/usr/share/elasticsearch/config/certs/filebeat/filebeat.key"

Troubleshooting tip: Peeking inside a container

In case you’re wondering where I got those certificate paths from, I originally looked inside the container to see where the certificates were being generated for the other services. You can get a container ID with docker ps, and then access the container as follows:

sudo docker exec -it <containerId> /bin/bash
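
Once inside one of the Elasticsearch containers (es01, for example), you can list the contents of the certs volume to see exactly where each certificate ends up, including the filebeat one once it has been generated:

ls -R /usr/share/elasticsearch/config/certs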

More Advanced Filebeat Configurations

Although we’re sticking to a basic filestream input in this example to keep things simple, Filebeat can be configured to gather logs from a large variety of data sources, ranging from web servers to cloud providers, thanks to its modules.

A good way to explore the possibilities is to download a copy of Filebeat and sift through all the different YAML configuration files that are provided as reference material.
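
For instance, once the filebeat container is running, you can list the modules that ship with it straight from the container (the modules subcommand is part of Filebeat itself):

sudo docker exec filebeat filebeat modules list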

Running It All

It’s now time to run docker-compose with Filebeat running alongside Kibana and the three-node Elasticsearch cluster:

sudo docker-compose up
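
If you prefer to keep your terminal free, you can also run everything detached and then check that all the services eventually report a healthy state:

sudo docker-compose up -d
sudo docker-compose ps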

Troubleshooting tip: Recreating certificates

The setup script has a check that won’t create certificates again if it has already been run (by looking for the config/certs/certs.zip file). So if you’ve already run docker-compose up before, you’ll need to recreate these certificates in order to get the one for Filebeat. The easiest way to do it is by just clearing out the volumes associated with this docker-compose:

sudo docker-compose down --volumes

Troubleshooting tip: filebeat.yml permissions

It’s also possible to get the following error:

filebeat | Exiting: error loading config file: config file ("filebeat.yml") can only be writable by the owner but the permissions are "-rw-rw-r--" (to fix the permissions use: 'chmod go-w /usr/share/filebeat/filebeat.yml')

The solution is, of course, to heed the error’s advice and run the following command (on your host machine, not in the container):

chmod go-w filebeat.yml

Troubleshooting tip: Checking individual container logs

The logs coming from all the different services can be overwhelming, and the verbose JSON structure doesn’t help. If you suspect there’s a problem with a specific container (e.g. Filebeat), you can see the logs for that specific service as follows:

sudo docker-compose logs -f filebeat

You can of course still use sudo docker logs <containerId> if you want, but this alternative puts the name of the service before each log line, and some terminals colour it. This at least helps to visually distinguish one line from another.

Output of sudo docker-compose logs -f filebeat.

Verifying Log Data in Kibana

You only know Filebeat really worked if you see the data in Kibana. Fire up http://localhost:5601/ in a browser and log in using “elastic” as the username, and whatever password you set in the .env file (in this example it’s also “elastic” for simplicity).

The first test I usually do is to check whether an index has actually been created at all, because if it hasn’t, you can search all you want in Discover and you’re not going to find anything.

Click the hamburger menu in the top-left, scroll down a bit, and click on “Dev Tools”. There, enter the following query and run it (by clicking the Play button or hitting Ctrl+Enter):

GET _cat/indices

If you see an index whose name contains “filebeat” in the results panel on the right, then that’s encouraging.

GET _cat/indices shows that we have a Filebeat index.
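
If you want to be extra sure that documents actually made it in (rather than just an empty index), a quick count also helps. This assumes the default filebeat-* naming:

GET filebeat-*/_count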

Now that we know that some data exists, click the hamburger menu at the top-left corner again and go to “Discover” (the first item). There, you’ll be prompted to create a “data view” (if you don’t have any data, you’ll be shown a different prompt offering integrations instead). If I understand correctly, this “data view” is what used to be called an “index pattern”.

At Discover, you’re asked to create a data view.

Click on the “Create data view” button.

Creating the data view, whatever it is.

You can give the data view a name and an index pattern. I suppose the name is arbitrary. For the index pattern, I still use filebeat-* (you’ll see the index name on the right turn bold as you type, indicating that it’s matching), although I’m not sure whether the wildcard actually makes a difference now that the index is some new thing called a data stream.

The timestamp field gets chosen automatically for you, so no need to change anything there. Just click on the “Save data view to Kibana” button. You should now be able to enjoy your lovely data.

Viewing data ingested via Filebeat in the Discover section of Kibana.

Troubleshooting tip: Time range

If you don’t see any data in Discover, it doesn’t necessarily mean something went wrong. The default time range of “last 15 minutes” means you might not see any data if there wasn’t any indexed recently. Simply adjust it to a longer period (e.g. last 2 hours).

Conclusion

The Elastic stack is a wonderful set of tools, but its power comes with a lot of complexity. Docker makes it easier to run the stack, but it’s often difficult to find guidance even on simple scenarios like this. I’m hoping that this article makes things a little easier for other people wanting to run Filebeat alongside Elasticsearch and Kibana in Docker.

Bundled JDK in Elasticsearch 7

Because Elasticsearch is a Java application, setting it up has always required having Java installed and the JAVA_HOME environment variable pointing to it. See, for instance, my articles on setting up Elasticsearch on Windows and setting up Elasticsearch on Linux.

From version 7, Elasticsearch makes things a lot easier by bundling a version of OpenJDK with the product itself.

“One of the more prominent “getting started hurdles” we’ve seen users run into has been not knowing that Elasticsearch is a Java application and that they need to install one of the supported JDKs first. With 7.0, we’re now releasing versions of Elasticsearch which pre-bundle the JDK to help users get started with Elasticsearch even faster. If you want to bring your own JDK, you can still do so by setting JAVA_HOME before starting Elasticsearch.”

Elasticsearch 7.0.0 released | Elastic Blog

The documentation tells us more about the bundled JDK:

“Elasticsearch is built using Java, and includes a bundled version of OpenJDK from the JDK maintainers (GPLv2+CE) within each distribution. The bundled JVM is the recommended JVM and is located within the jdk directory of the Elasticsearch home directory.
“To use your own version of Java, set the JAVA_HOME environment variable. If you must use a version of Java that is different from the bundled JVM, we recommend using a supported LTS version of Java. Elasticsearch will refuse to start if a known-bad version of Java is used. The bundled JVM directory may be removed when using your own JVM.”

Set up Elasticsearch | Elasticsearch Reference [7.2] | Elastic

Therefore, after downloading a fresh version of Elasticsearch (7.2 is the latest at the time of writing this), we notice that there is a jdk folder as described above:

The jdk folder containing the bundled JDK.

On a machine with no JAVA_HOME set, Elasticsearch will, as of version 7, use this jdk folder automatically:

Although JAVA_HOME is not set, Elasticsearch starts up anyway.

This means that we can now skip the entire section of setting up Elasticsearch that revolves around having a version of Java already available and setting the JAVA_HOME environment variable.
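
If you ever want to double-check which JVM a running node actually picked up, the nodes info API will tell you. This is a quick check assuming the default localhost:9200 with security not configured, which was still the default in 7.x:

curl "localhost:9200/_nodes/jvm?filter_path=nodes.*.jvm.version,nodes.*.jvm.vm_name&pretty"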

On the other hand, if you do have JAVA_HOME set, Elasticsearch will use that, and will not use the bundled JDK at all. This in turn means that if you have JAVA_HOME set incorrectly (e.g. to a directory that no longer exists), Elasticsearch fails with a misleading error that seems to indicate that it’s also looking for the bundled JDK:

"could not find java in JAVA_HOME or bundled at C:\tools\elasticsearch-7.2.0\jdk"

Therefore, if you want to use your own JDK, make sure JAVA_HOME is set correctly. If you want to use the bundled one, make sure JAVA_HOME is not set.

Enabling Dark Mode in Kibana

Those Kibana users who prefer their software with a dark theme will be thrilled to know that Kibana has had a dark mode since version 7.0.0.

It can be enabled by following the steps illustrated below.

Go to Management from the left navigation.
Select Advanced Settings on the left.
Find the Dark mode setting somewhere further down in the page.
Switch on the Dark mode setting, then reload the page.
Like the sky on a stormy day, the page goes dark.
In fact, everything from Discover to Maps (and beyond) becomes dark.

Dark mode is a welcome feature for those who prefer darker tones on their screen as a matter of personal taste or to reduce eye strain.
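
As an aside, the same advanced setting (its key is theme:darkMode) can also be toggled through Kibana’s settings API, which is what the UI itself calls under the hood. This is only a sketch, assuming Kibana on localhost:5601 with no security configured; the endpoint and behaviour may vary between versions, so treat the UI steps above as the supported route:

curl -X POST "localhost:5601/api/kibana/settings" \
  -H "kbn-xsrf: true" -H "Content-Type: application/json" \
  -d '{"changes": {"theme:darkMode": true}}'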

Elastic Stack 7.0 Launch Event Summary

On Thursday 25th April 2019, just two days ago, the Elastic team held the Elastic Stack 7.0 Live (Virtual) Event, in which they explained and showcased several of the features in the latest version of Elasticsearch and its accompanying tools that were released on 10th April.

A recording is available at the link above, and I highly recommend watching it. However, I am writing this summary for the sake of those who might want to quickly check out the highlights without spending close to two hours watching the recording, or for those who want to quickly locate some of the relevant information (video isn’t a great medium to search for info).

Overview

“This version of the Elastic Stack looks very different from our early releases. It’s […] a much more mature product. We’ve had… 7 years now to learn and grow. But really we’re still focusing on the same 3 principles that have made Elastic popular from the beginning: speed, scale and relevance.”

— Clint Gormley, Stack Team Lead

The Elastic team has invested a lot of work into making Elasticsearch easy to scale, in such a way that it works the same on a laptop and in a data centre with hundreds of nodes with minimal configuration. However, the harsh realities of distributed systems (disk corruptions, split brains, shard inconsistencies etc) make this a very hard problem to solve, and the team has over the years added incremental changes to improve the product’s resiliency.

It is this work that has led to cross-cluster replication (released in 6.5) and the removal of the minimum master nodes setting (in 7.0), and that will also enable following a stream of changes as they happen in an index.

“Version 7 is the safest, most flexible, easiest to use and scalable version of Elasticsearch that we’ve ever delivered.”

— Clint Gormley, Stack Team Lead

Fundamental changes have also been made in the way search itself works. Elasticsearch 7.0 uses an algorithm called Block Max WAND to greatly improve the speed of queries at the cost of not knowing exactly how many documents matched. This is usually a reasonable tradeoff because people usually want to get the top N results, rather than knowing the total hit count.

The raw speedup given by this new algorithm also has implications in terms of relevance of results and usability. Because search is so fast, it is no longer costly to search for stop words, and thus precision and recall can be improved by including them. Work is also ongoing on a search-as-you-type feature that would not be possible without this new level of performance.

Using BKD-trees instead of inverted indices has also resulted in significant speedups, especially in the realm of geo-shapes, where accuracy has improved considerably as a result.

Kibana got a new design, as its role has grown from being used to visualise Elasticsearch data to becoming an all-encompassing tool to manage the Elastic stack.

Also new on the ingest side is something called the Elastic Common Schema, which is a consistent way to map similar data from different data sources (e.g. Apache, IIS, NGINX) into a single structure.

Kibana 7 Design Considerations

A demo of Kibana 7, both in a browser and a mobile simulator.

Kibana 7 sports a new design as a result of a design-at-scale problem. The number of services offered by Kibana (see the tab drawer to the left) has increased considerably, and this called for a consistent and usable layout that could cater for applications as diverse as maps and logging.

Kibana’s dark mode, making the logging UI look like a terminal.

Some of the more superficial (but by no means trivial) work that went into Kibana was related to making it responsive (i.e. it responds nicely when you resize the browser window) and mobile-friendly (which in the words of Dave Snider, Director of Product design, is still “pretty beta”), as well as the dark mode that applies a darker theme throughout the product.

More importantly, however, Kibana 7 wants users to focus on the content (search results, graphs, visualisations etc) rather than the Kibana tooling itself, and that means moving things like the date picker and even Kibana’s own navigation out of the way.

The new design is based on a set of values:

  • Accessible to everyone (colour-blindness, screen reader support, tab around without using a mouse, etc)
  • Themable (easy to change colours)
  • Responsive (works in different screen sizes)
  • Playful (make it feel like fun – lively animations and such)
  • Well-documented (important for a distributed and open-source company)

This design was achieved by building the Elastic UI Framework, a React and CSS library of all UI controls used to build Kibana. It is open-source and fully documented with demos.

Making Search Faster (and Easier)

An example from the demo: a stop word query across two fields returned in 27ms, but did not report an accurate hit count.

The Block Max WAND algorithm makes search significantly faster when we don’t need the total hit count. In the demo, a query involving stop words took more than 10 times as long without this optimisation as it did with it.

The same search, run with track_total_hits set to true. This gives an accurate total hit count, but the query is significantly slower.

The Block Max WAND optimisation, enabled by default in Elasticsearch 7.0, can be disabled at any time using the track_total_hits setting if an exact hit count is required. It is also disabled automatically when using aggregations, to which the optimisation cannot be applied. Even with the optimisation enabled, total hits are tracked up to a maximum of 10,000. You can tell whether the hit count is accurate or not by seeing whether the hits.total.relation value is “eq” (which means it’s accurate) or “gte” (which means the actual hit count will be greater than or equal to 10,000).
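
To make that concrete, here is roughly what requesting an exact count looks like against a hypothetical my-index (with track_total_hits omitted, hits.total.value stops counting at 10,000 and hits.total.relation comes back as "gte"):

GET my-index/_search
{
  "track_total_hits": true,
  "query": {
    "match": { "message": "the" }
  }
}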

This ground-breaking enhancement to the way search works is beneficial not only in speeding up queries, but also in enabling new features. In fact, a search-as-you-type feature is under development and is planned for the 7.1 release. Aside from that, feature fields and interval queries are also mentioned in the presentation.

Cluster Resiliency and Scale

The role of the Cluster Coordination Subsystem.

Elasticsearch 7 brings with it a new cluster coordination subsystem, which is responsible for the ongoing healthy operation of an Elasticsearch cluster. This has led to the removal of the minimum_master_nodes setting, which could prove very painful pre-7.0. Master elections are also a lot faster (going from at least 3 seconds in pre-7.0 to a few hundred milliseconds in 7.0), and logging is available when things go wrong.

The new cluster coordination system has been verified using formal methods, typically employed in mission-critical systems. Also, upgrading to this new system can be done without downtime.

An important resiliency enhancement in 7.0 is the real-memory circuit breaker. Elasticsearch uses several circuit breakers, designed to push back on requests when under load to avoid out-of-memory errors. The new real-memory circuit breaker allows Elasticsearch to know exactly how much memory will be allocated, making it less likely to break while at the same time using less overhead.

Cross-cluster replication (which shares an acronym with Creedence Clearwater Revival) is production-ready in 7.0, and addresses a number of very real use cases.

Elasticsearch 7.0 also introduces production-ready cross-cluster replication, allowing changes to indices to be synchronised with remote Elasticsearch clusters. The slide shown above describes some use cases where this is useful.

Geo Gorgeous (i.e. Maps)

The support for geographical applications by Elasticsearch and Kibana has received a considerable boost in version 7. At a basic level:

  • geo_points and geo_shapes now fully use BKD-trees
  • Ingest nodes can now use the GeoIP processor, and Logstash has a geoip filter plugin
  • Kibana gets a Coordinate Map, Region Map, as well as Vega and Maps capabilities
  • An Elastic Maps Service is now available
  • A new geo_shape type makes geo_shape fields a lot easier to work with

Using BKD-trees for Geo Shapes yields incredible improvements.

The use of BKD-trees for Geo Shapes significantly reduces the complexity of their representation, and therefore their storage. This results in considerable speed (indexing and querying), space and accuracy improvements, as shown in the slide above (and further in the video).

Elasticsearch 7.0 also introduces the geo_tile aggregation, which (unlike the geo hashes in use so far) conforms to the Web Mercator specification. Grid tiles are thus actually square, and preserve identical aspect ratio at all scales and latitudes.
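
For the curious, this is roughly what the new aggregation looks like in a query. It’s just a sketch assuming an index with a geo_point field called location; the aggregation is called geotile_grid in the API:

GET my-geo-index/_search
{
  "size": 0,
  "aggs": {
    "requests_per_tile": {
      "geotile_grid": {
        "field": "location",
        "precision": 6
      }
    }
  }
}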

The rest of the presentation on geo focuses on Kibana Maps, which is beta in 7.0. It is a great tool allowing composition of maps from multiple data sources, as the demo shows. The rest of the screenshots below are stills from the demo, and each demonstrates a particular functionality.

The demo is based on data that simulates network requests. A layer is added to the map based on the geographical location of each record, first as points, then as grid rectangles, and finally as a heat map.
Another layer is added, bringing in countries from the Elastic Maps Service.
Joining the point and country data results in country polygons shaded by the number of requests that originated there.
It is possible to use a custom map service, as shown by this dark map coming from a third party source.
Data centres (the big green circles) are added to the map.
The location of individual requests (smaller green circles) are also added to the map, and gradually made smaller until they are barely visible.
Request paths — lines connecting individual requests to data centres — are added as well.
Since this is Kibana, the power of search is always available. The results are restricted to the last five minutes and to one particular data centre.

Summary (of the Summary)

Elastic Stack 7.0 is packed with new features and improvements. The launch event, still available on video and summarised in this article, barely scratches the surface. There is certainly a lot to be excited about.

Some items we’ve touched upon include:

  • Kibana has grown and got a redesign.
  • Block Max WAND significantly speeds up search (at the cost of total hit count), and paves the way for future features such as search-as-you-type.
  • A new cluster coordination subsystem, real-memory circuit breaker, and cross-cluster replication improve cluster resiliency and scale.
  • Significant improvements have been made in the geo space, and Kibana Maps is awesome.

Log Shipping with Filebeat and Elasticsearch

Introduction

Aside from being a powerful search engine, Elasticsearch has in recent years become very popular as a special-purpose logging storage and analysis solution. Logstash and beats were eventually introduced to help Elasticsearch cope better with the volume of logs being ingested.

In this article, we’ll see how to use Filebeat to ship existing logfiles into Elasticsearch, so that they can be viewed and analysed in Kibana.

Since it’s not possible to cover all scenarios exhaustively and still keep this article at a reasonable length, we’ll make a few assumptions here:

  1. We’ll use Filebeat on Windows.
  2. We’ll ship logs directly into Elasticsearch, i.e. no Logstash. This is good if the scale of logging is not so big as to require Logstash, or if it is just not an option (e.g. using Elasticsearch as a managed service in AWS).
  3. We’re running on-premises, and already have log files we want to ship. If we were running managed services within the cloud, then logging to file would often not be an option, and in that case we should use whatever logging mechanism is available from the cloud provider.

Motivation

Logging is ubiquitous. You’ll find it in virtually every application out there. As such, it’s a problem that has been solved to death. There are so many logging frameworks out there, it’s just crazy.

And despite this, it baffles me why so many companies today still opt to write their own logging libraries, either from scratch or as abstractions of other logging libraries. They could just use one of the myriad existing solutions out there, which are probably far more robust and performant than theirs will ever be.

In order to realise just how stupid reinventing the wheel is, let’s take an example scenario. You have your big software monolith that’s writing to one or more log files. You begin to break up the monolith into microservices, and realise that you now have log files everywhere: in multiple applications across different servers. So… you need to write a logging library that all your microservices can use to write the logs to a central data store (could be any kind of relational or NoSQL database).

That’s great! Your logs are now in one place and you can search through them effortlessly! And your code is even DRY because you wrote another common library (hey, you only need like 35 of them now to write a new microservice).

But wait, having applications write directly to a log store is wrong on so many levels. Here are a few:

  1. Logs buffered in memory can be permanently lost if the application terminates unexpectedly.
  2. The application must take the performance hit of communicating with the remote endpoint.
  3. Through the logging library, the application must depend on a client library for that logging store. This is a form of coupling that doesn’t work very well with microservices. Even worse, if the logging library isn’t designed properly, it may carry dependencies on multiple logging stores.

These practical issues don’t even take into consideration the effort and complexity involved in creating a fully-featured logging library.

So what is the alternative? Simply keep writing to log files, and have a separate application (a log shipper) send those logs to a centralised store. Again, you don’t have to write the log shipper yourself. There are more than enough out there that you can just pick up and use.

This approach has a number of advantages:

  1. The log shipper is an offline process, and will not directly impact performance of applications.
  2. Files are about as fast as it gets for an application to write logs.
  3. If there is a problem sending logs to the store, the original log files are still there as a single source of truth.
  4. The log shipper can send logs to the store in bulk. There is no need to dangerously buffer them in memory. They are already there on disk.
  5. If the original logger (to file) is configured to flush on each write, then it’s virtually impossible that logs will be lost.
  6. There are no additional dependencies for the application. Just the original logging library.
  7. Developers can leverage their knowledge of existing libraries, and don’t have to learn to use a new one every time they start a new job.
  8. Developers can focus on solving real problems, rather than reinventing the wheel.

“But wait!” I can already hear the skeptics. “Existing logging libraries are not fast enough!” goes one of them. To this chap, I say:

  • Have you really tried all existing logging libraries? (Only Chuck Norris has done that, as far as I can tell. Twice.)
  • Is it possible that you’re simply not using a library correctly? (Maybe tweak some configuration settings?)
  • Even if you really could write something faster, it’s likely that the benefit will be negligible, and that it will only be faster under certain conditions. Surely you have more important performance considerations than how many logs you can write per second.

“But wait!” goes another skeptic. “We might need to change the logging library later.” This is the same tired old excuse we keep hearing about data-access-layer code (“We might have to change our database!”), and some folks are still going on about it after some forty years.

This is a very common over-engineering scenario in which we create an abstraction of an abstraction. NLog and other logging libraries can already plug into a variety of output destinations, so it’s very unlikely that you’ll ever need to change them. Actually, it’s more likely that you’ll run into limitations by using abstractions such as Common.Logging where you end up with a common denominator and can’t make use of advanced features that a specific logging library might offer.

Changing a logging library should be mostly a matter of changing packages, and updating code via search and replace. So if you need to change it, just change it. That’s way cheaper than the complexity introduced by an extra layer of unnecessary abstraction for no other reason than “just in case”. Especially if you’re doing microservices (properly) – you should be able to change your logging library and redeploy in a matter of minutes.

Beats and Filebeat

A beat is a lightweight agent that can siphon data from a source and send it to Logstash or Elasticsearch. There are several beats that can gather network data, Windows event logs, log files and more, but the one we’re concerned with here is Filebeat.

After you download Filebeat and extract the zip file, you should find a configuration file called filebeat.yml. For a quick start, look for filebeat.prospectors, and under it:

  • Change the value of enabled from false to true.
  • Under paths, comment out the existing entry for /var/log/*.log, and instead put in a path for whatever log you’ll test against.

This part of filebeat.yml should now look something like this:

filebeat.prospectors:

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /var/log/*.log
    #- c:\programdata\elasticsearch\logs\*
    - C:\ConsoleApp1\*.log

Also, if your Elasticsearch server isn’t at the default localhost:9200, be sure to change it further down in the file.

In that ConsoleApp1, I have a file called Debug.log which contains the following log entries:

2018-03-18 15:43:40.7914 - INFO: Tick
2018-03-18 15:43:42.8215 - INFO: Tock
2018-03-18 15:43:42.8683 - ERROR: Error doing TickTock!
EXCEPTION: System.DivideByZeroException: Attempted to divide by zero.
   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18

I’ll be using this simple (silly) example to show how to work with Filebeat.

Next, we can invoke filebeat.exe. When you do this, two folders get created. One is logs, where you can check Filebeat’s own logs and see if it has run into any problems. The other is data, and I believe this is where Filebeat keeps track of its position in each log file it’s tracking. If you delete this folder, it will go through the log files and ship them again from scratch.
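
While testing, you may find it more convenient to run Filebeat in the foreground with its own log output going to the console instead, using the -e flag (and -c to point it at the configuration file):

.\filebeat.exe -e -c filebeat.yml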

Go into Kibana, and then into Management and Index Patterns. If all went well, Kibana will find the index that was created by Filebeat. You can create the index pattern filebeat-* to capture all Filebeat data:

For the time filter field, choose @timestamp, which is created and populated automatically by Filebeat.

In Kibana, you can now go back to Discover and see the log data (you may need to extend the time range):

As you can see, Filebeat successfully shipped the logs into Elasticsearch, but the logs haven’t been meaningfully parsed:

  • The message field contains everything, including timestamp, log level and actual message.
  • The exception stack trace was split into different entries per line.
  • The Time field showing in Kibana is actually the time when the log was shipped, not the timestamp of the log entry itself.

We’ll deal with these issues in the next sections.

Elasticsearch Pipeline

One way to properly parse the logs when they are sent to Elasticsearch is to create an ingest pipeline in Elasticsearch itself. There’s a good article by James Huang showing how to use this to ship logs from Filebeat to managed Elasticsearch in AWS.

By adapting the example in that article, we can create a pipeline for our sample log file. Run the following in Kibana’s Dev Tools:

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      }
    }
  ]
}

Now, getting that pattern right is a pain in the ass. The Grok Debugger is a great help, and there’s also a list of data types you can use.
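
Another way to test the pattern without reshipping anything is Elasticsearch’s simulate pipeline API, which runs the pipeline against documents you supply inline. Here it is with one of the sample log lines from above:

POST /_ingest/pipeline/logpipeline/_simulate
{
  "docs": [
    { "_source": { "message": "2018-03-18 15:43:40.7914 - INFO: Tick" } }
  ]
}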

In filebeat.yml, we now need to configure Filebeat to use this Elasticsearch pipeline:

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  pipeline: logpipeline

We can now try indexing the logs again. First, let’s delete the Filebeat index:

DELETE filebeat-*

Next, delete Filebeat’s data folder, and run filebeat.exe again.

In Discover, we now see that we get separate fields for timestamp, log level and message:

If you get warnings on the new fields (as above), just go into Management, then Index Patterns, and refresh the filebeat-* index pattern.

Now, you’ll see that for the error entry, we did not get the full exception stack trace. If we go into the Filebeat logs, we can see something like this:

2018-03-18T23:16:26.614Z	ERROR	pipeline/output.go:92	Failed to publish events: temporary bulk send failure
2018-03-18T23:16:26.616Z	INFO	elasticsearch/client.go:690	Connected to Elasticsearch version 6.1.2
2018-03-18T23:16:26.620Z	INFO	template/load.go:73	Template already exists and will not be overwritten.
2018-03-18T23:16:27.627Z	ERROR	pipeline/output.go:92	Failed to publish events: temporary bulk send failure
2018-03-18T23:16:27.629Z	INFO	elasticsearch/client.go:690	Connected to Elasticsearch version 6.1.2
2018-03-18T23:16:27.635Z	INFO	template/load.go:73	Template already exists and will not be overwritten.

Correspondingly, in Elasticsearch we can see several errors such as the following accumulating:

[2018-03-18T23:16:25,610][DEBUG][o.e.a.b.TransportBulkAction] [8vLF54_] failed to execute pipeline [logpipeline] for document [filebeat-6.2.2-2018.03.18/doc/null]
org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	at org.elasticsearch.ingest.CompoundProcessor.newCompoundProcessorException(CompoundProcessor.java:156) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:107) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:169) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:42) ~[elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:94) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.2.jar:6.1.2]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.2.jar:6.1.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	... 11 more
Caused by: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [   at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18]
	at org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:67) ~[?:?]
	at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100) ~[elasticsearch-6.1.2.jar:6.1.2]
	... 9 more

Elasticsearch is making a fuss because it can’t parse the lines from the exception. This is a problem because if Elasticsearch can’t parse the logs, Filebeat will keep trying to send them and never make progress. We’ll have to deal with that exception stack trace now.

Multiline log entries

In order to log the exception correctly, we have to enable multiline processing in Filebeat. In filebeat.yml, there are some multiline settings that are commented out. We need to enable them and change them a little, such that any line not starting with a date is appended to the previous line:

  ### Multiline options

  # Mutiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation

  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  multiline.pattern: '^\d{4}-\d{2}-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{4}'

  # Defines if the pattern set under pattern should be negated or not. Default is false.
  multiline.negate: true

  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  multiline.match: after

Configuring Filebeat to support multiline log entries is not enough, though. We also need to update the pipeline in Elasticsearch to apply the grok filter across multiple lines (the (?m) modifier) and to separate the exception into a field of its own. I’ve had to split the two cases (with and without an exception) into separate patterns in order to make it work.

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: (?<message>.*?)\n(%{GREEDYDATA:exception})?",
            "(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      }
    }
  ]
}

After deleting the index and the Filebeat data folder, and re-running Filebeat, we now get a perfect multiline exception stack trace in its own field!

Fixing the Timestamp

We now have one last issue to fix: the logs being ordered by when they were inserted into the index, rather than the log timestamp. This is actually a pretty serious problem from a usability perspective, because it means people troubleshooting production issues won’t be able to use Kibana’s time filter (e.g. last 15 minutes) to home in on the most relevant logs.

In order to fix this, we need to augment our pipeline with a date processor:

PUT /_ingest/pipeline/logpipeline
{
  "description" : "Pipeline for logs from filebeat",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: (?<message>.*?)\n(%{GREEDYDATA:exception})?",
        "(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"]
      },
      "date" : {
        "field" : "timestamp",
        "target_field" : "@timestamp",
        "formats" : ["yyyy-MM-dd HH:mm:ss.SSSS"]
      }
    }
  ]
}

The names of the fields in the date section are important. We’re basically telling it to take whatever is in the timestamp field (based on one of the earlier patterns) and apply it to @timestamp. As it happens, @timestamp is what is being used as the time-series field, which gives us exactly the result we want after reshipping the logs (be sure to extend the time window in Kibana accordingly to see the logs):

Summary

In this article, we’ve explored log shipping to augment regular file logging with purpose-built tools, rather than reinventing the wheel and writing yet another logging library. The latter approach would not only be a tremendous waste of time, but would also bring reliability, performance and maintainability implications along with it.

We have specifically looked at using Filebeat to ship logs directly into Elasticsearch, which is a good approach when Logstash is either not necessary or not possible to have. In order to get our log data nicely structured so that we can analyse it in Kibana, we’ve had to set up an ingest pipeline in Elasticsearch.

We progressively refined both our Filebeat configuration and this pipeline in order to split up our logs into separate fields, process multiline exception stack traces, and use the original timestamp in the logs as the time series field.

There is a lot more that Filebeat can do. For instance, it may be configured with multiple prospectors, meaning it can read log files from different places and apply different options to each. One useful example of this is to add a custom field indicating the origin of the logs, which helps when the log data itself does not include the application name; a sketch of this follows below.
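
As a rough sketch of that idea (the paths and field values here are made up purely for illustration), the prospectors section could look something like the following, with each prospector tagging its logs via the fields option:

filebeat.prospectors:

- type: log
  enabled: true
  paths:
    - C:\App1\*.log
  # Custom field identifying the origin of these logs (hypothetical value)
  fields:
    app: app1

- type: log
  enabled: true
  paths:
    - C:\App2\*.log
  fields:
    app: app2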