Introduction
Aside from being a powerful search engine, Elasticsearch has in recent years become very popular as a special-purpose logging storage and analysis solution. Logstash and beats were eventually introduced to help Elasticsearch cope better with the volume of logs being ingested.
In this article, we’ll see how to use Filebeat to ship existing logfiles into Elasticsearch, so that they can be viewed and analysed in Kibana.
Since it’s not possible to cover all scenarios exhaustively and keep this article concise and at a reasonable length, we’ll make a few assumptions here:
- We’ll use Filebeat on Windows.
- We’ll ship logs directly into Elasticsearch, i.e. no Logstash. This is good if the scale of logging is not so big as to require Logstash, or if it is just not an option (e.g. using Elasticsearch as a managed service in AWS).
- We’re running on-premises, and already have log files we want to ship. If we were running managed services within the cloud, then logging to file would often not be an option, and in that case we should use whatever logging mechanism is available from the cloud provider.
Motivation
Logging is ubiquitous. You’ll find it in virtually every application out there. As such, it’s a problem that has been solved to death. There are so many logging frameworks out there, it’s just crazy.
And despite this, it baffles me why so many companies today still opt to write their own logging libraries, either from scratch or as abstractions of other logging libraries. They could just use one of the myriad existing solutions out there, which are probably far more robust and performant than theirs will ever be.
In order to realise just how stupid reinventing the wheel is, let’s take an example scenario. You have your big software monolith that’s writing to one or more log files. You begin to break up the monolith into microservices, and realise that you now have log files everywhere: in multiple applications across different servers. So… you need to write a logging library that all your microservices can use to write the logs to a central data store (could be any kind of relational or NoSQL database).
That’s great! Your logs are now in one place and you can search through them effortlessly! And your code is even DRY because you wrote another common library (hey, you only need like 35 of them now to write a new microservice).
But wait, having applications write directly to a log store is wrong on so many levels. Here are a few:
- Logs buffered in memory can be permanently lost if the application terminates unexpectedly.
- The application must take the performance hit of communicating with the remote endpoint.
- Through the logging library, the application must depend on a client library for that logging store. This is a form of coupling that doesn’t work very well with microservices. Even worse, if the logging library isn’t designed properly, it may carry dependencies on mutiple logging stores.
These practical issues don’t even take into consideration the effort and complexity involved in creating a fully-featured logging library.
So what is the alternative? Simply keep writing to log files, and have a separate application (a log shipper) send those logs to a centralised store. Again, you don’t have to write the log shipper yourself. There are more than enough out there that you can just pick up and use.
This approach has a number of advantages:
- The log shipper is an offline process, and will not directly impact performance of applications.
- Files are about as fast as it gets for an application to write logs.
- If there is a problem sending logs to the store, the original log files are still there as a single source of truth.
- The log shipper can send logs to the store in bulk. There is no need to dangerously buffer them in memory. They are already there on disk.
- If the original logger (to file) is configured to flush on each write, then it’s virtually impossible that logs will be lost.
- There are no additional dependencies for the application. Just the original logging library.
- Developers can leverage their knowledge of existing libraries, and don’t have to learn to use a new one every time they start a new job.
- Developers can focus on solving real problems, rather than reinventing the wheel.
“But wait!” I can already hear the skeptics. “Existing logging libraries are not fast enough!” goes one of them. To this chap, I say:
- Have you really tried all existing logging libraries? (Only Chuck Norris has done that, as far as I can tell. Twice.)
- Is it possible that you’re simply not using a library correctly? (Maybe tweak some configuration settings?)
- Even if you really could write something faster, it’s likely that the benefit will be negligible, and that it will only be faster under certain conditions. Surely you have more important performance consideratons than how many logs you can write per second.
“But wait!” goes another skeptic. “We might need to change the logging library later.” This is the same tired old excuse that is very common about data-access-layer code. “We might have to change our database!” Some folks still go on after some forty years.
This is a very common over-engineering scenario in which we create an abstraction of an abstraction. NLog and other logging libraries can already plug into a variety of output destinations, so it’s very unlikely that you’ll ever need to change them. Actually, it’s more likely that you’ll run into limitations by using abstractions such as Common.Logging where you end up with a common denominator and can’t make use of advanced features that a specific logging library might offer.
Changing a logging library should be mostly a matter of changing packages, and updating code via search and replace. So if you need to change it, just change it. That’s way cheaper than the complexity introduced by an extra layer of unnecessary abstraction for no other reason than “just in case”. Especially if you’re doing microservices (properly) – you should be able to change your logging library and redeploy in a matter of minutes.
Beats and Filebeat
A beat is a lightweight agent that can siphon data from a source and send it to Logstash or Elasticsearch. There are several beats that can gather network data, Windows event logs, log files and more, but the one we’re concerned with here is the Filebeat.
After you download Filebeat and extract the zip file, you should find a configuration file called filebeat.yml. For a quick start, look for filebeat.prospectors, and under it:
- Change the value of enabled from false to true.
- Under paths, comment out the existing entry for /var/log/*.log, and instead put in a path for whatever log you’ll test against.
This part of filebeat.yml should now look something like this:
filebeat.prospectors: # Each - is a prospector. Most options can be set at the prospector level, so # you can use different prospectors for various configurations. # Below are the prospector specific configurations. - type: log # Change to true to enable this prospector configuration. enabled: true # Paths that should be crawled and fetched. Glob based paths. paths: #- /var/log/*.log #- c:\programdata\elasticsearch\logs\* - C:\ConsoleApp1\*.log
Also if your Elasticsearch server isn’t the default localhost:9200, be sure to change it further down in the file.
In that ConsoleApp1, I have a file called Debug.log which contains the following log entries:
2018-03-18 15:43:40.7914 - INFO: Tick 2018-03-18 15:43:42.8215 - INFO: Tock 2018-03-18 15:43:42.8683 - ERROR: Error doing TickTock! EXCEPTION: System.DivideByZeroException: Attempted to divide by zero. at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18
I’ll be using this simple (silly) example to show how to work with Filebeat.
Next, we can invoke filebeat.exe. When you do this, two folders get created. One is logs, where you can check Filebeat’s own logs and see if it has run into any problems. The other is data, and I believe this is where Filebeat keeps track of its position in each log file it’s tracking. If you delete this folder, it will go through the log files and ship them again from scratch.
Go into Kibana, and then into Management and Index Patterns. If all went well, Kibana will find the index that was created by Filebeat. You can create the index pattern filebeat-* to capture all Filebeat data:
For the time filter field, choose @timestamp, which is created and populated automatically by Filebeat.
In Kibana, you can now go back to Discover and see the log data (you may need to extend the time range):
As you can see, Filebeat successfully shipped the logs into Elasticsearch, but the logs haven’t been meaningfully parsed:
- The message field contains everything, including timestamp, log level and actual message.
- The exception stack trace was split into different entries per line.
- The Time field showing in Kibana is actually the time when the log was shipped, not the timestamp of the log entry itself.
We’ll deal with these issues in the next sections.
Elasticsearch Pipeline
One way to properly parse the logs when they are sent to Elasticsearch is to create an ingest pipeline in Elasticsearch itself. There’s a good article by James Huang showing how to use this to ship logs from Filebeats to managed Elasticsearch in AWS.
By adapting the example in that article, we can create a pipeline for our sample log file. Run the following in Kibana’s Dev Tools:
PUT /_ingest/pipeline/logpipeline { "description" : "Pipeline for logs from filebeat", "processors": [ { "grok": { "field": "message", "patterns": ["%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"] } } ] }
Now, getting that pattern right is a pain in the ass. The Grok Debugger is a great help, and there’s also a list of data types you can use.
In filebeat.yml, we now need to configure Filebeat to use this Elasticsearch pipeline:
output.elasticsearch: # Array of hosts to connect to. hosts: ["localhost:9200"] pipeline: logpipeline
We can now try indexing the logs again. First, let’s delete the Filebeat index:
DELETE filebeat-*
Next, delete the Filebeat’s data folder, and run filebeat.exe again.
In Discover, we now see that we get separate fields for timestamp, log level and message:
If you get warnings on the new fields (as above), just go into Management, then Index Patterns, and refresh the filebeat-* index pattern.
Now, you’ll see that for the error entry, we did not get the full exception stack trace. If we go into the Filebeat logs, we can see something like this:
2018-03-18T23:16:26.614Z ERROR pipeline/output.go:92 Failed to publish events: temporary bulk send failure 2018-03-18T23:16:26.616Z INFO elasticsearch/client.go:690 Connected to Elasticsearch version 6.1.2 2018-03-18T23:16:26.620Z INFO template/load.go:73 Template already exists and will not be overwritten. 2018-03-18T23:16:27.627Z ERROR pipeline/output.go:92 Failed to publish events: temporary bulk send failure 2018-03-18T23:16:27.629Z INFO elasticsearch/client.go:690 Connected to Elasticsearch version 6.1.2 2018-03-18T23:16:27.635Z INFO template/load.go:73 Template already exists and will not be overwritten.
Correspondingly, in Elasticsearch we can see several errors such as the following accumulating:
[2018-03-18T23:16:25,610][DEBUG][o.e.a.b.TransportBulkAction] [8vLF54_] failed to execute pipeline [logpipeline] for document [filebeat-6.2.2-2018.03.18/doc/null] org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [ at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18] at org.elasticsearch.ingest.CompoundProcessor.newCompoundProcessorException(CompoundProcessor.java:156) ~[elasticsearch-6.1.2.jar:6.1.2] at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:107) ~[elasticsearch-6.1.2.jar:6.1.2] at org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:58) ~[elasticsearch-6.1.2.jar:6.1.2] at org.elasticsearch.ingest.PipelineExecutionService.innerExecute(PipelineExecutionService.java:169) ~[elasticsearch-6.1.2.jar:6.1.2] at org.elasticsearch.ingest.PipelineExecutionService.access$000(PipelineExecutionService.java:42) ~[elasticsearch-6.1.2.jar:6.1.2] at org.elasticsearch.ingest.PipelineExecutionService$2.doRun(PipelineExecutionService.java:94) [elasticsearch-6.1.2.jar:6.1.2] at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.2.jar:6.1.2] at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.2.jar:6.1.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121] Caused by: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [ at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18] ... 11 more Caused by: java.lang.IllegalArgumentException: Provided Grok expressions do not match field value: [ at ConsoleApp1.Program.Main(String[] args) in C:\ConsoleApp1\Program.cs:line 18] at org.elasticsearch.ingest.common.GrokProcessor.execute(GrokProcessor.java:67) ~[?:?] at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:100) ~[elasticsearch-6.1.2.jar:6.1.2] ... 9 more
Elasticsearch is making a fuss because it can’t parse the lines from the exception. This is a problem because if Elasticsearch can’t parse the logs, Filebeat will keep trying to send them and never make progress. We’ll have to deal with that exception stack trace now.
Multiline log entries
In order to log the exception correctly, we have to enable multiline processing in Filebeat. In filebeat.yml, there are some multiline settings that are commented out. We need to enable them and change them a little, such that any line not starting with a date is appended to the previous line:
### Multiline options # Mutiline can be used for log messages spanning multiple lines. This is common # for Java Stack Traces or C-Line Continuation # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [ multiline.pattern: '^\d{4}-\d{2}-\d{2}\s\d{2}\:\d{2}\:\d{2}\.\d{4}' # Defines if the pattern set under pattern should be negated or not. Default is false. multiline.negate: true # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern # that was (not) matched before or after or as long as a pattern is not matched based on negate. # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash multiline.match: after
Configuring the Filebeat to support multiline log entries is not enough though. We also need to update the pipeline in Elasticsearch to apply the grok filter on multiple lines ((?m)
) and to separate the exception into a field of its own. I’ve had to split the two cases (with and without exception) into separate patterns in order to make it work.
PUT /_ingest/pipeline/logpipeline { "description" : "Pipeline for logs from filebeat", "processors": [ { "grok": { "field": "message", "patterns": ["(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: (?<message>.*?)\n(%{GREEDYDATA:exception})?", "(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"] } } ] }
After deleting the index and the Filebeat data folder, and re-running Filebeat, we now get a perfect multiline exception stack trace in its own field!
Fixing the Timestamp
We now have one last issue to fix: the logs being ordered by when they were inserted into the index, rather than the log timestamp. This is actually a pretty serious problem from a usability perspective, because it means people troubleshooting production issues won’t be able to use Kibana’s time filter (e.g. last 15 minutes) to home in on the most relevant logs.
In order to fix this, we need to augment our pipeline with a date processor:
PUT /_ingest/pipeline/logpipeline { "description" : "Pipeline for logs from filebeat", "processors": [ { "grok": { "field": "message", "patterns": ["(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: (?<message>.*?)\n(%{GREEDYDATA:exception})?", "(?m)%{TIMESTAMP_ISO8601:timestamp} - %{WORD:logLevel}: %{GREEDYDATA:message}"] }, "date" : { "field" : "timestamp", "target_field" : "@timestamp", "formats" : ["yyyy-MM-dd HH:mm:ss.SSSS"] } } ] }
The names of the fields in the date section are important. We’re basically telling it to take whatever is in the timestamp field (based on one of the earlier patterns) and apply it to @timestamp. As it happens, @timestamp is what is being used as the time-series field, which gives us exactly the result we want after reshipping the logs (be sure to extend the time window in Kibana accordingly to see the logs):
Summary
In this article, we’ve explored log shipping to augment regular file logging with purpose-built tools, rather than reinventing the wheel and writing yet another logging library. The latter approach would not only be a tremendous waste of time, but there are reliability, performance and maintainability implications to consider.
We have specifically looked at using Filebeat to ship logs directly into Elasticsearch, which is a good approach when Logstash is either not necessary or not possible to have. In order to get our log data nicely structured so that we can analyse it in Kibana, we’ve had to set up an ingest pipeline in Elasticsearch.
We progressively refined both our Filebeat configuration and this pipeline in order to split up our logs into separate fields, process multiline exception stack traces, and use the original timestamp in the logs as the time series field.
There is a lot more that Filebeats can do. For instance, a Filebeat may be configured with multiple prospectors, meaning it can read log files from different places and apply different options accordingly. One useful example of this is to add a custom field indicating the origin of the logs – this is useful when the log data itself does not include the application name, for instance.