Elastic Stack: an overview and ELK installation on Ubuntu 20.04

02/22/2022
 

The last time I worked with the ELK stack was about 7 years ago; see the post "ELK: Elasticsearch+Logstash+Kibana installation on CentOS".

Currently, we are using Logz.io, but its cost keeps growing, so we started looking at a self-hosted ELK solution to run on our AWS Elastic Kubernetes Service clusters.

So, the task for now is to spin up the Elastic Stack, check how it can be installed on Ubuntu 20.04, configure log collection with Filebeat, transformation with Logstash, storage in the Elasticsearch database, and visualization with Kibana, and see how all those components work together under the hood.

The main goal is to check their configuration and to see how they all work together. This will not be a production-like setup, so we will skip some Kibana settings like user authentication; instead, we'll take a look at grok, Elasticsearch indices, and so on.

Still, as usual, I’ll add some links at the end of the post.

And remember: "10 hours of debugging and trying to make things work will save you 10 minutes of reading the documentation".

Elastic Stack: components overview

Elastic Stack, previously known as ELK (Elasticsearch + Logstash + Kibana), is one of the most well-known and widely used systems for logs collection and aggregation. It can also be used to display metrics from services – clouds, servers, etc.

Elastic Stack consists of three main components:

  • Elasticsearch: a database with quick search capabilities using the Elasticsearch Index
  • Logstash: a tool to collect data from various sources, transform it, and pass it to Elasticsearch
  • Kibana: a web interface to display data from Elasticsearch

Also, there is a set of additional tools for ELK called Beats, used to collect data. Among them, it's worth mentioning, for example, Filebeat, which collects logs, and Metricbeat, which collects information about CPU, memory, disks, etc. See also Logz.io: collecting logs from Kubernetes – fluentd vs filebeat.

So, the workflow of the stack is the following:

  1. a server generates data, for example, logs
  2. the data is then collected by a local Beat application; for logs, this will be Filebeat (although this is not mandatory, and logs can be collected by Logstash itself), which then sends the data to Logstash or directly to an Elasticsearch database
  3. Logstash collects data from various sources (from Beats, or by collecting data directly), applies the necessary transformations like adding/removing fields, and then passes the data to an Elasticsearch database
  4. Elasticsearch is used to store the data and for quick search
  5. Kibana is used to display data from the Elasticsearch database in a web interface

Create an AWS EC2

So, let’s go ahead with the installation process.

We will use Ubuntu 20.04 running on an AWS EC2 instance.

We will use a "clean" system, without any Docker or Kubernetes integrations; everything will be done directly on the host.

The scheme used will be usual for the stack: Elasticsearch to store data, Filebeat to collect data from logs, Logstash for processing and pushing data to an Elastic index, and Kibana for visualization.

Go to the AWS Console > EC2 > Instances, create a new one, and choose the Ubuntu OS:

Let's choose the c5.2xlarge instance type (8 vCPU, 16 GB RAM), as Elasticsearch runs on Java, which loves memory and CPU, and Logstash is written in JRuby:

Network settings can be left at their defaults; again, this is a kind of Proof of Concept, so no need to dive deep in there:

Later, we will add an Elastic IP.

Increase the disk size for the instance up to 50 GB:

In the Security Group, open SSH and port 5601 (Kibana) from your IP:
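If you prefer the AWS CLI over the console, the same rules can be added with something like the following (the Security Group ID and the IP here are placeholders, substitute your own):

[simterm]

$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 22 --cidr 203.0.113.10/32
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 5601 --cidr 203.0.113.10/32

[/simterm]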

In a more production-like setup, we would have to have some NGINX or an Ingress resource with SSL in front of our Kibana. For now, run it "as is".

Create a new RSA key pair (hint: it's a good idea to include the key's AWS Region in its name), and save it:

Go to the Elastic IP addresses, obtain a new EIP:

Attach it to the EC2 instance:

On your workstation, change the key’s permissions to make it readable for your user only:

[simterm]

$ chmod 600 ~/Temp/elk-test-eu-west-2.pem

[/simterm]

Check connection:

[simterm]

$ ssh -i ~/Temp/elk-test-eu-west-2.pem [email protected]
...
ubuntu@ip-172-31-43-4:~$

[/simterm]

Upgrade the system:

[simterm]

ubuntu@ip-172-31-43-4:~$ sudo -s
root@ip-172-31-43-4:/home/ubuntu# apt update && apt -y upgrade

[/simterm]

Reboot it to load a new kernel after upgrade:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# reboot

[/simterm]

And now, let's proceed to the installation of the ELK components.

Elastic Stack/ELK installation on Ubuntu 20.04

Add an Elasticsearch repository:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
OK
root@ip-172-31-43-4:/home/ubuntu# apt -y install apt-transport-https
root@ip-172-31-43-4:/home/ubuntu# sh -c 'echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" > /etc/apt/sources.list.d/elastic-7.x.list'

[/simterm]

Elasticsearch installation

Install the elasticsearch package:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# apt update && apt -y install elasticsearch

[/simterm]

Elastic’s configuration file – /etc/elasticsearch/elasticsearch.yml.

Add to it a new parameter – discovery.type: single-node, as our Elasticsearch will be working as a single node, not as a cluster.
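For reference, the only change in /etc/elasticsearch/elasticsearch.yml is this single line (everything else stays at its defaults):

# run as a standalone node, do not try to discover other cluster members
discovery.type: single-node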

If you need to update JVM options, use the /etc/elasticsearch/jvm.options file.

Users and authentication are described in Set up minimal security for Elasticsearch, but for now we will skip this step; for testing, it's enough that we've restricted access in the EC2 instance's AWS Security Group.

Start the service and add it to autostart:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# systemctl start elasticsearch
root@ip-172-31-43-4:/home/ubuntu# systemctl enable elasticsearch

[/simterm]

Check access to the Elasticsearch API:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl -X GET "localhost:9200"
{
  "name" : "ip-172-31-43-4",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "8kVCdVRySfKutRjPkkVr5w",
  "version" : {
    "number" : "7.16.3",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "4e6e4eab2297e949ec994e688dad46290d018022",
    "build_date" : "2022-01-06T23:43:02.825887787Z",
    "build_snapshot" : false,
    "lucene_version" : "8.10.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

[/simterm]

Logs are available in the /var/log/elasticsearch directory, and data is stored in /var/lib/elasticsearch.

Elasticsearch Index

Let's take a short overview of indices in Elasticsearch, and how to access them via the API.

In fact, you can think about them as databases in RDBMS systems like MySQL. The database stores documents, and these documents are JSON objects of a specific type.

Indices are divided into shards – segments of the data that are stored on one or more Elasticsearch nodes, but sharding and clustering are out of the scope of this post.

View an Index

To see all indices, use the GET _cat/indices?v request:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl localhost:9200/_cat/indices?v
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases 2E8sIYX0RaiqyZWzPHYHfQ   1   0         42            0     40.4mb         40.4mb

[/simterm]

For now, we can see only Elastic's own index called .geoip_databases, which contains a list of IP blocks and related regions. Later, it can be used to add visitor information to NGINX access log data.

Create an Index

Add a new empty index:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl -X PUT localhost:9200/example_index?pretty
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "example_index"
}

[/simterm]

Check it:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl localhost:9200/_cat/indices?v
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases 2E8sIYX0RaiqyZWzPHYHfQ   1   0         42            0     40.4mb         40.4mb
yellow open   example_index    akWscE7MQKy_fceS9ZMGGA   1   1          0            0       226b           226b

[/simterm]

example_index – here is our new index.

And check the index itself:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl localhost:9200/example_index?pretty
{
  "example_index" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "example_index",
        "creation_date" : "1642848658111",
        "number_of_replicas" : "1",
        "uuid" : "akWscE7MQKy_fceS9ZMGGA",
        "version" : {
          "created" : "7160399"
        }
      }
    }
  }
}

[/simterm]

Create a document in an index

Let's add a simple document to the index created above:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl -H 'Content-Type: application/json' -X POST localhost:9200/example_index/document1?pretty -d '{ "name": "Just an example doc" }'
{
  "_index" : "example_index",
  "_type" : "document1",
  "_id" : "rhF0gX4Bbs_W8ADHlfFY",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

[/simterm]

And check the whole content of the index using the _search operation:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl localhost:9200/example_index/_search?pretty
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "example_index",
        "_type" : "document1",
        "_id" : "qxFzgX4Bbs_W8ADHTfGi",
        "_score" : 1.0,
        "_source" : {
          "name" : "Just an example doc"
        }
      },
      {
        "_index" : "example_index",
        "_type" : "document1",
        "_id" : "rhF0gX4Bbs_W8ADHlfFY",
        "_score" : 1.0,
        "_source" : {
          "name" : "Just an example doc"
        }
      }
    ]
  }
}

[/simterm]

Using the document's ID, get its content:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl -X GET 'localhost:9200/example_index/document1/qxFzgX4Bbs_W8ADHTfGi?pretty'
{
  "_index" : "example_index",
  "_type" : "document1",
  "_id" : "qxFzgX4Bbs_W8ADHTfGi",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Just an example doc"
  }
}

[/simterm]

Searching an index

Also, we can do a quick search over the index.

Let's search by the name field for part of the document's content, the word "doc":

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl -H 'Content-Type: application/json' -X GET 'localhost:9200/example_index/_search?pretty' -d '{ "query": { "match": { "name": "doc" } } }'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "example_index",
        "_type" : "document1",
        "_id" : "qxFzgX4Bbs_W8ADHTfGi",
        "_score" : 0.18232156,
        "_source" : {
          "name" : "Just an example doc"
        }
      },
      {
        "_index" : "example_index",
        "_type" : "document1",
        "_id" : "rhF0gX4Bbs_W8ADHlfFY",
        "_score" : 0.18232156,
        "_source" : {
          "name" : "Just an example doc"
        }
      }
    ]
  }
}

[/simterm]

Delete an index

Use DELETE and specify the name of the index to delete:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl -X DELETE localhost:9200/example_index
{"acknowledged":true}

[/simterm]

Okay, now we've seen what indices are and how to work with them.

Let's go ahead and install Logstash.

Logstash installation

Install Logstash; it's already present in the Elastic repository that we've added before:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# apt -y install logstash

[/simterm]

Run the service:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# systemctl start logstash
root@ip-172-31-43-4:/home/ubuntu# systemctl enable logstash
Created symlink /etc/systemd/system/multi-user.target.wants/logstash.service → /etc/systemd/system/logstash.service.

[/simterm]

The main configuration file is /etc/logstash/logstash.yml, and for our own configuration files, we will use the /etc/logstash/conf.d/ directory.

Logstash will write its own output (stdout) to /var/log/syslog.
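So, to follow what Logstash itself is doing, you can simply watch the syslog, for example:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# tail -f /var/log/syslog | grep logstash

[/simterm]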

Working with Logstash pipelines

See the How Logstash Works.

A pipeline in Logstash describes a chain: Input > Filter > Output.

In the Input, we can use, for example, inputs such as file, stdin, or beats.
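In general, any pipeline configuration file is just these three blocks (a sketch, not a working config):

input  { }    # where events come from: file, stdin, beats, etc.
filter { }    # optional transformations: grok, date, geoip, mutate, etc.
output { }    # where events go: elasticsearch, stdout, etc.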

Logstash Input and Output

To see how Logstash works in general, let's create the simplest pipeline, which will accept data via stdin and print it to the terminal via stdout.

The easiest way to test Logstash is to run its binary directly and pass the configuration via the -e option:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
...
The stdin plugin is now waiting for input:
Hello, World!
{
       "message" => "Hello, World!",
      "@version" => "1",
    "@timestamp" => 2022-01-22T11:30:33.971Z,
          "host" => "ip-172-31-43-4"
}

[/simterm]

Logstash Filter: grok

And a very basic grok example.

Create a file called logstash-test.conf:

input { stdin { } }

filter {
    grok {
      match => { "message" => "%{GREEDYDATA}" }
    }
}

output {
  stdout { }
}

Here, in the filter, we are using grok, which will search for a match in the message text.

For such a search, grok uses regular expression patterns. In the example above, we are using the GREEDYDATA pattern, which corresponds to the .* regex, i.e. any characters.

Run Logstash again, but this time, instead of -e, use -f and pass the file's name:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# /usr/share/logstash/bin/logstash -f logstash-test.conf
...
The stdin plugin is now waiting for input:
Hello, Grok!
{
       "message" => "Hello, Grok!",
    "@timestamp" => 2022-01-22T11:33:49.797Z,
      "@version" => "1",
          "host" => "ip-172-31-43-4"
}

[/simterm]

Okay.

Let's try to do some data transformation: for example, let's add a new tag called "Example" and two new fields – one will contain just the text "Example value", and in the second, we will add the time when the message was received:

input { stdin { } }
filter {
    grok {
      match => { "message" => "%{GREEDYDATA:my_message}" }
      add_tag => ["Example"]
      add_field => [ "example_field", "Example value" ]
      add_field => [ "received_at", "%{@timestamp}" ]
    }
}
output {
  stdout { }
}

Run it:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# /usr/share/logstash/bin/logstash -f logstash-test.conf 
...
Hello again, Grok!
{
          "message" => "Hello again, Grok!",
             "host" => "ip-172-31-43-4",
             "tags" => [
        [0] "Example"
    ],
      "received_at" => "2022-01-22T11:36:46.893Z",
       "my_message" => "Hello again, Grok!",
       "@timestamp" => 2022-01-22T11:36:46.893Z,
    "example_field" => "Example value",
         "@version" => "1"
}

[/simterm]

Logstash Input: file

Okay, now let's try something different: let's read data from the /var/log/syslog log file.

First, check the file's content:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# tail -1 /var/log/syslog
Jan 22 11:41:49 ip-172-31-43-4 logstash[8099]: [2022-01-22T11:41:49,476][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9601, :ssl_enabled=>false}

[/simterm]

What do we have here?

  1. date and time – Jan 22 11:41:49
  2. a host – ip-172-31-43-4
  3. a program’s name – logstash
  4. a process PID – 8099
  5. and the message itself

In our filter, let's use grok again, and in its match specify patterns and fields. Instead of GREEDYDATA, which would save all the data in the message field, let's start with SYSLOGTIMESTAMP, which will be triggered on a value like Jan 22 11:41:49 and save it to the syslog_timestamp field; then SYSLOGHOST, DATA, and POSINT; and the rest of the data we will capture with the already familiar GREEDYDATA and save it to the syslog_message field.

Also, let's add two additional fields, received_at and received_from, using the data parsed in the match, and then drop the original message field, as we've already saved the necessary data in syslog_message:

input { 
  file {
    path => "/var/log/syslog"
  }
}

filter {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
      remove_field => "message"
    }
}

output {
  stdout { }
}

Run it:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# /usr/share/logstash/bin/logstash -f logstash-test.conf
...
{
                "host" => "ip-172-31-43-4",
                "path" => "/var/log/syslog",
         "received_at" => "2022-01-22T11:48:27.582Z",
      "syslog_message" => "#011at usr.share.logstash.lib.bootstrap.environment.<main>(/usr/share/logstash/lib/bootstrap/environment.rb:94) ~[?:?]",
    "syslog_timestamp" => "Jan 22 11:48:27",
      "syslog_program" => "logstash",
     "syslog_hostname" => "ip-172-31-43-4",
          "@timestamp" => 2022-01-22T11:48:27.582Z,
          "syslog_pid" => "9655",
            "@version" => "1",
       "received_from" => "ip-172-31-43-4"
}
...

[/simterm]

Well, nice!

Logstash output: elasticsearch

In the examples above, we've printed everything to the terminal.

Now, let's try to save the data to an Elasticsearch index:

input {
  file {
    path => "/var/log/syslog"
  }
}

filter {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
      remove_field => "message"
    }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout { }
}

Run it, and then check Elastic's indices – Logstash should have created a new one this time.
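The run itself is the same as in the previous examples (assuming the pipeline is still saved as logstash-test.conf):

[simterm]

root@ip-172-31-43-4:/home/ubuntu# /usr/share/logstash/bin/logstash -f logstash-test.conf

[/simterm]

And the indices: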

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl localhost:9200/_cat/indices?v
health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases           2E8sIYX0RaiqyZWzPHYHfQ   1   0         42            0     40.4mb         40.4mb
yellow open   logstash-2022.01.22-000001 ekf_ntRxRiitIRcmYI2TOg   1   1          0            0       226b           226b
yellow open   example_index              akWscE7MQKy_fceS9ZMGGA   1   1          2            1      8.1kb          8.1kb

[/simterm]

logstash-2022.01.22-000001 – aha, here we are!

Let's do some search over it, for example, for the logstash process, as it saves its output to the /var/log/syslog file:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl -H 'Content-Type: application/json' localhost:9200/logstash-2022.01.22-000001/_search?pretty -d '{ "query": { "match": { "syslog_program": "logstash" } } }'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 36,
      "relation" : "eq"
    },
    "max_score" : 0.33451337,
    "hits" : [
      {
        "_index" : "logstash-2022.01.22-000001",
        "_type" : "_doc",
        "_id" : "9BGogX4Bbs_W8ADHCvJl",
        "_score" : 0.33451337,
        "_source" : {
          "syslog_program" : "logstash",
          "received_from" : "ip-172-31-43-4",
          "syslog_timestamp" : "Jan 22 11:57:18",
          "syslog_hostname" : "ip-172-31-43-4",
          "syslog_message" : "[2022-01-22T11:57:18,474][INFO ][logstash.runner          ] Starting Logstash {\"logstash.version\"=>\"7.16.3\", \"jruby.version\"=>\"jruby 9.2.20.1 (2.5.8) 2021-11-30 2a2962fbd1 OpenJDK 64-Bit Server VM 11.0.13+8 on 11.0.13+8 +indy +jit [linux-x86_64]\"}",
          "host" : "ip-172-31-43-4",
          "@timestamp" : "2022-01-22T11:59:40.444Z",
          "path" : "/var/log/syslog",
          "@version" : "1",
          "syslog_pid" : "11873",
          "received_at" : "2022-01-22T11:59:40.444Z"
        }
      },
...

[/simterm]

Yay! It works!

Let’s go ahead.

Filebeat installation

Install the package:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# apt -y install filebeat

[/simterm]

The configuration file is /etc/filebeat/filebeat.yml.

By default, Filebeat will pass the data directly to an Elasticsearch instance:

...
# ================================== Outputs ===================================
# Configure what output to use when sending the data collected by the beat.
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"
...

Update the file: add parsing of the /var/log/syslog file, and instead of Elasticsearch, let's send the data to Logstash.

Configure the filestream input, and do not forget to enable it with enabled: true:

...
filebeat.inputs:
...
- type: filestream
  ...
  enabled: true
  ...
  paths:
    - /var/log/syslog
...

Comment out the output.elasticsearch block, and uncomment output.logstash:

...
# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #  hosts: ["localhost:9200"]
  ...
# ------------------------------ Logstash Output -------------------------------
output.logstash:
  ...
  hosts: ["localhost:5044"]
...

For Logstash, create a new config file /etc/logstash/conf.d/beats.conf:

input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

In the elasticsearch output, we specify a host and an index name template to be used to save the data; for Filebeat data, it will expand to something like filebeat-7.16.3-2022.01.22.
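Optionally, before starting the service you can ask Logstash to validate the pipeline syntax and exit (this still takes a while, as it spins up a JVM):

[simterm]

root@ip-172-31-43-4:/home/ubuntu# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/beats.conf --config.test_and_exit

[/simterm]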

Run Logstash:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# systemctl start logstash

[/simterm]

Check the /var/log/syslog:

[simterm]

Jan 22 12:10:34 ip-172-31-43-4 logstash[12406]: [2022-01-22T12:10:34,054][INFO ][org.logstash.beats.Server][main][e3ccc6e9edc43cf62f935b6b4b9cf44b76d887bb01e30240cbc15ab5103fe4b6] Starting server on port: 5044

[/simterm]

Run Filebeat:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# systemctl start filebeat

[/simterm]

Check Elastic’s indices:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl localhost:9200/_cat/indices?v
health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases           2E8sIYX0RaiqyZWzPHYHfQ   1   0         42            0     40.4mb         40.4mb
yellow open   filebeat-7.16.3-2022.01.22 fTUTzKmKTXisHUlfNbobPw   1   1       7084            0     14.3mb         14.3mb
yellow open   logstash-2022.01.22-000001 ekf_ntRxRiitIRcmYI2TOg   1   1         50            0     62.8kb         62.8kb
yellow open   example_index              akWscE7MQKy_fceS9ZMGGA   1   1          2            1      8.1kb          8.1kb

[/simterm]

filebeat-7.16.3-2022.01.22 – here it is.

Kibana installation

Install the package:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# apt -y install kibana

[/simterm]

Edit its config file /etc/kibana/kibana.yml and set server.host to 0.0.0.0 to make it accessible over the Internet.
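The relevant part of /etc/kibana/kibana.yml will then look something like this:

# listen on all interfaces instead of localhost only
server.host: "0.0.0.0"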

Run the service:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# systemctl start kibana
root@ip-172-31-43-4:/home/ubuntu# systemctl enable kibana

[/simterm]

Check with a browser:

Its status page is available at /status:

Click Explore on my own, then go to Management > Stack Management:

Go to Index patterns, create a new pattern for Kibana using the filebeat-* mask, and on the right side you will see that Kibana has already found the corresponding Elasticsearch indices:

And we can see all the fields, already indexed by Kibana:

Go to Observability > Logs:

And we can see our /var/log/syslog:

Logstash, Filebeat, and NGINX: configuration example

Now, let’s do something from the real world:

  1. install NGINX
  2. configure Filebeat to collect NGINX's logs
  3. configure Logstash to accept them and save the data to Elasticsearch
  4. and check the results in Kibana

Install NGINX:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# apt -y install nginx

[/simterm]

Check its log-files:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# ll /var/log/nginx/
access.log  error.log

[/simterm]

Check if the web-server is working:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl localhost:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

[/simterm]

And the access.log:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# tail -1 /var/log/nginx/access.log 
127.0.0.1 - - [26/Jan/2022:11:33:21 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"

[/simterm]

Okay.

Filebeat Inputs configuration

Documentation – Configure inputs, and Configure general settings.

Edit the filebeat.inputs block: to the existing /var/log/syslog input, add two new ones for NGINX's access and error logs:

...
# ============================== Filebeat inputs ===============================

filebeat.inputs:

- type: filestream
  enabled: true
  paths:
    - /var/log/syslog
  fields:
    type: syslog
  fields_under_root: true
  scan_frequency: 5s

- type: log
  enabled: true
  paths:
      - /var/log/nginx/access.log
  fields:
    type: nginx_access
  fields_under_root: true
  scan_frequency: 5s

- type: log
  enabled: true
  paths:
      - /var/log/nginx/error.log
  fields:
    type: nginx_error
  fields_under_root: true
  scan_frequency: 5s
...

Here, we are using the log type of input, and adding a new field, type: nginx_access/nginx_error.
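After editing, it's worth checking that Filebeat accepts the new configuration and can reach its output; Filebeat has built-in test subcommands for that:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# filebeat test config
root@ip-172-31-43-4:/home/ubuntu# filebeat test output

[/simterm]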

Logstash configuration

Delete the Logstash config file that we've created before, /etc/logstash/conf.d/beats.conf, and write a new one:

input {
  beats {
    port => 5044
  }
}

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
      remove_field => "message"
    }
  }
}

filter {
  if [type] == "nginx_access" {
    grok {
        match => { "message" => "%{IPORHOST:remote_ip} - %{DATA:user} \[%{HTTPDATE:access_time}\] \"%{WORD:http_method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\"" }
    }
    # parse the request time from the access_time field captured by grok above
    date {
        match => [ "access_time" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
    # enrich the event with GeoIP data resolved from the client's IP
    geoip {
        source => "remote_ip"
        target => "geoip"
        add_tag => [ "nginx-geoip" ]
    }
  }
}
  
output {

  if [type] == "syslog" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logstash-%{+YYYY.MM.dd}"
    }
  }
  
  if [type] == "nginx_access" {
    elasticsearch { 
      hosts => ["localhost:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
  }
 
  stdout { }
}

Here we've set:

  1. an input on port 5044 for Filebeat
  2. two filters:
    1. the first checks the type field, and if its value == syslog, Logstash parses the data as a syslog message
    2. the second checks the type field, and if its value == nginx_access, Logstash parses the data as an NGINX access log
  3. an output with two if conditions that, depending on the type, sends the data to the logstash-%{+YYYY.MM.dd} index (for syslog) or the nginx-%{+YYYY.MM.dd} index (for NGINX)

Restart Logstash and Filebeat:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# systemctl restart logstash
root@ip-172-31-43-4:/home/ubuntu# systemctl restart filebeat

[/simterm]

Run curl in a loop to generate some data in the NGINX access log:

[simterm]

ubuntu@ip-172-31-43-4:~$ watch -n 1 curl -I localhost

[/simterm]

Check Elasticsearch indices:

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
...
yellow open   logstash-2022.01.28             bYLp_kI3TwW3sPfh7XpcuA   1   1     213732            0      159mb          159mb
...
yellow open   nginx-2022.01.28                0CwH4hBhT2C1sMcPzCQ9Pg   1   1          1            0     32.4kb         32.4kb

[/simterm]

And here is our new NGINX index.
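You can also peek at a parsed document to make sure grok did its job (the index name is taken from the output above; adjust the date for your run):

[simterm]

root@ip-172-31-43-4:/home/ubuntu# curl 'localhost:9200/nginx-2022.01.28/_search?pretty&size=1'

[/simterm]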

Go to Kibana and add another Index pattern, logstash-*:

Go to Analytics > Discover, choose an index, and see your data:

In the same way – for the NGINX logs:

Done.

Useful links

Elastic Stack

Elasticsearch

Logstash

Filebeat