How to set up Fluentd to retrieve logs, send them to GCP Pub/Sub, and finally push them to Elasticsearch

  • November 8, 2020

Fluentd

First of all, let’s talk about Fluentd, an open-source data collector for a unified logging layer. It allows you to unify data collection and consumption for better use and understanding of data.

It has an ecosystem of more than 500 plugins that lets the community extend its functionality. It is widely used and reliable, and requires minimal resources to run. All events flow through its JSON-based unified logging layer.
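
As an illustration, each event in that layer is a tag, a timestamp, and a JSON record; a made-up Nginx access event could look like this:

tag:    nginx.access
time:   2020-11-08 10:15:30 +0000
record: {"remote":"203.0.113.7","method":"GET","path":"/","code":"200"}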

Pub/Sub

Pub/Sub is an asynchronous messaging service that decouples services that produce events from services that process events.
It can be used as messaging-oriented middleware, or for event ingestion and delivery in streaming analytics pipelines. Pub/Sub works with topics and subscriptions: each message published to a topic is delivered to all of the topic’s subscriptions.
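
As a quick illustration of this model, a topic and a subscription can be exercised directly from the gcloud CLI (the names below are placeholders):

gcloud pubsub topics publish [topic-name] --message '{"hello":"pubsub"}'
gcloud pubsub subscriptions pull [subscription-name] --auto-ack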

Managing logs in cloud environments

By making use of Fluentd and Pub/Sub in GCP, logs can be collected and sent to different stacks such as ELK. This activity has become a critical part of infrastructure administration.

Prerequisites

1: Google Project

In the Cloud Console, on the project selector page, select or create a Cloud project.
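
If you prefer the command line, a gcloud equivalent would be ([project-id] is a placeholder for your own project ID):

gcloud projects create [project-id]
gcloud config set project [project-id]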

2: Pub/Sub

Part of this deployment is the Pub/Sub configuration. In the main search bar, type ‘Pub/Sub’.

3: Create a topic

Select the option ‘Create a topic’ to create the topic where your messages will be published; in this case, the logs from Fluentd.
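
The same topic can also be created from the command line ([topic-name] is a placeholder):

gcloud pubsub topics create [topic-name]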

4: Create a subscription under this topic

A subscription is what external systems use to receive the messages; the consumer could be ELK, Datadog, or any other monitoring tool.

4.1 Type your subscription name and select the topic that was created in the prior step.
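
The gcloud equivalent, using the same placeholder names:

gcloud pubsub subscriptions create [subscription-name] --topic [topic-name]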

5: Service account to publish to Pub/Sub

Two service account keys are needed: one to publish to Pub/Sub and one to subscribe to it.

5.1 In the main menu, select ‘IAM & Admin’, then ‘Service Accounts’.

5.2 Click ‘Create Service Account’.

5.3 Fill in the service account name and description, then click ‘CREATE’.

5.4 Select the role ‘Pub/Sub Publisher’.

5.5 Select the newly created service account and click ‘Create Key’.

5.6 A JSON key file will be downloaded automatically; rename it publisher.json. We will use this file in the Fluentd forwarder’s configuration.
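
Steps 5.1 to 5.6 can also be done with gcloud; the sketch below assumes a service account named pubsub-publisher (any name works):

# Create the service account, grant it the publisher role, and download its key
gcloud iam service-accounts create pubsub-publisher
gcloud projects add-iam-policy-binding [project-id] \
    --member serviceAccount:pubsub-publisher@[project-id].iam.gserviceaccount.com \
    --role roles/pubsub.publisher
gcloud iam service-accounts keys create publisher.json \
    --iam-account pubsub-publisher@[project-id].iam.gserviceaccount.com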


6: Service Account for Subscribers 

Repeat the previous steps to create one more service account for the Pub/Sub subscription, this time with the role ‘Pub/Sub Subscriber’, and save the key file as subscriber.json.
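
A possible gcloud equivalent, this time assuming a service account named pubsub-subscriber:

gcloud iam service-accounts create pubsub-subscriber
gcloud projects add-iam-policy-binding [project-id] \
    --member serviceAccount:pubsub-subscriber@[project-id].iam.gserviceaccount.com \
    --role roles/pubsub.subscriber
gcloud iam service-accounts keys create subscriber.json \
    --iam-account pubsub-subscriber@[project-id].iam.gserviceaccount.com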


7: Fluentd Setup

For this example, we assume that Nginx is running on a Google Cloud virtual machine with Ubuntu Linux.

Run the following commands:

# install td-agent 4
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent4.sh | sh

# install the Google Cloud Pub/Sub plugin to push the logs to Pub/Sub
sudo /usr/sbin/td-agent-gem install fluent-plugin-gcloud-pubsub-custom

# Prepare the development libraries

sudo apt-get install libgdbm-dev libncurses5-dev automake libtool bison libffi-dev
gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB
curl -sSL https://get.rvm.io | bash -s stable
source ~/.rvm/scripts/rvm
rvm install 2.7.1
rvm use 2.7.1 --default
ruby -v

# Replace the existing Fluentd config file with a new one
cd /etc/td-agent/
mv td-agent.conf td-agent.conf.old
touch td-agent.conf

8: Fluentd Example Configuration

The following is the Fluentd configuration file: the source block tails the Nginx access log, and the match block publishes the parsed records to Pub/Sub.

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.pos
  tag example.publish
  format nginx
</source>

<match example.publish>
  @type gcloud_pubsub
  project [project-id]
  key /home/ubuntu/publisher.json
  topic [topic-name]
  autocreate_topic false
  max_messages 1000
  max_total_size 10000000
  flush_interval 1s
  try_flush_interval 0.1
  format json
</match>

Fill in the following information:

[project-id] Project ID created in step 1
[topic-name] Topic created in step 3
And remember to use the publisher.json key created in step 5
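
Before starting the service, you can optionally ask Fluentd to parse the configuration without running it (a quick sanity check; td-agent wraps the fluentd command):

# Optional: validate the configuration file
sudo td-agent --dry-run -c /etc/td-agent/td-agent.conf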

# Start td-agent using the command below
service td-agent start

# Verify the status of the service

systemctl status td-agent.service

Receiving logs in Pub/Sub

Whenever a request is made to the web server, the new entries in /var/log/nginx/access.log are pushed to Pub/Sub.
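
To check the pipeline end to end, you can generate a request locally and pull a few messages with the gcloud CLI (omitting --auto-ack leaves the messages available for the aggregator):

# Generate an access-log entry and peek at the subscription
curl -s http://localhost/ > /dev/null
gcloud pubsub subscriptions pull [subscription-name] --limit 5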

9: Fluentd Aggregator

The Fluentd aggregator will pull the logs from Pub/Sub and push them to Elasticsearch.
Logs could also be pushed to Google Cloud Storage, although that part of the deployment is not covered in this example.

# install td-agent 4
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent4.sh | sh

# install the Google Cloud Pub/Sub plugin to pull the logs from Pub/Sub
sudo /usr/sbin/td-agent-gem install fluent-plugin-gcloud-pubsub-custom

# Install the Elasticsearch plugin to push the logs to Elasticsearch
sudo /usr/sbin/td-agent-gem install fluent-plugin-elasticsearch

# Create a new fluentd config file
cd /etc/td-agent/
mv td-agent.conf td-agent.conf.old
touch td-agent.conf

td-agent.conf content for the aggregator:

<source>
  @type gcloud_pubsub
  tag example.pull
  project [project-id]
  topic [topic-name]
  subscription [subscription-name]
  key /home/ubuntu/subscriber.json
  max_messages 1000
  return_immediately true
  pull_interval 2
  format json
</source>

<match example.pull>
  @type elasticsearch
  include_tag_key true
  host [elastic-search lb ip]
  port 9200
  logstash_format true
  <buffer>
    chunk_limit_size 2M
    flush_thread_count 8
    flush_interval 5s
    retry_max_interval 30
    queue_limit_length 32
    retry_forever false
  </buffer>
</match>

Replace the following values:

[project-id] Project ID created in step 1
[topic-name] Topic created in step 3
[subscription-name] Subscription created in step 4
[elastic-search lb ip] with your load balancer IP

And remember to use the subscriber.json key created in step 6

# Start the agent
service td-agent start

After this step is completed, you should start getting logs in your Elasticsearch installation.
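
With logstash_format enabled, the Elasticsearch plugin writes documents into daily logstash-YYYY.MM.DD indices, so a quick way to confirm that logs are arriving is:

# List the indices on the Elasticsearch cluster
curl http://[elastic-search lb ip]:9200/_cat/indices?v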

Credits:
Written by : Diego Woitasen
English language corrections: Jesica Greco