I learned a great lesson last week. If you reboot Linux machines with Fluentd installed on them, you can accidentally resend some of your logs to your matches. Allow me to elaborate.
For those who don’t know, Fluentd is an open source, cross-platform data collection service. It’s easy to use and configure, with a robust plugin system that supports many services. Check it out here! I use Fluentd to help with log aggregation on AWS EC2 instances for both system logs and Docker container logs.
Fluentd can be installed in a number of different ways. I installed the Ruby gem and ran it as a systemd service that launches on boot on an Ubuntu 18.04 AWS EC2 instance. The service is configured with a fluent.conf file that defines a number of different sources with filters and a single output match.
One of my sources uses the systemd plugin, similar to the following configuration from the Fluentd plugin documentation for systemd.
<source>
  @type systemd
  path /var/log/systemdService
  read_from_head true
  <storage>
    @type local
    path /tmp/systemdService.pos
  </storage>
</source>
There are several interesting things about the defaults from the Fluentd plugin architecture. The source defined above uses the systemd plugin to pull logs from the path /var/log/systemdService and, because read_from_head is true, to start from the beginning of the available logs if the service is restarted without a position file at /tmp/systemdService.pos. The autosave behavior of Fluentd's storage keeps the /tmp/systemdService.pos file up to date as the service continues to read available logs and output them to the defined matches.
This is where my confusion began: a match defined in the config started receiving duplicate systemd messages. It turned out that the instance running the Fluentd process was restarting at regular intervals. Normally this shouldn’t be a problem. Fluentd rereads the /tmp/systemdService.pos file, which stores the position in the stream, and goes along its merry way. Only when that file is missing does it start reading from the beginning of the available stream (as configured for the systemd plugin).
However, that was the problem! Systemd provides a helper service called systemd-tmpfiles, which by default on Ubuntu runs on boot with the --clean option. This causes the temporary filesystem to be wiped on boot. Normally this is a great default. You wouldn’t want extraneous files taking up all the space on a volume, let alone potentially your root volume. In my case, though, the .pos file was defined in the /tmp directory, which was cleaned by the systemd-tmpfiles service. So every restart of the instance removed the file, and Fluentd resent logs to our defined match.
The fix was to move my .pos file out of the /tmp directory so that the file persists across restarts. The .pos file is minuscule, since it holds only the position in the journal stream being read, so I did not have to worry about disk usage.
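For illustration, the adjusted source might look something like the sketch below. The directory /var/lib/fluentd is an assumption chosen for this example, not the location I am claiming anyone must use; any directory that survives a reboot and is writable by the user running Fluentd should work, and it has to exist before the service starts.

```
<source>
  @type systemd
  path /var/log/systemdService
  read_from_head true
  <storage>
    @type local
    # Hypothetical location outside /tmp, so the stored position
    # is not removed when temporary files are cleaned on boot.
    path /var/lib/fluentd/systemdService.pos
  </storage>
</source>
```

With the position file in a persistent location, a reboot should resume from the saved position, and read_from_head only takes effect when no position has been stored yet.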