High InfluxDB disk usage after upgrade to 2.14.0

devopstt · December 12, 2023, 9:55am

Hi,

we are using Icinga with InfluxDB (1.8) feature in an environment with about 2000 Hosts and 30.000 service checks.
We upgraded Icinga2 from 2.13.4 to 2.14.0 two months ago.

It looks like we have much higher disk usage on our InfluxDB machines since upgrading to 2.14.0. The increase started at the exact moment the upgrade was installed.

In the patch notes I found the following change:

Influxdb(2)Writer: write more precise timestamps (nanoseconds). #9599

Did you also see much higher disk usage? How can I avoid it?
We have a retention policy of 730d in InfluxDB.

This is my influxdb feature configuration:

/**
 * The InfluxdbWriter type writes check result metrics and
 * performance data to an InfluxDB HTTP API
 */

object InfluxdbWriter "influxdb" {
  enable_ha = true
  host = "xxx.xxx.xxx.xxx"
  port = 9096
  database = "xxxxxxxxxxxxxxx"
  username = "xxxxxxxxxxxxxxx"
  password = "xxxxxxxxxxxxxxx"
  ssl_enable = true
  ssl_insecure_noverify = true
  enable_send_thresholds = true
  enable_send_metadata = true
  flush_threshold = 1024
  flush_interval = 10s
  host_template = {
    measurement = "$host.check_command$"
    tags = {
      hostname = "$host.name$"
      bereitschaft = "$host.vars.bereitschaft$"
      codename = "$host.vars.codename$"
      environment = "$host.vars.environment$"
      os = "$host.vars.os$"
      platform = "$host.vars.plattform$"
    }
  }
  service_template = {
    measurement = "$service.check_command$"
    tags = {
      hostname = "$host.name$"
      service = "$service.name$"
      bereitschaft = "$host.vars.bereitschaft$"
      codename = "$host.vars.codename$"
      environment = "$host.vars.environment$"
      os = "$host.vars.os$"
      platform = "$host.vars.plattform$"
    }
  }
}

Thank you!
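
For context on the changelog entry quoted above: a minimal sketch of why more precise timestamps could plausibly use more disk, assuming the storage engine delta-encodes timestamps and can run-length encode them when all deltas are equal (as the InfluxDB 1.x storage engine documentation describes). This is a simplified illustration, not InfluxDB's actual encoder, and the thread itself does not confirm the root cause. The helper names, the 60-second check interval and the jitter are made up.

```python
import random

NS_PER_S = 1_000_000_000
INTERVAL_S = 60  # hypothetical check interval, not taken from the thread


def deltas(timestamps):
    """Differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def run_lengths(values):
    """Collapse consecutive equal values into [value, count] runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


rng = random.Random(0)

# Simulated check result times in nanoseconds: one result every 60 s,
# each with up to 0.5 s of execution jitter.
exact_ns = [
    (1_700_000_000 + i * INTERVAL_S) * NS_PER_S + rng.randrange(500_000_000)
    for i in range(1000)
]

# The same points truncated to whole seconds (still expressed in ns).
whole_s = [t // NS_PER_S * NS_PER_S for t in exact_ns]

runs_s = run_lengths(deltas(whole_s))
runs_ns = run_lengths(deltas(exact_ns))

print("runs with whole-second timestamps:", len(runs_s))
print("runs with nanosecond timestamps:  ", len(runs_ns))
```

With whole-second timestamps every delta is identical, so the timestamp column collapses into a single run. With nanosecond jitter nearly every delta is unique, so a run-length scheme has nothing to collapse and each timestamp has to be stored more or less individually.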

mj84 · December 21, 2023, 11:25am

I recently upgraded my system to Icinga 2.14.0 as well and I’m not seeing any increase in the usage of my InfluxDB’s data partition (but we are using InfluxDB2).

I am a bit surprised about the host and service templates in your InfluxdbWriter though, since you put a lot of additional information in there.
Depending on the cardinality of your tag values, writing so many tags in the time series can definitely become a problem with storage overhead.

Since you started to see an increase in disk usage after an Icinga2 update, my guess would be that the amount of metadata being sent to InfluxDB has increased, but I can’t find anything in the changelogs regarding that.

Is there a specific reason for sending metadata into InfluxDB?
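
To make the cardinality point concrete: in InfluxDB a series is identified by the measurement plus its tag set, so what matters is the number of distinct tag combinations that actually occur, not the number of tag keys. A minimal sketch with made-up hosts and tag values (only the tag keys come from the config in the first post; the function names are hypothetical):

```python
from math import prod


def distinct_values(points):
    """Map each tag key to the set of values seen for it."""
    values = {}
    for tags in points:
        for key, value in tags.items():
            values.setdefault(key, set()).add(value)
    return values


def series_upper_bound(points):
    """Theoretical maximum: product of distinct value counts per tag key."""
    return prod(len(v) for v in distinct_values(points).values())


def series_actual(points):
    """Number of distinct tag combinations that really occur."""
    return len({tuple(sorted(tags.items())) for tags in points})


# Made-up tag sets for a single measurement (e.g. one check command).
points = [
    {"hostname": "web01", "service": "disk", "environment": "prod", "os": "linux"},
    {"hostname": "web01", "service": "load", "environment": "prod", "os": "linux"},
    {"hostname": "db01", "service": "disk", "environment": "test", "os": "linux"},
]

upper = series_upper_bound(points)
actual = series_actual(points)
print("upper bound:", upper, "actual series:", actual)
```

Tags that are fully determined by the host (such as os or environment) make each series key longer but do not multiply the number of series, which is why the actual count here stays below the theoretical upper bound. Running something like this against an export of real tag values would show whether cardinality is a plausible culprit.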

devopstt · December 21, 2023, 12:59pm

Yes, we need this kind of data for KPIs.

Perhaps it is worth taking a look at the migration to InfluxDB2.

jjuanino · December 23, 2023, 4:00pm

Hi all, just a “me too” to devopstt’s first post. Since I upgraded from 2.13.8 to 2.14.0, the disk usage of my InfluxDB 1.8 has increased a lot, by about 100%, and my setup is pretty similar to devopstt’s. Overall performance has not been affected, only the disk usage. The only workaround that has helped is to reduce the InfluxDB retention period.

Regards

devopstt · January 2, 2024, 1:10pm

Thank you for confirming!
I also created a GitHub issue:
https://github.com/Icinga/icinga2/issues/9948

Upgrading to InfluxDB2 does not seem like an option because influx-relay is not compatible with it, and a non-redundant setup is not something we want.

I wonder why this is not an issue in more setups.

henning · January 15, 2024, 7:04am

Hi everyone,
I can confirm this too after the upgrade from 1.8 to 2.14. We see this in both our prod and test environments. Retention of the “icinga2” bucket is 8736h0m0s (= 364d), shard group: 168h.

ShowMeYourSkil · January 18, 2024, 7:26pm

A hotfix for this issue was released today.

devopstt · January 19, 2024, 11:06am

Yep, already installed it. I will keep an eye on how the disk usage develops over the day.

devopstt · January 22, 2024, 7:50am

Way better