High CPU Load for Windows Servers with icinga2 2.12.1

Hello,

Most of our Windows Server 2012/2016/2019 machines with only 1 vCPU are constantly at full CPU load because of the icinga2 process.

[Screenshot: high_load]

The logs on the client and satellites look fine. The config from Director is pushed successfully to the agent at this time (we even checked debug.log).

[2020-11-25 10:19:57 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-centos-check_log-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:19:57 +0100] warning/ApiListener: Removing API client for endpoint ‘sat2.host.net’. 0 API clients left.
[2020-11-25 10:19:57 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-certificate-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:19:58 +0100] warning/ApiListener: Removing API client for endpoint ‘sat1.host.net’. 0 API clients left.
[2020-11-25 10:20:03 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_apache_status-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:08 +0100] information/ConfigObject: Dumping program state to file ‘C:\ProgramData\icinga2\var\lib\icinga2/icinga2.state’
[2020-11-25 10:20:08 +0100] information/ApiListener: Reconnecting to endpoint ‘sat1.host.net’ via host ‘x.x.x.1’ and port ‘5665’
[2020-11-25 10:20:09 +0100] information/ApiListener: Reconnecting to endpoint ‘sat2.host.net’ via host ‘x.x.x.2’ and port ‘5665’
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_apache_tomcat-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_cassandra_backup-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_certificate_ssl_file-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_dir-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_disk-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_dns-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_elasticsearch_backup-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_http-remote-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_http-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_iostat-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_load-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_log-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_mariadb-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_mariadb_backup-service-template.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_mem-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_memcached-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_mongodb-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_mongodb_backup-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_mysql-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_mysql_backup-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_nginx-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_ntp-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_oracle_backup-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_partition_readonly-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_ping_vpn-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_postgres-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_postgres_backup-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_procs-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_snmp-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:14 +0100] information/ApiListener: New client connection for identity ‘sat2.host.net’ to [x.x.x.2]:5665
[2020-11-25 10:20:14 +0100] information/ApiListener: New client connection for identity ‘sat1.host.net’ to [x.x.x.1]:5665
[2020-11-25 10:20:19 +0100] information/ApiListener: Stage: Updating received configuration file ‘C:\ProgramData\icinga2\var\lib\icinga2/api/zones-stage/global-templates//_etc/linux-check_snmp-v3-service-apply.conf’ for zone ‘global-templates’.
[2020-11-25 10:20:19 +0100] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint ‘sat1.host.net’.
[2020-11-25 10:20:19 +0100] information/ApiListener: Finished reconnecting to endpoint ‘sat2.host.net’ via host ‘x.x.x.2’ and port ‘5665’
[2020-11-25 10:20:19 +0100] information/ApiListener: Sending config updates for endpoint ‘sat1.host.net’ in zone ‘SAT-PAR1’.
[2020-11-25 10:20:19 +0100] information/ApiListener: Finished sending config file updates for endpoint ‘sat1.host.net’ in zone ‘SAT-PAR1’.
[2020-11-25 10:20:19 +0100] information/ApiListener: Syncing runtime objects to endpoint ‘sat1.host.net’.
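An excerpt like the one above is easier to judge once the events are counted. The following is a minimal Python sketch, not an Icinga tool: it assumes the `[timestamp] severity/Component: message` layout shown in the excerpt, and the function name `summarize` and the event labels are made up for illustration. It tallies how often the agent stages config files, drops API clients, and reconnects.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [timestamp] severity/Component: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<severity>\w+)/(?P<component>\w+):\s+(?P<message>.*)$"
)

# Message prefixes copied from the excerpt; the labels are hypothetical names.
EVENTS = {
    "Stage: Updating received configuration file": "config_file_staged",
    "Removing API client for endpoint": "client_removed",
    "Reconnecting to endpoint": "reconnect_started",
    "Finished reconnecting to endpoint": "reconnect_finished",
}


def summarize(lines):
    """Count the known event types in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip continuation lines or anything in another format
        message = match.group("message")
        for prefix, label in EVENTS.items():
            if message.startswith(prefix):
                counts[label] += 1
                break
    return counts
```

Passing an open log file to `summarize` returns a `Counter`. Comparing the counts for a fixed time window on an affected and an unaffected agent may show whether the reconnects and config staging are unusually frequent.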

Temporary workarounds:

  • If we add another vCPU to the VM, the problem goes away.
  • If we manually set the Icinga 2 process priority to low, the problem also goes away.
  • If we set the Icinga 2 process priority to low via a registry key, the CPU load remains at 100%. If we then manually set the priority back to normal and to low again, the issue is fixed.

Environment:

  • Icinga version 2.12.1-1
  • OS on the Windows agents: Windows Server 2012 R2, 2016, 2019
  • PowerShell 5.1 or later
  • .NET Framework 4.7 or later

Thanks for your help.

Hello @MohamedRedaLyoubi,
I have experienced this problem before with Windows clients. How are your zones configured? Are you using the Top Down configuration? Could you share your zones.conf file from your master and clients?

Regards
Alex

Hello @aclark6996

Yes, we are using a Top Down configuration (master and satellites).
Here is our config:

master1:

object Endpoint NodeName {
  // That's us
}

object Endpoint "master2.host.net" {
  host = "x.x.x.1" // Actively connect to the second master. // we stopped the icinga2 service to leave only one master
  log_duration = 1h
}

object Endpoint "sat1.host.net" {
  host = "x.x.x.1" // Actively connect to the satellites.
}

object Endpoint "sat2.host.net" {
  host = "x.x.x.2" // Actively connect to the satellites.
}

object Endpoint "sat21.host.net" {
  host = "x.x.x.1" // Actively connect to the satellites.
}

object Endpoint "sat22.host.net" {
  host = "x.x.x.2" // Actively connect to the satellites.
}

object Endpoint "sat31.host.net" {
  host = "x.x.x.1" // Actively connect to the satellites.
}

object Endpoint "sat32.host.net" {
  host = "x.x.x.2" // Actively connect to the satellites.
}

object Zone ZoneName {
  endpoints = [ NodeName , "master2.host.net" ]
}

object Zone "SAT-PAR1" {
  endpoints = [ "sat1.host.net" , "sat2.host.net" ]
  parent = "master"
}

object Zone "SAT-P2" {
  endpoints = [ "sat21.host.net" , "sat22.host.net" ]
  parent = "master"
}

object Zone "SAT-L" {
  endpoints = [ "sat31.host.net" , "sat32.host.net" ]
  parent = "master"
}

/* sync global commands */
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

sat1:

object Endpoint "master1.host.net" {
  // This endpoint will connect to us
}

object Endpoint "master2.host.net" {
  // This endpoint will connect to us
}

object Endpoint NodeName {
  // That's us
}

object Endpoint "sat2.host.net" {
  host = "x.x.x.2" // Actively connect to the secondary satellite
}

object Zone "master" {
  endpoints = [ "master1.host.net" , "master2.host.net" ]
}

object Zone ZoneName {
  endpoints = [ NodeName , "sat2.host.net" ]
  parent = "master"
}

/* sync global commands */
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

sat2:

object Endpoint "master1.host.net" {
  // This endpoint will connect to us
}

object Endpoint "master2.host.net" {
  // This endpoint will connect to us
}

object Endpoint "sat1.host.net" {
  // First satellite already connected to us
}

object Endpoint NodeName {
  // That's us
}

object Zone "master" {
  endpoints = [ "master1.host.net" , "master2.host.net" ]
}

object Zone ZoneName {
  endpoints = [ "sat1.host.net", NodeName ]
  parent = "master"
}

/* sync global commands */
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

Client:

object Endpoint NodeName {
}

object Endpoint "sat1.host.net" {
  host = "x.x.x.1"
}

object Endpoint "sat2.host.net" {
  host = "x.x.x.2"
}

object Zone NodeName {
  endpoints = [ NodeName, ]
  parent = "SAT-PAR1"
}

object Zone "SAT-PAR1" {
  endpoints = [ "sat1.host.net", "sat2.host.net", ]
}

object Zone "director-global" {
  global = true
}

object Zone "global-templates" {
  global = true
}

Regards
Reda

I see you already posted this on GitHub as well, so I’m linking the issue here:

I’m not sure anyone here can help any further, though, as it is recognized as a bug and has already received feedback from the devs :)


Hello @MohamedRedaLyoubi,

Please try step 1 below. If that doesn’t work try step 2. This is how I have resolved this problem before on my Windows clients. I am on version 2.10.5. Maybe this is a bug in 2.12.

Step 1
Try removing the host attribute from the Endpoint objects in your client’s zones.conf file. You only need one connection path. Your satellite’s zones file says it connects to your clients, and your client’s zones file says it connects to your satellites. The connection from your client to your satellite will get rejected because one is already established, but the attempt may be causing additional load on the CPU.

Client

object Endpoint NodeName {
}

object Endpoint "sat1.host.net" {
}

object Endpoint "sat2.host.net" {
}

object Zone NodeName {
  endpoints = [ NodeName, ]
  parent = "SAT-PAR1"
}

object Zone "SAT-PAR1" {
  endpoints = [ "sat1.host.net", "sat2.host.net", ]
}
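To check many agents for this, a small script can list which Endpoint objects in a zones.conf still carry a `host` attribute. This is a rough Python sketch under stated assumptions: it relies on Endpoint bodies containing no nested braces (true for the configs in this thread), and the function name `endpoints_with_host` is hypothetical. It is a text scan, not a real Icinga 2 config parser.

```python
import re

# Matches `object Endpoint "name" { ... }` and `object Endpoint NodeName { ... }`.
# Assumption: the body between the braces contains no nested braces.
ENDPOINT = re.compile(r'object\s+Endpoint\s+("([^"]+)"|\w+)\s*\{([^}]*)\}')

# A `host = ...` assignment at the start of a line; commented-out lines
# such as `// host = ...` are ignored.
HOST_ATTR = re.compile(r"^\s*host\s*=", re.MULTILINE)


def endpoints_with_host(zones_conf_text):
    """Return the names of Endpoint objects that set a `host` attribute."""
    # Config pasted from a forum may contain curly quotes; normalize them first.
    text = zones_conf_text.replace("\u201c", '"').replace("\u201d", '"')
    found = []
    for match in ENDPOINT.finditer(text):
        name = match.group(2) or match.group(1)  # quoted name, or a constant like NodeName
        if HOST_ATTR.search(match.group(3)):
            found.append(name)
    return found
```

An empty result for an agent's zones.conf would mean the agent does not actively connect to its parents and only the satellites open connections.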

Step 2

  1. Stop the Icinga 2 service.
  2. Rename the file “C:\ProgramData\icinga2\var\lib\icinga2\icinga2.state” to “icinga.old” (the state file is recreated on the next start).
  3. Start the Icinga 2 service.
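The three steps above could be scripted roughly as follows. This is a sketch, not an official procedure. It assumes the Windows service is registered as `icinga2`, that the script runs from an elevated prompt, and that the state file is the `icinga2.state` path from the “Dumping program state” log line earlier in the thread. The function name `reset_state_file` and the `.old` backup suffix are illustrative choices.

```python
import os
import subprocess

SERVICE = "icinga2"  # assumption: name of the agent's Windows service
# Path taken from the "Dumping program state" log line earlier in the thread.
STATE_FILE = r"C:\ProgramData\icinga2\var\lib\icinga2\icinga2.state"


def reset_state_file(state_file=STATE_FILE, service=SERVICE, run=subprocess.run):
    """Stop the service, move the state file aside, then start the service."""
    run(["net", "stop", service], check=True)
    try:
        if os.path.exists(state_file):
            # os.replace overwrites an older backup, which os.rename
            # refuses to do on Windows.
            os.replace(state_file, state_file + ".old")
    finally:
        # Start the service again even if moving the file failed.
        run(["net", "start", service], check=True)
```

The `run` parameter exists only so that the command execution can be replaced with a stub for a dry run. Calling `reset_state_file()` with the defaults performs the real stop, rename, and start.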

Hello @aclark6996,

We can’t try step 1 because connections are established only from the satellites to the clients (not in the other direction), so we already have a single connection path:

satellite 1:

object Endpoint "master1.host.net" {
  // This endpoint will connect to us
}

object Endpoint "master2.host.net" {
  // This endpoint will connect to us
}

object Endpoint NodeName {
  // That's us
}

object Endpoint "sat2.host.net" {
  host = "x.x.x.2" // Actively connect to the secondary satellite
}

object Zone "master" {
  endpoints = [ "master1.host.net" , "master2.host.net" ]
}

object Zone ZoneName {
  endpoints = [ NodeName , "sat2.host.net" ]
  parent = "master"
}

/* sync global commands */
object Zone "global-templates" {
  global = true
}

object Zone "director-global" {
  global = true
}

We will try step 2 if we still have the problem.
Thank you, we really appreciate it.

@MohamedRedaLyoubi
In step one, your connection path stays the same (satellite down to client). The zones.conf file on your client contains the IP addresses of the satellites. Remove those host lines: they are not needed, and they make the client issue its own connection request, which the satellite rejects. This extra connection request could be causing the extra CPU load on the Windows servers.

Regards
Alex