Hi everyone,
One year ago, we switched from Shinken/Thruk to Icinga2/IcingaWeb2 to manage our monitoring, which covers more than 550 hosts and 12,000 services.
A majority of those services are NRPE checks: scripts run on the hosts and the results are transferred directly to the monitoring server via NRPE.
Example:
'/usr/lib/nagios/plugins/check_nrpe' '-4' '-H' '[ANONYMIZED_IP]' '-c' 'check_name' '-t' '10'
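To reproduce the check by hand and see how long it really takes, something like the following could be run from a master or the satellite. It is only a minimal sketch: the function names, the `hard_limit` parameter, and the choice to report exit code 3 (UNKNOWN) when the wrapper itself gives up are my own assumptions, not anything Icinga 2 does internally.

```python
import subprocess
import time
from typing import List, Tuple


def build_nrpe_command(host: str, check: str, timeout: int = 10) -> List[str]:
    """Return the same argv as the example above (IPv4 only, 10 s timeout)."""
    return [
        "/usr/lib/nagios/plugins/check_nrpe",
        "-4", "-H", host, "-c", check, "-t", str(timeout),
    ]


def timed_run(argv: List[str], hard_limit: float) -> Tuple[int, float, str]:
    """Run argv once; return (exit code, elapsed seconds, first output line)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=hard_limit
        )
        code, out = proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        # Assumption: treat a wrapper-level timeout as UNKNOWN (3).
        code, out = 3, "hard limit exceeded"
    except FileNotFoundError:
        # Assumption: a missing plugin binary is also reported as UNKNOWN (3).
        code, out = 3, "plugin not found"
    elapsed = time.monotonic() - start
    first_line = out.splitlines()[0] if out else ""
    return code, elapsed, first_line


def sample(argv: List[str], runs: int, pause: float,
           hard_limit: float) -> List[Tuple[int, float, str]]:
    """Repeat the check `runs` times, sleeping `pause` seconds between runs."""
    results = []
    for _ in range(runs):
        results.append(timed_run(argv, hard_limit))
        time.sleep(pause)
    return results
```

Calling `sample(build_nrpe_command(host, "check_name"), runs=60, pause=5, hard_limit=15)` with a real host address would give a list of exit codes and durations, which should show whether the slow runs creep up towards 10 seconds or jump there suddenly.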
We use a distributed monitoring setup with two masters and one satellite, and no agents on the hosts.
Since the switch, we have encountered a problem that we never saw on our old system.
From time to time, NRPE checks (and no other checks) come back critical with this error:
CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
One to three minutes later (at the next check), the status goes back to OK.
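To count how often this "timeout, then OK at the next check" pattern happens for one service, the plugin outputs could be scanned like this. It is a rough sketch that assumes the outputs have already been collected in chronological order as plain strings; the function names are hypothetical, and only the error message itself comes from our system.

```python
import re
from typing import Optional, Sequence

# Matches the exact error text shown above and captures the timeout value.
TIMEOUT_RE = re.compile(
    r"^CHECK_NRPE STATE CRITICAL: Socket timeout after (\d+) seconds\.?$"
)


def socket_timeout_seconds(plugin_output: str) -> Optional[int]:
    """Return the timeout in seconds if the output is an NRPE socket timeout."""
    match = TIMEOUT_RE.match(plugin_output.strip())
    return int(match.group(1)) if match else None


def transient_timeouts(outputs: Sequence[str]) -> int:
    """Count timeouts that are immediately followed by a non-timeout result."""
    count = 0
    for current, following in zip(outputs, outputs[1:]):
        if (socket_timeout_seconds(current) is not None
                and socket_timeout_seconds(following) is None):
            count += 1
    return count
```

A timeout that is followed by another timeout is not counted, so the result only reflects the self-healing case described here.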
This error occurs from both masters and the satellite, on different hosts and services, but not on all of them. We have found no link between the affected hosts or services on our end.
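One way to double-check for a hidden pattern would be to tally the timeouts per check source, host, and service. The sketch below assumes the events have been gathered by hand into `(check_source, host, service)` tuples; that record layout and the function name are hypothetical, not an Icinga 2 export format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (check_source, host, service)
Event = Tuple[str, str, str]


def tally_timeouts(events: Iterable[Event]) -> Dict[str, Counter]:
    """Count timeout events per check source, per host, and per service."""
    by_source: Counter = Counter()
    by_host: Counter = Counter()
    by_service: Counter = Counter()
    for source, host, service in events:
        by_source[source] += 1
        by_host[host] += 1
        by_service[service] += 1
    return {"check_source": by_source, "host": by_host, "service": by_service}
```

If one endpoint, host, or check name dominates `most_common()` in any of the three counters, that would be a lead worth following; an even spread would support the impression that the timeouts are unrelated to any single host or service.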
The NRPE configuration files on the hosts are correct, and we cannot find anything wrong in the network as far as we can see.
Any ideas where to look to understand this kind of behavior?
-
Version used (icinga2 --version)
icinga2 - The Icinga 2 network monitoring daemon (version: r2.12.3-1)
Copyright (c) 2012-2021 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
  Platform: Debian GNU/Linux
  Platform version: 10 (buster)
  Kernel: Linux
  Kernel version: 4.19.0-0.bpo.6-amd64
  Architecture: x86_64
Build information:
  Compiler: GNU 8.3.0
  Build host: runner-hh8q3bz2-project-298-concurrent-0
  OpenSSL version: OpenSSL 1.1.1d 10 Sep 2019
Application information:
General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2
Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var
Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
-
Operating System and version
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
-
Enabled features (icinga2 feature list)
Disabled features: compatlog debuglog elasticsearch gelf graphite icingadb influxdb opentsdb perfdata statusdata syslog
Enabled features: api checker command ido-mysql livestatus mainlog notification
-
Config validation (icinga2 daemon -C)
[2021-04-07 15:59:04 +0200] information/cli: Icinga application loader (version: r2.12.3-1)
[2021-04-07 15:59:04 +0200] information/cli: Loading configuration file(s).
[2021-04-07 15:59:05 +0200] information/ConfigItem: Committing config item(s).
[2021-04-07 15:59:05 +0200] information/ApiListener: My API identity: [ANONYMISED].net
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 527 Hosts.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 231 Downtimes.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 6 NotificationCommands.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 10 Comments.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 12218 Notifications.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 55 HostGroups.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 5 Zones.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 3 Endpoints.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 4 ApiUsers.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 292 CheckCommands.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 1 LivestatusListener.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 10 TimePeriods.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 8 UserGroups.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 10 Users.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 11476 Services.
[2021-04-07 15:59:11 +0200] information/ConfigItem: Instantiated 24 ServiceGroups.
[2021-04-07 15:59:11 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2021-04-07 15:59:11 +0200] information/cli: Finished validating the configuration file(s).
-
If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes
Object '[ANONYMIZED_SATELLITE]' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 22:1-22:46
  * __name = "[ANONYMIZED_SATELLITE]"
  * host = "[ANONYMIZED_IP_SATELLITE]"
    % = modified in '/etc/icinga2/zones.conf', lines 23:2-23:24
  * log_duration = 86400
  * name = "[ANONYMIZED_SATELLITE]"
  * package = "_etc"
  * port = "5665"
    % = modified in '/etc/icinga2/zones.conf', lines 24:2-24:14
  * source_location
    * first_column = 1
    * first_line = 22
    * last_column = 46
    * last_line = 22
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "[ANONYMIZED_SATELLITE]" ]
    % = modified in '/etc/icinga2/zones.conf', lines 22:1-22:46
  * type = "Endpoint"
  * zone = ""
Object '[ANONYMIZED_MASTER2]' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 15:1-15:39
  * __name = "[ANONYMIZED_MASTER2]"
  * host = ""
  * log_duration = 86400
  * name = "[ANONYMIZED_MASTER2]"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 15
    * last_column = 39
    * last_line = 15
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "[ANONYMIZED_MASTER2]" ]
    % = modified in '/etc/icinga2/zones.conf', lines 15:1-15:39
  * type = "Endpoint"
  * zone = ""
Object '[ANONYMIZED_MASTER1]' of type 'Endpoint':
  % declared in '/etc/icinga2/zones.conf', lines 6:1-6:36
  * __name = "[ANONYMIZED_MASTER1]"
  * host = "[ANONYMIZED_MASTER1]"
    % = modified in '/etc/icinga2/zones.conf', lines 7:4-7:18
  * log_duration = 86400
  * name = "[ANONYMIZED_MASTER1]"
  * package = "_etc"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 6
    * last_column = 36
    * last_line = 6
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "[ANONYMIZED_MASTER1]" ]
    % = modified in '/etc/icinga2/zones.conf', lines 6:1-6:36
  * type = "Endpoint"
  * zone = ""
Object 'global-commands' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 41:1-41:29
  * __name = "global-commands"
  * endpoints = null
  * global = true
    % = modified in '/etc/icinga2/zones.conf', lines 42:3-42:15
  * name = "global-commands"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 41
    * last_column = 29
    * last_line = 41
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "global-commands" ]
    % = modified in '/etc/icinga2/zones.conf', lines 41:1-41:29
  * type = "Zone"
  * zone = ""
Object 'interne' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 27:1-27:21
  * __name = "interne"
  * endpoints = [ "[ANONYMIZED_SATELLITE]" ]
    % = modified in '/etc/icinga2/zones.conf', lines 28:2-28:47
  * global = false
  * name = "interne"
  * package = "_etc"
  * parent = "master"
    % = modified in '/etc/icinga2/zones.conf', lines 29:2-29:18
  * source_location
    * first_column = 1
    * first_line = 27
    * last_column = 21
    * last_line = 27
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "interne" ]
    % = modified in '/etc/icinga2/zones.conf', lines 27:1-27:21
  * type = "Zone"
  * zone = ""
Object 'global-templates' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 33:1-33:30
  * __name = "global-templates"
  * endpoints = null
  * global = true
    % = modified in '/etc/icinga2/zones.conf', lines 34:2-34:14
  * name = "global-templates"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 33
    * last_column = 30
    * last_line = 33
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "global-templates" ]
    % = modified in '/etc/icinga2/zones.conf', lines 33:1-33:30
  * type = "Zone"
  * zone = ""
Object 'director-global' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 37:1-37:29
  * __name = "director-global"
  * endpoints = null
  * global = true
    % = modified in '/etc/icinga2/zones.conf', lines 38:2-38:14
  * name = "director-global"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 37
    * last_column = 29
    * last_line = 37
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "director-global" ]
    % = modified in '/etc/icinga2/zones.conf', lines 37:1-37:29
  * type = "Zone"
  * zone = ""
Object 'master' of type 'Zone':
  % declared in '/etc/icinga2/zones.conf', lines 18:1-18:20
  * __name = "master"
  * endpoints = [ "[ANONYMIZED_MASTER1]", "[ANONYMIZED_MASTER2]" ]
    % = modified in '/etc/icinga2/zones.conf', lines 19:2-19:62
  * global = false
  * name = "master"
  * package = "_etc"
  * parent = ""
  * source_location
    * first_column = 1
    * first_line = 18
    * last_column = 20
    * last_line = 18
    * path = "/etc/icinga2/zones.conf"
  * templates = [ "master" ]
    % = modified in '/etc/icinga2/zones.conf', lines 18:1-18:20
  * type = "Zone"
  * zone = ""
-
NRPE plugin
NRPE Plugin for Nagios Version: 4.0.3