Monitoring Internet service health on a cluster of Linux gateways

Hi, I have an issue monitoring the health and latency of our Internet service from a cluster of Linux gateways.

Only one of the Linux servers holds the public IP at a time; it is an active-backup cluster.
First I tried to monitor both gateways using the Icinga2 agent to run local Nagios plugin scripts (check_icmp), but I ran into a problem. To do the job I would have to point the agent at a private floating IP that is managed by Keepalived, and from what I have read, that is not possible in this scenario.

So I chose to use SNMP to query a custom OID and apply the service to the floating IP. That way, no matter which node has the public IP, the Icinga2 server will reach it.

That’s where the problem is. The output of check_snmp seems to be OK, but the Icinga2 server logs show an error on the perfdata and I can’t graph the results.

Config on Icinga SNMP Agent

/etc/snmp/snmpd.conf
extend int_sal /usr/lib/nagios/plugins/check_icmp -s x.x.x.x -c 200,15% -w 100,5% -H google.com

Running manually on Icinga SNMP Agent

/usr/lib/nagios/plugins/check_icmp -s x.x.x.x -c 200,15% -w 100,5% -H google.com
OK - google.com: rta 3,224ms, lost 0%|rta=3,224ms;100,000;200,000;0; pl=0%;5;15;; rtmax=3,298ms;;;; rtmin=3,195ms;;;; 

Icinga Server Config

Service
object Service "Internet Saliente" {
    host_name = "host.example.com"
    check_command = "snmpv3"
    max_check_attempts = "3"
    check_interval = 3m
    retry_interval = 1m
    enable_notifications = true
    enable_active_checks = true
    enable_passive_checks = true
    enable_event_handler = true
    enable_flapping = true
    enable_perfdata = true
    vars.snmp_v3 = true
    vars.snmpv3_auth_alg = "md5"
    vars.snmpv3_auth_key = "Password"
    vars.snmpv3_oid = ".1.3.6.1.4.1.8072.1.3.2.3.1.2.14.105.110.116.95.115.97.108.95.99.108.97.114.111.50"
    vars.snmpv3_seclevel = "authNoPriv"
    vars.snmpv3_user = "User"
}
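
For reference, the OID used above is the nsExtendOutputFull column of NET-SNMP-EXTEND-MIB (.1.3.6.1.4.1.8072.1.3.2.3.1.2), followed by the length of the extend name and the decimal ASCII code of each of its characters. A minimal sketch of how such an OID can be derived, assuming a POSIX-style shell with od available; extend_oid is a hypothetical helper name, not part of Net-SNMP or Icinga:

```shell
# Hypothetical helper: build the nsExtendOutputFull OID for an extend name.
extend_oid() {
    name=$1
    # Base column, then the length of the name (ASCII names assumed).
    oid=".1.3.6.1.4.1.8072.1.3.2.3.1.2.${#name}"
    # Append the decimal code of each byte of the name.
    for code in $(printf '%s' "$name" | od -An -tu1); do
        oid="$oid.$code"
    done
    printf '%s\n' "$oid"
}

extend_oid int_sal_claro2
```

The OID in the Service object decodes to the 14-character name int_sal_claro2, so the name on the extend line in snmpd.conf has to match the one encoded in the OID.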

Icinga Server debug log

tail -f /var/log/icinga2/debug.log 
[2023-03-27 12:08:24 -0300] notice/Process: Running command '/usr/lib/nagios/plugins/check_snmp' '-A' 'Password' '-H' 'host_IP' '-L' 'authNoPriv' '-P' '3' '-U' 'User' '-a' 'md5' '-o' '.1.3.6.1.4.1.8072.1.3.2.3.1.2.14.105.110.116.95.115.97.108.95.99.108.97.114.111.50' '-t' '10' '-x' 'AES': PID 1682967

[2023-03-27 12:08:24 -0300] notice/Process: PID 1682967 ('/usr/lib/nagios/plugins/check_snmp' '-A' 'Password' '-H' 'host_IP' '-L' 'authNoPriv' '-P' '3' '-U' 'User' '-a' 'md5' '-o' '.1.3.6.1.4.1.8072.1.3.2.3.1.2.14.105.110.116.95.115.97.108.95.99.108.97.114.111.50' '-t' '10' '-x' 'AES') terminated with exit code 0


[2023-03-27 12:14:52 -0300] warning/GraphiteWriter: Ignoring invalid perfdata for checkable 'host.example.com!Internet Saliente' and command 'snmpv3' with value: rta=3,238ms;100,000;200,000;0;
[2023-03-27 12:14:52 -0300] debug/GraphiteWriter: Checkable 'host.example.com!Internet Saliente' adds to metric list: 'icinga2...host.example.com.services.Internet_Saliente.snmpv3.perfdata.pl.value 0 1679930092'.
[2023-03-27 12:14:52 -0300] debug/GraphiteWriter: Checkable 'host.example.com!Internet Saliente' adds to metric list: 'icinga2...host.example.com.services.Internet_Saliente.snmpv3.perfdata.pl.crit 15 1679930092'.
[2023-03-27 12:14:52 -0300] debug/GraphiteWriter: Checkable 'host.example.com!Internet Saliente' adds to metric list: 'icinga2...host.example.com.services.Internet_Saliente.snmpv3.perfdata.pl.warn 5 1679930092'.
[2023-03-27 12:14:52 -0300] warning/GraphiteWriter: Ignoring invalid perfdata for checkable 'host.example.com!Internet Saliente' and command 'snmpv3' with value: rtmax=3,272ms;;;;
[2023-03-27 12:14:52 -0300] warning/GraphiteWriter: Ignoring invalid perfdata for checkable 'host.example.com!Internet Saliente' and command 'snmpv3' with value: rtmin=3,213ms;;;;

Running the command manually from Icinga Server

'/usr/lib/nagios/plugins/check_snmp' '-A' 'Password' '-H' 'host_IP' '-L' 'authNoPriv' '-P' '3' '-U' 'User' '-a' 'md5' '-o' '.1.3.6.1.4.1.8072.1.3.2.3.1.2.14.105.110.116.95.115.97.108.95.99.108.97.114.111.50' '-t' '10' '-x' 'AES'
SNMP OK - "OK - google.com: rta 3,279ms, lost 0%|rta=3,279ms;100,000;200,000;0; pl=0%;5;15;; rtmax=3,345ms;;;; rtmin=3,240ms;;;; " |

I could not figure out why Icinga2 cannot interpret the check_snmp output.
Thanks in advance!!!

rta=3,238ms uses a comma rather than a full stop as fraction separator. That’s the problem.
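
A quick way to spot this in a plugin's output is to look for a digit-comma-digit sequence in the perfdata part after the pipe. A minimal sketch, assuming a POSIX-style shell and grep -E; perfdata_has_comma_decimal is a hypothetical helper, not part of Icinga or the monitoring plugins:

```shell
# Hypothetical helper: succeed if the perfdata (text after the first pipe)
# contains a value or threshold written with a comma as decimal separator.
perfdata_has_comma_decimal() {
    # Perfdata items look like label=value[UOM];warn;crit;min;max
    printf '%s\n' "${1#*|}" | grep -Eq '[=;][0-9]+,[0-9]+'
}

if perfdata_has_comma_decimal 'OK - google.com: rta 3,224ms, lost 0%|rta=3,224ms;100,000;200,000;0; pl=0%;5;15;;'; then
    echo "perfdata uses a comma as decimal separator"
fi
```

The check only looks at the text after the first pipe, so a comma in the human-readable part of the output does not trigger it.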

Thanks @Al2Klimov
But check_icmp is also using a comma for decimals???

Do you know if there’s a workaround to monitor services on just one node of an HA cluster with a floating IP?

What’s the version of check_icmp?
/usr/lib/nagios/plugins/check_icmp --version
I tried it with check_icmp v2.2 (monitoring-plugins 2.2) and the perfdata decimal separator is a dot, not a comma.

Can you update your monitoring-plugins?

OK, I got it! Thanks @Al2Klimov for pointing me in the right direction.
The problem was indeed the decimal separator. All my servers had their language environment set to Spanish, and in Spanish we use a comma as the decimal separator and a dot as the thousands separator.

@moreamazingnick It was not a version issue, thanks anyway

# /usr/lib/nagios/plugins/check_icmp -V
check_icmp v2.3.1 (monitoring-plugins 2.3.1)

This is the server’s current locale configuration

# locale -k LC_NUMERIC
decimal_point=","
thousands_sep="."
grouping=3;3
numeric-decimal-point-wc=44
numeric-thousands-sep-wc=46
numeric-codeset="UTF-8"

Override the environment variable

# export LC_NUMERIC="en_US.UTF-8"
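
Note that an export only affects the current shell and the processes started from it, and that LC_ALL, if set, takes precedence over LC_NUMERIC. If the goal is to pin the number format for a single plugin run regardless of the surrounding environment, one option is to force the C locale for just that command. A minimal sketch, assuming a POSIX-style shell; with_c_locale is a hypothetical helper name:

```shell
# Hypothetical helper: run one command with the C locale forced, so numbers
# are printed with a dot as decimal separator whatever the system locale is.
with_c_locale() {
    # LC_ALL overrides every LC_* category, including LC_NUMERIC.
    LC_ALL=C "$@"
}

# Example (assumes the plugin path used elsewhere in this thread):
# with_c_locale /usr/lib/nagios/plugins/check_icmp -c 200,15% -w 100,5% -H google.com
```

The variable is set only for the wrapped command, so the rest of the system keeps its Spanish locale.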

Remote testing from Icinga Server

# /usr/lib/nagios/plugins/check_snmp -A xxx -H x.x.x.x -L authNoPriv -P 3 -U xxxxx -a MD5 -o .1.3.6.1.4.1.8072.1.3.2.3.1.2.14.105.110.116.95.115.97.108.95.99.108.97.114.111.50
SNMP OK - "OK - google.com: rta 3.206ms, lost 0%|rta=3.206ms;100.000;200.000;0; pl=0%;5;15;; rtmax=3.258ms;;;; rtmin=3.153ms;;;; " | 

Local testing on Icinga Agent

# /usr/lib/nagios/plugins/check_icmp -c 200,15% -w 100,5% -H google.com
OK - google.com: rta 0.806ms, lost 0%|rta=0.806ms;100.000;200.000;0; pl=0%;5;15;; rtmax=0.936ms;;;; rtmin=0.574ms;;;; 

Now the Icinga server interprets the plugin’s output correctly

Making en_US.UTF-8 the persistent locale

# dpkg-reconfigure locales 

Thanks both for help!!!
Cheers!
