Check_mysql with mysql_ignore_auth still fails with authentication error

HindrikDeelstra · June 14, 2021, 12:22pm

We’re monitoring many servers with Icinga 2 and Icinga Web 2, but one single host fails the mysql check with the error:

Access denied for user 'icinga'@'localhost'

The relevant snippets of config are:

apply Service "mysqld" {
  import "generic-service"
  check_command = "mysql"
  command_endpoint = host.vars.agent_endpoint
  vars += { mysql_hostname= "localhost", mysql_ignore_auth = true }
  assign where host.vars.agent_endpoint
}

As you can see, the check is performed by the Icinga2 agent on the host (“agent_endpoint” hostvar is set), with vars mysql_hostname = “localhost” and mysql_ignore_auth = true. This is also clear from the Service Object:

  <snip>
  * vars
    % = modified in '/etc/icinga2/conf.d/services.conf', lines 130:3-130:67
    * mysql_hostname = "localhost"
    * mysql_ignore_auth = true

This works like a charm for all other hosts in our inventory which need mysql checking, so I think the setup is sound. It must be something on the specific host, but I’m struggling to determine what exactly.

If I log into the host (CentOS Linux 7.9.2009 with Plesk Obsidian 18.0.35.2), I can perform the following without any issues (perform check_mysql as the icinga user, with the correct service variables):

# su - icinga -s /bin/bash -c "/usr/lib64/nagios/plugins/check_mysql -H localhost -n"
MySQL OK - Version: 10.5.10-MariaDB (protocol 10)

This should correspond exactly with the Service vars as executed by the agent on this host (and all other hosts). But the check result still fails with the error mentioned at the start.
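One thing the shell test above does not reproduce is the environment: `su -` starts a login shell for the icinga user, whereas the agent launches the plugin directly from the daemon. As a minimal sketch, assuming (not confirmed in this thread) that the environment is what differs, the plugin can be rerun with a nearly empty environment. The helper name `run_scrubbed` and the fixed `PATH` are illustrative choices, not something Icinga 2 provides:

```shell
#!/bin/sh
# Hypothetical helper: run a command with an almost empty environment
# (only PATH is set) and print the exit code it returned.
run_scrubbed() {
    env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin "$@"
    status=$?
    echo "exit code: $status"
    return "$status"
}

# Usage on the affected host, as the icinga user (plugin path taken from the post):
# run_scrubbed /usr/lib64/nagios/plugins/check_mysql -H localhost -n
```

If the result changes between this run and the `su -` run, the difference lies in variables such as `HOME` rather than in the command line itself.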

I wonder if anyone can offer some more hints as to why this may happen, and possibly some tips for finding a solution.

With kind regards,

Hindrik

HindrikDeelstra · July 6, 2021, 6:28am

Does this really not ring a bell with anyone who could point me in some direction?

For now, I’ve had to resort to a work-around by explicitly defining a different mysql CheckCommand for this host: merely a TCP check against port 3306 on the host.

I’d like to perform the Endpoint mysql CheckCommand as defined above, if at all possible. Any hints/tips would be appreciated…

With kind regards,

Hindrik

Al2Klimov · July 12, 2021, 5:21pm

Hello @HindrikDeelstra!

Does your TCP check work, in contrast to the socket one? How did you set the permissions for it on the MySQL side?

Best,
AK

HindrikDeelstra · July 13, 2021, 6:51am

Hello AK,

The TCP check works fine, it just pings TCP port 3306, which is open for the Icinga2 monitoring server.

Like I mentioned, the check works when performed locally from the shell:

# su - icinga -s /bin/bash -c "/usr/lib64/nagios/plugins/check_mysql -H localhost -n"
MySQL OK - Version: 10.5.10-MariaDB (protocol 10)

Your question about the permissions is not clear to me; what do you mean exactly? I have not needed to set up anything with regard to permissions on any of the other hosts, and the check is performed with the “-n” flag to ignore authentication and just check connectivity. That works on all other hosts, but not on this specific one; that is my conundrum here.

Al2Klimov · July 13, 2021, 9:07am

Is this the exact command Icinga 2 actually runs? (see debug logs)

HindrikDeelstra · July 13, 2021, 9:44am

Hey AK,

I’ve changed the CheckCommand back to the Endpoint definition, and enabled the debug.log on the Endpoint host. And indeed, the command seems to be run exactly as defined:

# grep check_mysql /var/log/icinga2/debug.log
[2021-07-13 11:28:49 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_mysql' '-H' 'localhost' '-n': PID 68969
[2021-07-13 11:28:49 +0200] notice/Process: PID 68969 ('/usr/lib64/nagios/plugins/check_mysql' '-H' 'localhost' '-n') terminated with exit code 2
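When the debug log is busy, the PID and exit code can be pulled out of the `notice/Process` lines directly. A small sketch, assuming the log format matches the two lines quoted above; the function name `plugin_exit_codes` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read Icinga 2 debug log lines on stdin and print
# "<PID> <exit code>" for every "terminated with exit code" entry.
plugin_exit_codes() {
    sed -n 's/.*PID \([0-9][0-9]*\) (.*) terminated with exit code \([0-9][0-9]*\).*/\1 \2/p'
}

# Usage (log path taken from the post):
# grep check_mysql /var/log/icinga2/debug.log | plugin_exit_codes
```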

The “terminated with exit code 2” clearly stands out, but I don’t have any clue as to why it exits with exit code 2 when executed via the agent, and with exit code 0 when executed via “su” as the icinga user on the shell. Although SELinux is enabled, there are no relevant log entries in the audit.log. And even with SELinux in Permissive mode, it still fails the same way. So I think I can safely conclude SELinux is not a determining factor here…
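For reference, plugin exit codes follow the Monitoring Plugins convention, so exit code 2 is what shows up as CRITICAL in the check result. A small sketch of that mapping; the function name `plugin_state_name` is illustrative only:

```shell
#!/bin/sh
# Map a plugin exit code to its state name (0-3 per the plugin convention).
plugin_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "outside the 0-3 convention" ;;
    esac
}

# Usage: plugin_state_name 2
```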

It is not a really huge issue by any means; our work-around for this is simple and effective, and we still have an idea about the status of the SQL server on this host. But I find this type of (at first sight) inexplicable behaviour interesting and frustrating at the same time. I would like to find a cause and a solution, but not at all costs.

With kind regards,

Hindrik