Heavy Disk Usage (I/O) with MariaDB

This is my first post here so hopefully I include all the necessary data to help get an answer.

I have Icinga 2 (2.13.2-1) running on a RPi with Buster.
I have the following features enabled: api checker command ido-mysql influxdb mainlog notification
I have IcingaWeb 2.6.2 with only Director as the extra module (and its dependencies)
I have ~140 checks per 15min but the idomysqlconnection_ido-mysql_queries_15mins are nearly at 3,000.

Because I run this on a RPi and it’s installed on an SD card, I was concerned about the amount of I/O writes it does to the DB (~25 GB/day… nearly the whole SD card). I haven’t been able to figure out why it writes so much to the DB, or why there are so many more queries than checks.

ibdata1 is at 72MB and the two ib_logfile0/1 files are capped at 48MB each. But iotop shows a huge amount of data constantly being written by mysql.

When I stop Icinga, mysql drops to a ‘normal’ I/O load of ~200K-500K/min instead of ~5-10M/min.

I guess my question is: is this normal performance for Icinga? If so, how can I keep the queries in RAM for longer and flush to disk less frequently? Icinga performs fine on the RPi and the load is minimal; it is just the ~25 GB/day of writes that has me concerned about the longevity of my SD card.
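
To illustrate what I mean by writing less often: I was imagining loosening the InnoDB flush behaviour with something like the following (these are just the knobs I’ve been reading about, and the values are guesses on my part rather than anything I’ve validated for Icinga):

# Flush the redo log about once per second instead of on every COMMIT
# (trades up to a second of durability for far fewer disk flushes).
sudo mysql -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2;"
# Don't flush neighbouring pages; that optimisation mainly helps spinning disks.
sudo mysql -e "SET GLOBAL innodb_flush_neighbors = 0;"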

Thanks for the help with this.

It may be helpful if you can provide the exact commands you are running and the direct output of them when you are checking on these stats. It is usually best to remove interpretation from the equation :slight_smile:

Thanks for the tips… Here are the commands and their outputs

/var/lib/mysql $ ls -lh ib*
-rw-rw---- 1 mysql mysql  48M Jan 22 17:44 ib_logfile0
-rw-rw---- 1 mysql mysql  48M Jan 22 13:51 ib_logfile1
-rw-rw---- 1 mysql mysql  76M Jan 22 17:44 ibdata1
sudo iostat -h
Linux 5.10.63-v7l+    01/22/22        _armv7l_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.6%    0.1%    2.0%    9.5%    0.0%   86.8%

      tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn Device
    14.58        30.2k       336.3k       2.2G      24.4G mmcblk0

sudo iotop -ao
  PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  894 be/4 mysql         0.00 B     51.91 M  0.00 %  0.66 % mysqld

This is after running it for 5 minutes.

I also ran

fatrace | grep 894

Resulting in hundreds of these a minute:

mysqld(894): W /var/lib/mysql/ib_logfile0
mysqld(894): W /var/lib/mysql/ibdata1

I also logged ALL mariadb queries for about 20 mins and got A LOT of the following lines:

220122 16:30:18   1760 Query    COMMIT;BEGIN
220122 16:30:19   1760 Query    COMMIT;BEGIN
220122 16:30:20   1760 Query    COMMIT;BEGIN
220122 16:30:21   1760 Query    COMMIT;BEGIN
220122 16:30:22   1760 Query    COMMIT;BEGIN
 1760 Query    DELETE FROM icinga_programstatus WHERE instance_id = 1;INSERT INTO icinga_programstatus ............

I hope that helps clear up my interpretations of these outputs, and hopefully they’re accurate.

Thanks

Thanks. I guess you are not using file-per-table if you are writing to ibdata frequently. Can you run this?

show global variables like 'innodb_file_per_table';

How did you “log ALL mariadb queries”?

Can you enable the general log for a short period of time and see if there are any similar queries being executed?
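
If it helps, the general log can be toggled at runtime without a restart, roughly like this (the log file path here is only an example, put it wherever suits you):

sudo mysql -e "SET GLOBAL general_log_file = '/var/lib/mysql/general.log';"  # example path
sudo mysql -e "SET GLOBAL general_log = 'ON';"
# ...let it run for a few minutes, then:
sudo mysql -e "SET GLOBAL general_log = 'OFF';"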

Here are the results…

MariaDB [(none)]> show global variables like 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON    |
+-----------------------+-------+

Yea I used general_log to log all the queries. Should I attach that file somewhere?

Thanks

That would be great if you could provide the general log file for some period of time. I’m not sure how much sensitive data may be there, so you may want to review it first.

It may also help to narrow down which tables are being written to by their growth in size (if any). I have to assume it is not growing by 25 GB a day, or you would have run out of space by now.

Maybe this shows something interesting? (Adjust as necessary):

for i in 1 2 3 4 5; do date; ls -lS /var/lib/mysql/icinga | head -5; sleep 60; done

You could also use ls -lt to sort by mtime.
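
That is, the same loop with the sort flag swapped:

for i in 1 2 3 4 5; do date; ls -lt /var/lib/mysql/icinga | head -5; sleep 60; done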

I’ve attached the log file from general_log. It ran for about 20mins. I did my best to take out sensitive data. Raspy.log (792.8 KB)

The database isn’t growing very much at all, which is what I would expect, maybe 1 MB a day… thankfully!

I ran the following as you suggested:

for i in 1 2 3 4 5; do date; ls -lS /var/lib/mysql/icinga | head -5; sleep 60; done
Sun Jan 23 15:07:52 PST 2022
total 33172
-rw-rw---- 1 mysql mysql 10485760 Jan 23 07:48 icinga_statehistory.ibd
-rw-rw---- 1 mysql mysql   491520 Jan 23 15:06 icinga_customvariables.ibd
-rw-rw---- 1 mysql mysql   475136 Jan 23 10:00 icinga_notifications.ibd
-rw-rw---- 1 mysql mysql   458752 Jan 23 15:06 icinga_customvariablestatus.ibd

Every repeat was exactly the same, size and datestamp

Using ls -lt however produced the following:

total 33172
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:11 icinga_programstatus.ibd
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:11 icinga_runtimevariables.ibd
-rw-rw---- 1 mysql mysql   442368 Jan 23 15:11 icinga_servicestatus.ibd
-rw-rw---- 1 mysql mysql   393216 Jan 23 15:09 icinga_hoststatus.ibd
Sun Jan 23 15:12:45 PST 2022
total 33172
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:12 icinga_programstatus.ibd
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:12 icinga_runtimevariables.ibd
-rw-rw---- 1 mysql mysql   442368 Jan 23 15:12 icinga_servicestatus.ibd
-rw-rw---- 1 mysql mysql   393216 Jan 23 15:09 icinga_hoststatus.ibd
Sun Jan 23 15:13:45 PST 2022
total 33172
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:13 icinga_runtimevariables.ibd
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:13 icinga_programstatus.ibd
-rw-rw---- 1 mysql mysql   442368 Jan 23 15:13 icinga_servicestatus.ibd
-rw-rw---- 1 mysql mysql   393216 Jan 23 15:09 icinga_hoststatus.ibd
Sun Jan 23 15:14:45 PST 2022
total 33172
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:14 icinga_runtimevariables.ibd
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:14 icinga_programstatus.ibd
-rw-rw---- 1 mysql mysql   393216 Jan 23 15:14 icinga_hoststatus.ibd
-rw-rw---- 1 mysql mysql   442368 Jan 23 15:14 icinga_servicestatus.ibd
Sun Jan 23 15:15:45 PST 2022
total 33172
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:15 icinga_programstatus.ibd
-rw-rw---- 1 mysql mysql   114688 Jan 23 15:15 icinga_runtimevariables.ibd
-rw-rw---- 1 mysql mysql   442368 Jan 23 15:15 icinga_servicestatus.ibd
-rw-rw---- 1 mysql mysql   393216 Jan 23 15:14 icinga_hoststatus.ibd 

Seems like these are also being rewritten constantly, although I didn’t see them while watching fatrace.

I have temporarily disabled the ido-mysql feature and that has removed the heavy I/O usage. It is still notifying me when there are problems and influxdb is still collecting data. I’m still able to use Icingaweb2 for some tasks (I haven’t really checked the full scope of what I can and cannot do with ido-mysql disabled).
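
(For anyone following along, disabling it was just the standard feature toggle, roughly:)

sudo icinga2 feature disable ido-mysql
sudo systemctl restart icinga2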

Would love to still use ido-mysql if I can get the writes down to a reasonable amount.

Thanks

Thanks for providing that! The frequency of COMMIT is strange to me, and could be why the modified times keep getting updated and why the reported write volume is so large. I don’t see that here - maybe an AUTOCOMMIT difference?

I can say my programstatus and runtimevariables tables are not written to nearly as frequently as yours seem to be (hours). The service/host status are - but that’s to be expected.

Are you using multiple masters? Can you check the contents of icinga_programstatus to see if it has more than one row?
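
Something along these lines should do for the row count (replace icinga2 with whatever your IDO database is actually called):

sudo mysql icinga2 -e "SELECT COUNT(*) FROM icinga_programstatus;"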

Something odd may be going on with runtimevariables as well, maybe check out:

select * from icinga_runtimevariables\G

I’m guessing the COMMIT and AUTOCOMMIT are programmed inside Icinga, right? I haven’t seen any reference to them in my reading.

I don’t think I’m running multiple masters. I do have two agents set up and they seem to connect and work great.

Here are the results of your requests:
icinga_programstatus:

MariaDB [icinga2]> select * from icinga_programstatus\G;
*************************** 1. row ***************************
              programstatus_id: 313917
                   instance_id: 1
               program_version: r2.13.2-1
            status_update_time: 2022-01-23 15:23:13
            program_start_time: 2022-01-23 15:05:54
              program_end_time: 2022-01-23 15:23:20
                 endpoint_name: Raspy
          is_currently_running: 0
                    process_id: NULL
                   daemon_mode: 1
            last_command_check: 2022-01-23 15:23:13
             last_log_rotation: NULL
         notifications_enabled: 1
     disable_notif_expire_time: NULL
 active_service_checks_enabled: 1
passive_service_checks_enabled: 1
    active_host_checks_enabled: 1
   passive_host_checks_enabled: 1
        event_handlers_enabled: 1
        flap_detection_enabled: 1
    failure_prediction_enabled: 0
      process_performance_data: 1
             obsess_over_hosts: 0
          obsess_over_services: 0
      modified_host_attributes: 0
   modified_service_attributes: 0
     global_host_event_handler: NULL
  global_service_event_handler: NULL
       config_dump_in_progress: 0
            endpoint_object_id: NULL
1 row in set (0.000 sec)

And icinga_runtimevariables

MariaDB [icinga2]> select * from icinga_runtimevariables\G
*************************** 1. row ***************************
runtimevariable_id: 1255653
       instance_id: 1
           varname: total_services
          varvalue: 53
endpoint_object_id: NULL
*************************** 2. row ***************************
runtimevariable_id: 1255654
       instance_id: 1
           varname: total_scheduled_services
          varvalue: 53
endpoint_object_id: NULL
*************************** 3. row ***************************
runtimevariable_id: 1255655
       instance_id: 1
           varname: total_hosts
          varvalue: 6
endpoint_object_id: NULL
*************************** 4. row ***************************
runtimevariable_id: 1255656
       instance_id: 1
           varname: total_scheduled_hosts
          varvalue: 6
endpoint_object_id: NULL
4 rows in set (0.001 sec)

Is it normal to have ~20 queries per service check? Icinga reports 133 active service checks per 15 min and ~2,600 ido-mysql queries per 15 min. These numbers come from the check_icinga perf data.

It’s a strange one… I’ve looked at ways to simply move the database to RAM using buffers or other MariaDB options, but nothing seems to actually reduce the writes reported by iotop and iostat.
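
If it’s useful to anyone, the InnoDB status counters are another way to cross-check the write volume; they are cumulative bytes since the server started (counter names as in stock MariaDB, nothing Icinga-specific):

# Bytes written to the data files and to the redo log so far
sudo mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_data_written';"
sudo mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';"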

Thanks again for wrestling through this with me.

No problem! I wish I had a simple answer, and thanks for providing this info…maybe someone else will use it to think of something I haven’t.

I have a much lower queries-to-checks ratio than you are seeing. I will say I’m not using influxdb… maybe you could disable that and re-enable ido-mysql briefly?

active_service_checks_15min	                1,397.00
idomysqlconnection_ido-mysql_queries_15mins	5,783.00

Yea, I’m not sure what is going on, but I feel it may be an issue with MariaDB. After a lot of reading, it seems others are having similar issues with large writes to disk.

I disabled the ido-mysql feature, but still had icinga-director running as a service and found it was producing the same large write volume. With both icinga-director and ido-mysql disabled, the heavy write issue is gone. In comparison, influxdb has written about 50 MB where ido-mysql would be up to about 10 GB.

I did disable influx and run ido on its own, but that didn’t change anything.

How about using Icinga DB? I’ve heard mixed things about it. Does it work with Icingaweb2, and how would I go about that?

It’s certainly not an issue with MariaDB, as it will not run queries on its own against the Icinga 2 database. I have not seen this myself, so I am trying to learn more about it to troubleshoot it. The measurement of writes (10 GB) seems off to me, so I’m not sure that’s useful. I haven’t used Icinga DB.

I just wonder if MariaDB is rewriting major parts of the table file when a row is deleted or updated? Watching iotop, there are moments where the accumulated write field jumps 4-5 MB.

The 10 GB figure comes from what iotop -aoP reports for mysql over a couple of hours of monitoring.
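
For a second opinion on that number, the per-process counters in /proc also track cumulative writes since the process started (assuming the daemon is named mysqld, as it appears in the iotop output above):

# write_bytes = bytes this process has caused to be written to storage
sudo cat /proc/$(pidof mysqld)/io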

My temp solution… move mariadb to tmpfs and rsync it once a day or so.

Expanding @eternalliving’s mention of tmpfs into a quick and dirty homelab workaround:

Setup:

/etc/init.d/mysql stop
mv /var/lib/mysql /var/lib/mysql-ondisk
mkdir /var/lib/mysql    # empty mount point for the tmpfs

Script at /usr/local/bin/ramdb.sh (remember to chmod +x it):

#!/bin/bash

if [ "$(mount | grep mysql)" != "" ] ; then
      echo "[>] Backing up database from memory to disk"
      rsync -varP /var/lib/mysql/ /var/lib/mysql-ondisk/
      exit
fi
echo "[>] Restoring database from disk to memory"
/etc/init.d/mysql stop
echo "[>] Creating ramdisk"
mount -t tmpfs tmpfs /var/lib/mysql -o nodev,size=1G
echo "[>] Copying MySQL directory to ramdisk"
rsync -varP /var/lib/mysql-ondisk/ /var/lib/mysql

echo "[>] Starting MySQL"
/etc/init.d/mysql start

Added to crontab:

@hourly root /usr/local/bin/ramdb.sh
@reboot root /usr/local/bin/ramdb.sh

Works in an LXC container on Proxmox.
Obviously this method carries risks and is not suitable for a production environment unless you really know what you are doing and have daily backups, in case the database “save” rsync ends up inconsistent or corrupt.