Icinga2 Version 2.13.2-1
System CentOS 7.7.1908
Icinga Web 2 Version 2.10.1
PHP Version 7.3.29
icinga/icinga-php-library 0.8.1
icinga/icinga-php-thirdparty 0.10.0
Loaded Modules: doc/map/monitoring
Setup: 2x masters in HA configuration + 50+ satellites
Roughly 9k hosts / 90k services
Dedicated hosts for Icinga Web 2 and PostgreSQL 9.2.24
We’ve recently upgraded from Icinga 2.12 → 2.13 and Icinga Web 2 2.8 → 2.10, and added the maps module and the business process module (currently disabled).
Our PostgreSQL database has grown by roughly 25 GB in the last few days, from ~20 GB to ~45 GB. Additionally, all of our hosts and services have lapsed into overdue status. The logs show a growing backlog of pending queries on IdoPgsqlConnection:
information/IdoPgsqlConnection: Pending queries: 382355 (Input: 1434/s; Output: 1432/s)
information/IdoPgsqlConnection: Pending queries: 382533 (Input: 1428/s; Output: 1408/s)
information/IdoPgsqlConnection: Pending queries: 382691 (Input: 1420/s; Output: 1404/s)
information/IdoPgsqlConnection: Pending queries: 383250 (Input: 1450/s; Output: 1408/s)
information/IdoPgsqlConnection: Pending queries: 383835 (Input: 1404/s; Output: 1348/s)
information/IdoPgsqlConnection: Pending queries: 384902 (Input: 1491/s; Output: 1401/s)
information/IdoPgsqlConnection: Pending queries: 385100 (Input: 1437/s; Output: 1414/s)
information/IdoPgsqlConnection: Pending queries: 385220 (Input: 1424/s; Output: 1412/s)
The pending query count keeps growing until we restart the icinga2 service on the primary master.
We could not figure out what was causing the output rate to lag behind the input rate and produce the backlog of pending queries. I decided to bump up the shared_buffers, work_mem, and maintenance_work_mem allocations in postgresql.conf, reloaded the postgresql service, and restarted Icinga on the master host. The pending queries dropped as low as ~5,000 before steadily climbing again, reaching roughly 17,000 and still rising within ~25 minutes.
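One caveat I’m aware of: shared_buffers is only applied on a full PostgreSQL restart, not on a reload, while work_mem and maintenance_work_mem do take effect on reload. A quick sanity check from psql confirms which values are actually live:

-- Confirm the settings currently in effect (a reload will NOT apply shared_buffers)
SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;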
During troubleshooting, I also checked the queue on our second master and found its output rate to be much, much lower than that of our first master. I am unsure why the performance of the second master is so poor:
information/IdoPgsqlConnection: PGSQL IDO instance id: 1 (schema version: '1.14.3')
information/IdoPgsqlConnection: Pending queries: 12255 (Input: 1202/s; Output: 26/s)
information/IdoPgsqlConnection: Pending queries: 24367 (Input: 1211/s; Output: 41/s)
information/IdoPgsqlConnection: Pending queries: 36538 (Input: 1211/s; Output: 41/s)
information/IdoPgsqlConnection: Pending queries: 48802 (Input: 1221/s; Output: 41/s)
information/IdoPgsqlConnection: Pending queries: 60777 (Input: 1175/s; Output: 28/s)
information/IdoPgsqlConnection: Pending queries: 72554 (Input: 1167/s; Output: 31/s)
information/IdoPgsqlConnection: Pending queries: 84508 (Input: 1184/s; Output: 32/s)
information/IdoPgsqlConnection: Pending queries: 96163 (Input: 1146/s; Output: 30/s)
information/IdoPgsqlConnection: Pending queries: 108317 (Input: 1205/s; Output: 31/s)
information/IdoPgsqlConnection: Pending queries: 120425 (Input: 1197/s; Output: 31/s)
information/IdoPgsqlConnection: Pending queries: 131593 (Input: 1148/s; Output: 78/s)
information/IdoPgsqlConnection: Pending queries: 142611 (Input: 1135/s; Output: 78/s)
information/IdoPgsqlConnection: Pending queries: 153818 (Input: 1152/s; Output: 79/s)
information/IdoPgsqlConnection: Pending queries: 165133 (Input: 1166/s; Output: 79/s)
information/IdoPgsqlConnection: Pending queries: 176207 (Input: 1144/s; Output: 78/s)
information/IdoPgsqlConnection: Pending queries: 187482 (Input: 1168/s; Output: 79/s)
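To narrow down whether the bottleneck is on the PostgreSQL side, I was considering watching what the IDO sessions are actually doing while the backlog grows. A rough sketch against pg_stat_activity (the database name 'icinga' is an assumption; adjust to your IDO database name):

-- Show the state of each session on the IDO database, oldest query first
SELECT pid, state, waiting, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE datname = 'icinga'   -- assumption: IDO database name
ORDER BY query_start;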
Could any of these issues be related to the following indexes, which postgresqltuner flagged as “unused since last statistics”?
icinga_commands.commands_i_id_idx icinga_configfiles.configfiles_i_id_idx icinga_configfilevariables.configfilevariables_i_id_idx icinga_contact_addresses.contact_addresses_i_id_idx icinga_contact_notificationcommands.contact_notifcommands_i_id_idx icinga_contactgroup_members.cntgrpmbrs_cgid_coid icinga_contactgroup_members.contactgroup_members_i_id_idx icinga_contactgroups.contactgroups_i_id_idx icinga_contactnotificationmethods.contact_notif_meth_notif_idx icinga_contacts.contacts_i_id_idx icinga_contactstatus.contactstatus_i_id_idx icinga_customvariables.customvariables_i_id_idx icinga_customvariables.icinga_customvariables_i icinga_customvariablestatus.customvariablestatus_i_id_idx icinga_customvariablestatus.icinga_customvariablestatus_i icinga_endpoints.idx_endpoints_zone_object_id icinga_endpointstatus.idx_endpointstatus_zone_object_id icinga_eventhandlers.eventhandlers_i_id_idx icinga_eventhandlers.eventhandlers_time_id_idx icinga_externalcommands.externalcommands_time_id_idx icinga_host_contactgroups.host_contactgroups_i_id_idx icinga_host_contacts.host_contacts_i_id_idx icinga_host_parenthosts.host_parenthosts_i_id_idx icinga_hostchecks.hostchecks_time_id_idx icinga_hostchecks.hostchks_h_obj_id_idx icinga_hostdependencies.hostdependencies_i_id_idx icinga_hostdependencies.idx_hostdependencies icinga_hostescalation_contactgroups.hostesc_cgroups_i_id_idx icinga_hostescalation_contacts.hostesc_contacts_i_id_idx icinga_hostescalations.hostesc_i_id_idx icinga_hostgroup_members.hostgroup_members_i_id_idx icinga_hostgroups.hostgroups_i_id_idx icinga_hosts.hosts_i_id_idx icinga_hoststatus.hoststatus_check_type_idx icinga_hoststatus.hoststatus_current_state_idx icinga_hoststatus.hoststatus_event_hdl_en_idx icinga_hoststatus.hoststatus_ex_time_idx icinga_hoststatus.hoststatus_flap_det_en_idx icinga_hoststatus.hoststatus_i_id_idx icinga_hoststatus.hoststatus_is_flapping_idx icinga_hoststatus.hoststatus_latency_idx icinga_hoststatus.hoststatus_p_state_chg_idx icinga_hoststatus.hoststatus_pas_chks_en_idx icinga_hoststatus.hoststatus_problem_ack_idx icinga_hoststatus.hoststatus_sch_downt_d_idx icinga_hoststatus.hoststatus_stat_upd_time_idx icinga_hoststatus.hoststatus_state_type_idx icinga_logentries.loge_time_idx icinga_runtimevariables.runtimevariables_i_id_idx icinga_scheduleddowntime.idx_downtimes_session_del icinga_service_contactgroups.service_contactgroups_i_id_idx icinga_service_contacts.service_contacts_i_id_idx icinga_servicechecks.servicechecks_time_id_idx icinga_servicechecks.servicechks_s_obj_id_idx icinga_servicedependencies.idx_servicedependencies icinga_serviceescalation_contactgroups.serviceesc_cgroups_i_id_idx icinga_serviceescalation_contacts.serviceesc_contacts_i_id_idx icinga_serviceescalations.serviceesc_i_id_idx icinga_servicegroup_members.servicegroup_members_i_id_idx icinga_servicegroups.servicegroups_i_id_idx icinga_services.services_i_id_idx icinga_servicestatus.srvcstatus_check_type_idx icinga_servicestatus.srvcstatus_event_hdl_en_idx icinga_servicestatus.srvcstatus_ex_time_idx icinga_servicestatus.srvcstatus_flap_det_en_idx icinga_servicestatus.srvcstatus_is_flapping_idx icinga_servicestatus.srvcstatus_latency_idx icinga_servicestatus.srvcstatus_p_state_chg_idx icinga_servicestatus.srvcstatus_pas_chks_en_idx icinga_servicestatus.srvcstatus_problem_ack_idx icinga_servicestatus.srvcstatus_sch_downt_d_idx icinga_servicestatus.srvcstatus_stat_upd_time_idx icinga_servicestatus.srvcstatus_state_type_idx icinga_statehistory.statehist_i_id_o_id_s_ty_s_ti 
icinga_systemcommands.systemcommands_i_id_idx icinga_systemcommands.systemcommands_time_id_idx icinga_timeperiod_timeranges.timeperiod_timeranges_i_id_idx icinga_timeperiods.timeperiods_i_id_idx icinga_zones.idx_zones_parent_object_id icinga_zonestatus.idx_zonestatus_parent_object_id
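For what it’s worth, the same information can be cross-checked directly from the statistics views; this sketch lists indexes with zero scans since the last stats reset, largest first:

-- Indexes with no scans since the last statistics reset, largest first
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

My understanding is that unused indexes still have to be maintained on every INSERT/UPDATE, so they add some write overhead, but that alone would not explain a sudden regression right after an upgrade.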
Does anyone have ideas about what I could try next? We’ve cleared the icinga2 state file and the /var/lib/icinga2/api/log files before restarting Icinga, as that has worked in the past, but I am unsure of the consequences of doing so.
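To figure out where the ~25 GB of growth went, I am also planning to list the largest relations in the IDO database; a sketch, assuming the IDO tables live in the default public schema:

-- Largest tables (including their indexes and TOAST data) in the current database
SELECT c.relname, pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 15;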