Icinga 2.11 released

Announcement

Blog post: https://icinga.com/2019/09/19/icinga-2-11/
Twitter Release Feed: https://twitter.com/icinga/status/1174311275234504704

Bugfix Releases

Changes

Issues and PRs

Notes

Upgrading docs: https://icinga.com/docs/icinga2/snapshot/doc/16-upgrading-icinga-2/

Thanks to all contributors: Obihoernchen, dasJ, sebastic, waja, BarbUk, alanlitster, mcktr, KAMI911, peteeckel, breml, episodeiv, Crited, robert-scheck, west0rmann, Napsty, Elias481, uubk, miso231, neubi4, atj, mvanduren-itisit, jschanz, MaBauMeBad, markleary, leeclemens, m4k5ym

Enhancements

  • Core
    • Rewrite Network Stack (cluster, REST API) based on Boost Asio, Beast, Coroutines
      • Technical concept: #7041
      • Requires package updates: Boost >= 1.66 (either from packages.icinga.com, EPEL or backports). SLES 11 & Ubuntu 14.04 are EOL.
      • Require TLS 1.2 and harden default cipher list
    • Improved Reload Handling (umbrella process, now 3 processes at runtime)
      • Support running Icinga 2 in (Docker) containers natively in foreground
    • Quality: Use the JSON for Modern C++ library instead of YAJL (dead project)
    • Quality: Improve handling of invalid UTF-8 strings
  • API
    • Fix crashes on Linux, Unix and Windows from Nessus scans #7431
    • Locks and stalled waits are fixed with the core rewrite in #7071
    • schedule-downtime action supports all_services for host downtimes (see the example after this list)
    • Improve storage handling for runtime created objects in the _api package
  • Cluster
    • HA aware features & improvements for failover handling #2941 #7062
    • Improve cluster config sync with staging #6716
    • Fixed that the same downtime/comment objects would be synced over and over in a cluster loop #7198
  • Checks & Notifications
    • Ensure that notifications during a restart are sent
    • Immediately notify about a problem when a downtime ends and the checkable is still in a NOT-OK state
    • Improve reload handling and wait for features/metrics
    • Store notification command results and sync them in HA enabled zones #6722
  • DSL/Configuration
    • Add getenv() function
    • Fix TimePeriod range support over midnight
    • concurrent_checks in the Checker feature no longer has any effect; use the global MaxConcurrentChecks constant instead
  • CLI
    • Permissions: node wizard/setup, feature and api setup commands now run in the Icinga user context instead of root
    • ca list shows pending CSRs by default; ca remove/restore allow deleting and restoring signing requests
  • ITL
    • Add new commands and missing attributes
  • Windows
    • Update bundled NSClient++ to 0.5.2.39
    • Refine agent setup wizard & update requirements to .NET 4.6
  • Documentation
    • Service Monitoring: How to create plugins by example, check commands and a modern version of the supported plugin API with best practices
    • Features: Better structure on metrics, and supported features
    • Technical Concepts: TLS Network IO, Cluster Feature HA, Cluster Config Sync
    • Development: Rewritten for a better debugging and development experience for contributors, including a style guide. Added nightly build setup instructions.
    • Packaging: INSTALL.md was integrated into the Development chapter and is now available at https://icinga.com/docs too.
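
One of the API additions above, the all_services parameter for the schedule-downtime action, is easiest to show with a call. A minimal sketch; the credentials, host name and timestamps are placeholders:

curl -k -s -u root:icinga -H 'Accept: application/json' \
 -X POST 'https://localhost:5665/v1/actions/schedule-downtime' \
 -d '{ "type": "Host", "filter": "host.name==\"agent\"", "all_services": true, "author": "icingaadmin", "comment": "Maintenance", "start_time": 1569000000, "end_time": 1569003600, "pretty": true }'

This schedules a downtime for the host and all of its services in one request.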

I’ll add some thoughts and findings here. I won’t repeat what’s written in the Changelog or the upgrading docs; read them prior to upgrading.

Cluster Config Sync

This has been made more robust with 2.11. Some changes may now reveal wrong or broken configuration which “somehow” worked before, but not reliably.

Read about the technical design here: https://github.com/Icinga/icinga2/issues/6716

Stage before Prod

As you have read in the upgrading docs, there’s a stage before the production directory, called zones-stage. The received cluster message JSON is extracted and the config files are written there first.

The old buggy timestamp and file content comparison for detecting config changes has been replaced with a checksum-based comparison. This ensures, for example, that NTP time sync issues no longer cause nodes to restart over and over.

For upgrading purposes, we recommend doing the masters first, then the satellites, then the agents. The other way around has been tested, but may reveal edge cases where it doesn’t work.

To troubleshoot this, follow this new docs entry: https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#new-configuration-does-not-trigger-a-reload
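
A quick way to check what actually arrived in the stage is to compare it with the production tree. A minimal sketch, assuming the default Linux paths:

# Staged configuration received from the config master
ls -lahR /var/lib/icinga2/api/zones-stage/

# Production configuration currently in use
ls -lahR /var/lib/icinga2/api/zones/

# Validation output of the last sync attempt
cat /var/lib/icinga2/api/zones-stage/startup.log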

Binary Scripts in the Config Sync

This was never supported, but we know that some users kept doing it. With 2.11, we strictly forbid it on the master, as the checksum-based config sync would otherwise always result in a sync loop.

Details: https://github.com/Icinga/icinga2/issues/7382
Troubleshooting: https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#syncing-binary-files-is-denied
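
If you shipped plugin binaries this way before, deploy them with your package or config management instead, and only sync the CheckCommand definition. A sketch; the command name and plugin file are made up:

vim /etc/icinga2/zones.d/global-templates/commands.conf

object CheckCommand "my_custom_check" {
  // Only this definition travels through the config sync.
  // The binary itself is installed into PluginDir by the package manager.
  command = [ PluginDir + "/check_my_custom" ]
}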

Syncing Zones in Zones

If you are using the Director, you may have attempted to manage zones in there instead of in your local zones.conf file during the master/satellite setup.

One thing you must not configure inside the Director itself is the master/satellite and global zones. They need to exist on the agent before any config sync happens. Otherwise you have a chicken-and-egg problem.

The config compiler now only includes directories for which a zone has been configured. Otherwise it would also include renamed old zones, broken zones, etc., which is not wanted.

Solve this by putting the required parts into zones.conf on the agent: Icinga Update 2.11; Director Commands not available
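
A minimal sketch of such an agent zones.conf, assuming a single master and the Director’s default director-global zone:

vim /etc/icinga2/zones.conf

object Endpoint "icinga2-master1.localdomain" {
  host = "icinga2-master1.localdomain" // the agent connects to the master
}

object Zone "master" {
  endpoints = [ "icinga2-master1.localdomain" ]
}

// NodeName and ZoneName are set in constants.conf by the node wizard
object Endpoint NodeName {
}

object Zone ZoneName {
  endpoints = [ NodeName ]
  parent = "master"
}

// Global zones must exist locally before any config sync happens
object Zone "director-global" {
  global = true
}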

There is a similar chicken-and-egg problem with satellites syncing the agent zones while the agents sync their own local config (not command_endpoint): https://github.com/Icinga/icinga2/issues/7519

Example

A more concrete example: masters and satellites still need to know the zone hierarchy outside of the configuration synced via zones.d.

Doesn’t work
vim /etc/icinga2/zones.conf

object Zone "master" {
  endpoints = [ "icinga2-master1.localdomain", "icinga2-master2.localdomain" ]
}
vim /etc/icinga2/zones.d/master/satellite-zones.conf

object Zone "satellite" {
  endpoints = [ "icinga2-satellite1.localdomain", "icinga2-satellite1.localdomain" ]
}
vim /etc/icinga2/zones.d/satellite/satellite-hosts.conf

object Host "agent" { ... }

The agent host object will never reach the satellite, since the master does not have the satellite zone configured outside of zones.d.

Works

Each instance needs to know this, and know about the endpoints first:

vim /etc/icinga2/zones.conf

object Endpoint "icinga2-master1.localdomain" { ... }
object Endpoint "icinga2-master2.localdomain" { ... }

object Endpoint "icinga2-satellite1.localdomain" { ... }
object Endpoint "icinga2-satellite2.localdomain" { ... }

Then the zone hierarchy is required, serving both as the trust boundary and for config sync inclusion.

vim /etc/icinga2/zones.conf

object Zone "master" {
  endpoints = [ "icinga2-master1.localdomain", "icinga2-master2.localdomain" ]
}

object Zone "satellite" {
  endpoints = [ "icinga2-satellite1.localdomain", "icinga2-satellite1.localdomain" ]
}

Once done, you can start deploying actual monitoring objects into the satellite zone.

vim /etc/icinga2/zones.d/satellite/satellite-hosts.conf

object Host "agent" { ... }

That’s also explained in the documentation: https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#three-levels-with-masters-satellites-and-agents

What you can do: for command_endpoint agents (in the Director: Host -> Agent -> yes), there is no config sync in place for the agent zone. Therefore it is valid to just sync their zones via the config sync.
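
For example, on the master the agent’s endpoint and zone can live in the satellite’s zone directory, so the config sync distributes them to the satellites. A sketch with a hypothetical agent name:

vim /etc/icinga2/zones.d/satellite/agent-zones.conf

object Endpoint "icinga2-agent1.localdomain" {
  host = "icinga2-agent1.localdomain"
}

object Zone "icinga2-agent1.localdomain" {
  endpoints = [ "icinga2-agent1.localdomain" ]
  parent = "satellite"
}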

TL;DR: when using the Director with its cluster zones and agent hosts, you are safe. Manage the master/satellite instances outside of it in zones.conf and import them via the kickstart wizard.

Config Compiler

If you put a host/service object into conf.d or anywhere else and use command_endpoint, Icinga sometimes refused to run the check and update its object authority. To ensure that these checks run, the config compiler now breaks on objects which use command_endpoint but do not have the zone attribute specified. Currently the log message is just “command endpoint must not be set”, which needs to be improved.

To fix this, ensure that remote command endpoint checks are only defined in zones.d/<zonename>. If you only have a single master, use its zone directory.
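
A minimal sketch of a correctly placed agent check, assuming a single master zone named master; the host, address and endpoint names are examples:

vim /etc/icinga2/zones.d/master/agent-checks.conf

object Host "icinga2-agent1.localdomain" {
  check_command = "hostalive"
  address = "192.168.56.111"
}

object Service "disk" {
  host_name = "icinga2-agent1.localdomain"
  check_command = "disk"
  // Run the check on the agent. The service gets its zone attribute
  // implicitly from being defined in zones.d/master/.
  command_endpoint = "icinga2-agent1.localdomain"
}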

Discussion: https://github.com/Icinga/icinga2/issues/7514


Moved the above into troubleshooting and upgrading docs.

Boost Repositories

https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/#added-boost-166

Zone in Zone

https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/#config-sync-zones-in-zones
https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#zones-in-zones-doesnt-work

Command endpoint with Zone

https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/#agent-hosts-with-command-endpoint-require-a-zone
https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#agent-hosts-with-command-endpoint-require-a-zone

Hello Michael, will Icinga 2.11 be available for SLES 12 SP3? Currently I only see a build for SLES 12 SP4.

General support for SLES 12 SP3 ended in June, and our CI containers only target the currently supported versions. See https://www.suse.com/lifecycle/
