Icinga vs. Grafana

This is from a discussion on the Monitoring-Portal (MP).

Grafana is a frontend for time series databases. It provides capabilities to define alerts and annotations, which offer a sort of “lightweight monitoring”. It does not replace a running daemon that regularly pulls in state and metrics. You’ll need a TSDB as a backend, and that backend must be populated by other tools.

Kibana is a frontend on top of Elasticsearch inside the Elastic Stack. You can view, correlate and analyse data stored in Elasticsearch. Still, you need tools that write to the backend; Logstash and the Beats are able to do that. To my knowledge, Elastic is also heading toward sending alerts to users. Still, it is based on a document store and does not replace a monitoring core.

Both of these tools attempt to provide monitoring on their own, within their “stack”. Once you manage to push your data into the backend, you can define additional alert rules.

collectd is a metric collector that can write metrics to Graphite as a TSDB. I don’t know whether it supports other backends.

That’s similar to what Icinga 2 does. What these tools cannot provide is a reliable, secured cluster and distributed monitoring stack, object dependencies, and more.

Icinga takes the integration route here, providing features to write performance data metrics, collected from plugin output and states, to time series databases such as Graphite and InfluxDB. Icinga does the alerting and state visualization on its own. You can integrate Grafana graphs, for example, into your monitoring detail views to analyse a problem further: the host is down, so what happened?
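To make the “write performance data to Graphite” part more concrete, here is a minimal Python sketch of what such a writer does: it parses a plugin’s perfdata string (`label=value[UOM];warn;crit;min;max`) and turns each value into a line of Graphite’s plaintext protocol (`<path> <value> <timestamp>`). This is not Icinga’s implementation. The metric path layout is an assumption modelled on what I understand the GraphiteWriter defaults to be, and the function names and the carbon host are hypothetical.

```python
import re
import socket
import time

# One perfdata token: a 'quoted label' or a bare label, then '=' and the value field.
PERFDATA_TOKEN = re.compile(r"(?:'([^']+)'|([^\s=]+))=(\S+)")
# Leading number of the value field; the unit of measurement (ms, %, B, ...) follows it.
NUMBER = re.compile(r"^[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def sanitize(component: str) -> str:
    """Make a string safe as a single Graphite path component (no dots, slashes or whitespace)."""
    return re.sub(r"[.\s/]", "_", component)


def perfdata_to_graphite(host, service, check_command, perfdata, timestamp=None):
    """Turn a plugin perfdata string into Graphite plaintext protocol lines.

    Assumed path layout (modelled on Icinga 2's GraphiteWriter defaults):
    icinga2.<host>.services.<service>.<check_command>.perfdata.<label>.value
    """
    ts = int(time.time() if timestamp is None else timestamp)
    prefix = ".".join(
        ["icinga2", sanitize(host), "services", sanitize(service), sanitize(check_command)]
    )
    lines = []
    for quoted, bare, field in PERFDATA_TOKEN.findall(perfdata):
        label = quoted or bare
        value_with_uom = field.split(";", 1)[0]  # drop warn;crit;min;max thresholds
        match = NUMBER.match(value_with_uom)
        if match is None:
            continue  # e.g. 'U' (unknown value): nothing to store
        path = f"{prefix}.perfdata.{sanitize(label)}.value"
        # Graphite plaintext protocol: "<path> <value> <unix timestamp>\n"
        lines.append(f"{path} {match.group(0)} {ts}\n")
    return lines


def send_to_graphite(lines, carbon_host="localhost", port=2003):
    """Send plaintext lines to a carbon listener (hypothetical host; 2003 is carbon's usual plaintext port)."""
    with socket.create_connection((carbon_host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

For a ping check, `perfdata_to_graphite("web01", "ping4", "ping4", "rta=0.512ms;100;200;0 pl=0%;5;10;0", timestamp=1500000000)` yields one line for `rta` (`0.512`) and one for `pl` (`0`). Note that only the metrics end up in the TSDB; the alerting, state history and dependencies stay in Icinga.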

The same applies to the Elastic Stack and Graylog: Icinga 2 (or Icingabeat, using the API event streams) sends the check output, metrics, etc. to the backend. Frontend integration again happens through Icinga Web 2 modules, which show the host’s log entries in your detail view pane. That’s tremendously helpful when you cannot SSH into a host to analyse its logs because it is already down.

TL;DR: I don’t think that pure metric storage engines and frontends work on their own. Think of data retention: after a while the stored data points are aggregated, and the resulting resolution no longer fits your SLA reports. That’s where an RDBMS as the backend for Icinga 2 comes into play, holding the most integral part of your reports: the state history, put into fancy graphs.

I would combine all these worlds in your environment. Carsten and I gave a talk on this topic at last year’s OSMC; you might want to watch it.

The discussion link provided at the start is not working. Is it possible to update it?

The Monitoring-Portal has been discontinued. All data is lost, or at least no longer publicly available.