Extracted from this discussion.
Could someone briefly explain the difference between icinga2/icingaweb2 and Prometheus?
As I see it right now, Icinga executes checks: it verifies that the disk isn’t getting full, that various services are running, that specific ports are open and that SSL certificates are valid for at least 20 more days. Icinga does not store metric data over time. Prometheus tracks the performance of various processes over time, for example memory usage by Passenger, open WebSocket connections or web request times.
One difference is that Icinga actively executes check scripts, which return a state, output and performance data metrics. These values are collected and used for state history calculation, notifications, dependencies, etc. Metrics can be forwarded to popular TSDB backends for storage.
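To make “state, output and performance data” concrete, here is a minimal Python sketch of a check plugin that follows the Monitoring Plugins conventions Icinga relies on: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the text after `|` carries perfdata in the form `label=value[UOM];warn;crit;min;max`. The function names, the `used_pct` label and the 80/90 thresholds are hypothetical; in practice you would use the stock `check_disk` plugin.

```python
import shutil
import sys

# Plugin exit codes as used by the Monitoring Plugins conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_disk(used_pct: float, warn: float, crit: float) -> tuple[int, str]:
    """Map a disk usage percentage to a state and a plugin output line."""
    if used_pct >= crit:
        state = CRITICAL
    elif used_pct >= warn:
        state = WARNING
    else:
        state = OK
    # Text before "|" is the human-readable output; text after it is perfdata
    # (label=value[UOM];warn;crit;min;max). "used_pct" is a hypothetical label.
    line = (f"DISK {STATE_NAMES[state]} - {used_pct:.1f}% used"
            f" | used_pct={used_pct:.1f}%;{warn:g};{crit:g};0;100")
    return state, line


def main(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - cannot stat {path}: {exc}")
        return UNKNOWN
    state, line = evaluate_disk(100.0 * usage.used / usage.total, warn, crit)
    print(line)
    return state


if __name__ == "__main__":
    # Icinga reads stdout for output/perfdata and the exit code for the state.
    sys.exit(main())
```

Icinga stores the state for history and notifications, while the perfdata part is what can be forwarded to a TSDB.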
Prometheus implements its own TSDB, afaik. v2.0 ships a rewritten one which is not compatible with the 1.x storage. To my knowledge, services need to export metrics via an HTTP /metrics endpoint, and you configure Prometheus to scrape it there.
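For illustration, a minimal sketch of such a /metrics endpoint using only the Python standard library. It renders the Prometheus text exposition format (`# HELP`, `# TYPE`, then `name value`). The metric names, values and port 8000 are hypothetical; in practice the official `prometheus_client` library provides this for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application metrics: name -> (type, help text, current value).
METRICS = {
    "app_open_websocket_connections": ("gauge", "Currently open WebSocket connections.", 17),
    "app_http_requests_total": ("counter", "HTTP requests served.", 1024),
}


def render_metrics(metrics: dict) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; Prometheus scrapes this path by default.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

Prometheus would then get a scrape job whose target is this host on port 8000; the service itself never pushes anything.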
Metrics from dynamically discovered targets, e.g. container services, are easier to store. Based on the stored data, you can write queries for alerts. There is no host/service-centric model with static configuration.
I haven’t tried Prometheus in detail yet, but the following questions come to mind:
- Does it support multiple levels of distributed monitoring with satellites and clients?
- Is it possible to configure the connection direction, e.g. into the DMZ or from inside the DMZ?
- How can dependencies/reachability be applied prior to alerting?
- Security: TLS, CN validation, etc.
To me, the two worlds follow different approaches, and they can probably be integrated in common scenarios.
Michael later followed up, having listened to some Twitter discussions and talks:
I had a look into it recently, since I was researching tools and their capabilities for SNMP monitoring and the like.
If your service doesn’t expose an HTTP endpoint with metrics, you need to write a wrapper or use a converter script (an “exporter” in Prometheus terms) to pass the data into Prometheus.
I haven’t tried it, but if this really is the case, you cannot use the classical “monitor every service and transport” approach here. Instead of the wide variety of existing plugins, you rely on metrics served via HTTP. If your services (and devs) don’t provide them, using Prometheus in your environment won’t be fun. No metrics, no alerts, no SLA.
An integration with Prometheus very likely makes sense: you keep your classical service monitoring with Icinga and its variants up front, then expose the plugin perfdata metrics via HTTP so that Prometheus can collect them. Something similar has already been requested on GitHub.
Cool, I think an integration would be great.
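A rough idea of what such a converter might do, sketched in Python under stated assumptions: it parses a plugin’s perfdata string (`label=value[UOM];warn;crit;min;max`, labels optionally single-quoted) and renders each value as a Prometheus sample. The metric name `icinga_perfdata_value` and its `host`/`service`/`label`/`uom` labels are hypothetical, not an existing Icinga export; thresholds, ranges and `U` (unknown) values are simply skipped.

```python
import re

# label=value[UOM];warn;crit;min;max ; labels may be single-quoted.
PERFDATA_RE = re.compile(r"(?:'([^']+)'|([^\s=]+))=(-?\d+(?:\.\d+)?)([^;\s]*)\S*")


def escape(value: str) -> str:
    """Escape a Prometheus label value (backslash, double quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def parse_perfdata(perfdata: str) -> list[tuple[str, float, str]]:
    """Return (label, value, uom) tuples; warn/crit/min/max are ignored."""
    result = []
    for m in PERFDATA_RE.finditer(perfdata):
        label = m.group(1) or m.group(2)
        result.append((label, float(m.group(3)), m.group(4)))
    return result


def to_exposition(host: str, service: str, perfdata: str) -> str:
    """Render one check result's perfdata as hypothetical Prometheus samples."""
    lines = ["# TYPE icinga_perfdata_value gauge"]
    for label, value, uom in parse_perfdata(perfdata):
        lines.append(
            f'icinga_perfdata_value{{host="{escape(host)}",service="{escape(service)}",'
            f'label="{escape(label)}",uom="{escape(uom)}"}} {value}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_exposition("web01", "disk", "'/'=42%;80;90;0;100 time=0.25s;;;0"), end="")
```

Keeping the perfdata label as a Prometheus label (instead of baking it into the metric name) avoids having to sanitize arbitrary plugin labels into valid metric names.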
Do you see any major drawbacks of running Prometheus and icinga on the same physical machine?
Some of the metrics in https://samsaffron.com/archive/2018/02/02/instrumenting-rails-with-prometheus look very interesting for my use case. I am considering just starting a separate container and installing Prometheus there, without any integration with Icinga.
Regarding integration: what do you see as the benefits of having them integrated rather than separated? With single responsibility, if one crashes it can’t take the other one down.
I have never run a Prometheus instance myself, so I know nothing about its resource requirements. I wouldn’t run two monitoring applications on the same host though, as the failure of one (OOM or a full disk, for example) could kill the other one.
In terms of integration, I do see Prometheus as a metric collector that Icinga could query, similar to InfluxDB or Graphite. For cluster and container checks with highly volatile data, this sounds like an interesting idea.
On the other hand, if Prometheus collects metrics, why not add a /metrics endpoint as an export, so that all plugin performance data metrics can be collected in Prometheus?
These are just ideas off the top of my head, nothing I have tried or designed. I’m waiting for community members to step up and actually build such things.
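One way Icinga could “query against” Prometheus is a check plugin that runs an instant query via the Prometheus HTTP API (`/api/v1/query`) and compares the result with thresholds. The Python sketch below is hypothetical: the script’s argument order, the “worst sample wins” rule and the output wording are assumptions, and a real check would need more careful handling of NaN values and empty results.

```python
import json
import sys
import urllib.parse
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def query_prometheus(base_url: str, expr: str, timeout: float = 10.0) -> dict:
    """Run an instant query against the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def evaluate_response(response: dict, warn: float, crit: float) -> tuple[int, str]:
    """Turn an instant-vector response into a plugin state and output line."""
    if response.get("status") != "success":
        return UNKNOWN, "UNKNOWN - query failed"
    result = response["data"]["result"]
    if not result:
        return UNKNOWN, "UNKNOWN - query returned no samples"
    # Each sample's "value" is [timestamp, "value as string"]; take the worst.
    worst = max(float(sample["value"][1]) for sample in result)
    state = CRITICAL if worst >= crit else WARNING if worst >= warn else OK
    name = ("OK", "WARNING", "CRITICAL")[state]
    return state, f"{name} - max value {worst:g} | value={worst:g};{warn:g};{crit:g}"


def main(argv: list[str]) -> int:
    # Hypothetical usage: check_prometheus.py URL EXPR WARN CRIT
    try:
        url, expr, warn, crit = argv[1], argv[2], float(argv[3]), float(argv[4])
        state, line = evaluate_response(query_prometheus(url, expr), warn, crit)
    except (IndexError, ValueError, KeyError, OSError) as exc:
        state, line = UNKNOWN, f"UNKNOWN - {exc}"
    print(line)
    return state


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

In this setup Prometheus does the (volatile) collection, while Icinga keeps the state, notification and dependency logic.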
Jan adds a note on different monitoring types:
For starters, you could look up the difference between whitebox and blackbox monitoring.
Also this article might be helpful to see the difference: https://insights.sei.cmu.edu/devops/2016/08/whitebox-monitoring-with-prometheus.html
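As a small illustration of the blackbox side: a blackbox probe checks a service from the outside, without any knowledge of its internals (like the “specific ports are open” checks from the original question), while whitebox monitoring relies on metrics the service exposes about itself. The Python sketch below is a minimal TCP probe; the function name and the 3-second default timeout are assumptions. Prometheus’s own tool for this kind of probing is the blackbox_exporter.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> tuple[bool, float]:
    """Blackbox probe: can a TCP connection be opened from the outside?

    Returns (reachable, elapsed seconds). Nothing about the service's
    internal state is known; only the connection result is observed.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Refused, timed out, unreachable, DNS failure, ...
        return False, time.monotonic() - start


if __name__ == "__main__":
    reachable, elapsed = probe_tcp("localhost", 22)
    print(f"port 22 reachable={reachable} in {elapsed:.3f}s")
```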
Assaf shares his experience:
I have implemented both systems (at differing scales) and can say that comparing them does justice to neither.
Icinga is an active (pull) system: you actively check the state of whatever you want to monitor.
Prometheus, on the other hand, is a passive collector: it scrapes the data exposed by individual services (exporters) running on the target nodes at a preset, configurable interval, but out of the box it will not complain if a metric stops arriving or if it cannot scrape a node.
Prometheus’s microservices approach also adds to the management (and distribution) overhead, as each function is a separate service that has to be managed and configured: Prometheus, Alertmanager, the individual exporters (the services on the remote nodes that expose the metrics), and any other components.
Prometheus’s own graphical interface is lacking, to say the least, and requires a third-party tool, mainly Grafana, to create dashboards and visualise the metrics.
Icinga was not built as a time series metric collector but as a “state probe” tool, whereas Prometheus was built as a metric collector; as such, they follow different approaches and methodologies. Granted, both are monitoring tools, but each was built with a different goal in mind.
What’s your 2 cents on the matter?