Hello Community
I hope you can help me with my current Icinga challenge. I put together a visualization to
show the big picture as well as the problem I face.
I want to monitor the pictured infrastructure in Icinga, but I ran into a problem that
I'm not sure how to solve. Maybe one of you has an idea how it could be done.
Maybe this helps others as well.
Test scenario: Let’s assume I have a classical DC HA setup:
- Spine-Leaf network architecture
- VMware cluster with many hosts
- Shared and local storage
- Single instance business services as one VM or container
- Business services that can be balanced over many ESXi hosts as VMs
- Business services that can be spread out and balanced over many containers
on VMs, running on one or more ESXi hosts…
A normal setup for an enterprise.
A short explanation of the pictured infrastructure:
I simplified it a bit here. For this example I have two cases I would like
to tackle.
A few more details: the data-spines are a loose set of switches that act as
an HA group. At least one switch needs to be up. Data-leafs are always combined
as a pair of switches to provide a redundant connection that supports e.g. LACP
mode. On the right is the legend, which explains the different objects and connections.
[Example 1]
The fault scenario and the resulting monitoring situation are shown on the right.
“Data-Leaf 3” has a problem (a defined number of its services are critical, so the
HOST is DOWN). This has an impact on multiple other HOSTs:
- The “Data-leaf pair 3/4” should be in WARNING state because redundancy is no
longer guaranteed. It should not be DOWN, because Data-leaf 4 still works
completely fine. My problem here is that a HOST can’t be in a WARNING state, so I
thought maybe it is possible to realize it with a SERVICE. But how can I query the
state of other HOSTs or other SERVICEs and use it in this SERVICE?
I know I am able to query a host state, but somehow not a service state.
Or did I miss something?
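To make the rule I have in mind for the pair concrete, here is a minimal sketch in plain
Python. It is not Icinga configuration, and the names `State` and `pair_state` are made up
for illustration. The numeric values follow the usual plugin exit codes (0 = OK,
1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). Mapping "all members down" to CRITICAL is my
assumption.

```python
from enum import IntEnum


class State(IntEnum):
    """Service states, numbered like the usual plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def pair_state(members_up: list[bool]) -> State:
    """Derive the state of a redundancy group from its members.

    members_up holds one flag per member switch: True if that HOST is up.
    (Hypothetical helper; how these flags are obtained is the open question.)
    """
    if not members_up:
        # No members known: nothing to judge.
        return State.UNKNOWN
    up = sum(members_up)
    if up == len(members_up):
        return State.OK        # full redundancy
    if up == 0:
        return State.CRITICAL  # the whole group is gone (assumption)
    return State.WARNING       # still working, but redundancy is lost


# Data-leaf 3 is down, Data-leaf 4 is up.
leaf_pair_3_4 = pair_state([False, True])
```

The same function would also cover the data-spines, where at least one switch of the
group needs to be up.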
[Example 2]
- Since the “Data-leaf pair 3/4” is in WARNING state, it should pass its state
(triangle) on to the dependent child HOSTs (XEN #1 Server, VM-B1, VM-B2,
VM-Bn). That means all dependent HOSTs need to be in the same state as their
“parent” HOST: in this case “Data-leaf pair 3/4” for the XEN server, and the XEN
server for the VMs. My question on this example is: Is it possible to add this
kind of “state” to an object? Does it maybe make sense to realize this with an
additional SERVICE on each HOST, for example a service called “Host-state”
that shows the state of the current HOST (WARNING, CRITICAL, OK, UNKNOWN)? Or
is there another solution?
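The propagation I am thinking of could be sketched like this, again in plain Python and
not as Icinga configuration. The names `SEVERITY` and `effective_states` and the host
names are hypothetical. The sketch assumes that each HOST has at most one parent, that
the parent relation has no cycles, and that the "worse" state is simply the one with the
higher exit code (a simplification, especially for UNKNOWN).

```python
# Ranking used to decide which of two states is "worse" (assumption:
# same order as the plugin exit codes).
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def effective_states(own: dict[str, str], parent: dict[str, str]) -> dict[str, str]:
    """Return, per host, the worse of its own state and its parent's effective state.

    own:    host name -> state measured on the host itself
    parent: child host name -> parent host name (hosts without parent are omitted)
    """
    result: dict[str, str] = {}

    def resolve(name: str) -> str:
        if name in result:
            return result[name]
        state = own[name]
        parent_name = parent.get(name)
        if parent_name is not None:
            inherited = resolve(parent_name)
            if SEVERITY[inherited] > SEVERITY[state]:
                state = inherited
        result[name] = state
        return state

    for name in own:
        resolve(name)
    return result


# The situation from the picture: only the leaf pair has a problem of its own.
own = {"data-leaf-pair-3-4": "WARNING", "xen-1": "OK", "vm-b1": "OK", "vm-b2": "OK"}
parent = {"xen-1": "data-leaf-pair-3-4", "vm-b1": "xen-1", "vm-b2": "xen-1"}
states = effective_states(own, parent)
```

In this sketch a child that has a worse problem of its own keeps its own state, and the
result per HOST is what I would expect a “Host-state” service to show.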
I would appreciate any hints or ideas from the community on how to realize this,
maybe in a better or additional way.
Thank you in advance. I’m looking forward to working on this problem, maybe with
someone who has already tried something similar.