Hello,
I have spent three days trying to improve icingaweb2-grafana-influxdb2 performance, without success. A simple graph with ping rta takes 5 seconds; a graph with all filesystems takes up to 17 seconds.
My Environment:
Icinga2: icinga2-2.13.2-1 (2 Master, 766 Services, 89 Hosts)
icingaweb2-grafana-module: 1.4.2
Grafana: grafana-enterprise-9.0.7-1
Influxdb2: influxdb2-2.3.0-1
The Influxdb2 writer has been running for 3 weeks; all servers run RHEL7.
Influxdb2 and Grafana are installed on a separate server. The queries come from the only publicly available Icinga2 dashboard and are written in Flux. I installed the grafana-image-renderer and also
chromium-103.0.5060.114
because the grafana-image-renderer was missing some libraries.
When I noticed how slowly the Grafana renderer works, I moved Influxdb2 and Grafana from a virtual machine to physical hardware (Cisco, 2 CPUs, 112 cores with hyperthreading, 64 GB RAM). But the situation is the same: up to 17 seconds for one graph with the filesystems.
I tried many Grafana rendering performance tuning options, also without success.
Here are 2 examples of my Flux queries:
from(bucket: "icinga-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "hostalive")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["hostname"] == "${hostname}")
  |> filter(fn: (r) => r["metric"] == "rta")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")
This hostalive graph takes 4-5 seconds.
This graph with all filesystems on the server (max. 5) takes 17 seconds:
from(bucket: "icinga-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
      r._measurement == "disk" and
      r._field == "value" and
      r.hostname == "${hostname}"
  )
  |> map(fn: (r) => ({ _value: r._value, _time: r._time, _field: r.metric }))
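Unlike the hostalive query, this one has no aggregateWindow step, so it returns every raw point in the selected time range. For comparison, a variant with the same downsampling might look like the sketch below. It is untested, it assumes that v.windowPeriod is supplied by Grafana as in the first query, and whether the missing downsampling contributes to the 17 seconds is only a guess.

```flux
from(bucket: "icinga-bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) =>
      r._measurement == "disk" and
      r._field == "value" and
      r.hostname == "${hostname}"
  )
  // Assumption: downsample each series before reshaping, as the hostalive query does
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  // Rename each series to its metric so Grafana shows one line per metric
  |> map(fn: (r) => ({ _value: r._value, _time: r._time, _field: r.metric }))
```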
There is also one Grafana error in the logfile:
logger=plugin.grafana-image-renderer t=2022-08-12T15:11:57.434174667+02:00 level=error msg="Browser console error" msg="Failed to load resource: the server responded with a status of 400 (Bad Request)" url=http://localhost:3000/api/ds/query
logger=context traceID=00000000000000000000000000000000 userId=0 orgId=1 uname= t=2022-08-12T15:12:12.712598907+02:00 level=info msg="Request Completed" method=POST path=/api/ds/query status=400 remote_addr=[::1] time_ms=15301 duration=15.301854148s size=20728 referer="http://localhost:3000/d-solo/_awwcdh7z/icinga2-default?var-hostname=myserver.domain&var-service=st-agent-linux-filesystem_tmp&var-command=disk&panelId=8&orgId=1&width=640&height=280&theme=light&from=now-30d&to=now&render=1" traceID=0000000000000000000000000000000
Does anyone have any ideas?
Cheers,
Heige