The thought of going from DSL to Director (and why I discarded it)

Hi all,
I don’t want to copy the discussion from Data Type "Dictionary" to allow for nested variables · Issue #337 · Icinga/icingaweb2-module-director · GitHub, but it is relevant to my post.
We are planning a new Icinga cluster and wanted to give the Director another try in a POC. The main idea was that the “Icinga admins” configure all the commands, services etc. (either in DSL or in the Director). Hosts will be added and deleted automatically via the Director, and the devs can adjust thresholds, add new checks to their hosts etc. without in-depth knowledge of how Icinga works. They should only have to edit their host objects via the Director. I played around a little with the master branch of the Director, but I did not manage to “translate” this DSL snippet:

apply Service "tcp: " for ( tcp => config in host.vars.tcp ) {
  import "generic-service"
  check_command = "tcp"
  vars += config
  assign where host.vars.tcp
}
object Host "abc" {
  [...]
  vars.tcp["123"] = {
    tcp_address = "xxx.xxx.xxx.xxx"
    tcp_port = "123"
  }
  vars.tcp["567"] = {
    tcp_address = "xxx.xxx.xxx.xxx"
    tcp_port = "567"
  }
}
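
For reference, an apply-for rule creates one service per key of the dictionary and names it by appending the key to the rule's prefix. For host "abc" the rule above should therefore be equivalent to roughly the following objects. This is a hand-written sketch, not generated output, and it assumes the "generic-service" template exists:

```icinga2
// Service generated for the key "123" of host.vars.tcp
object Service "tcp: 123" {
  import "generic-service"
  host_name = "abc"
  check_command = "tcp"
  // "vars += config" copies the dictionary entries into the service vars
  vars.tcp_address = "xxx.xxx.xxx.xxx"
  vars.tcp_port = "123"
}

// Service generated for the key "567" of host.vars.tcp
object Service "tcp: 567" {
  import "generic-service"
  host_name = "abc"
  check_command = "tcp"
  vars.tcp_address = "xxx.xxx.xxx.xxx"
  vars.tcp_port = "567"
}
```

This is the result any Director setup would have to reproduce, whichever way the per-host input ends up being modelled.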

On the one hand, @tgelf described his opinion about dictionaries in this comment: Data Type "Dictionary" to allow for nested variables · Issue #337 · Icinga/icingaweb2-module-director · GitHub.
On the other hand, this kind of snippet is officially documented here: Monitoring Basics - Icinga 2

Based on my DSL snippet, it would be nice if somebody could tell me how to do it “right” in the Director.

Another point I was struggling with is the checks based on SSH. The lambda function of the -C parameter in the “by_ssh” command is not present in the Director; the field is just empty.

I tried to rebuild the command and inserted the lambda function. That worked, but the Director does not accept “{ }” in service variables, so things like passing parameters in the SSH arguments seem to be impossible. E.g.:

apply Service "load" {
  import "by_ssh"
  vars.by_ssh_command = "/usr/lib/nagios/plugins/check_load"
  vars.by_ssh_arguments = {
    "-w" = {
      value = "$load_wload1$,$load_wload5$,$load_wload15$"
      description = "Exit with WARNING status if load average exceeds WLOADn"
    }
    "-c" = {
      value = "$load_cload1$,$load_cload5$,$load_cload15$"
      description = "Exit with CRITICAL status if load average exceed CLOADn; the load average format is the same used by 'uptime' and 'w'"
    }
    "-r" = {
      set_if = "$load_percpu$"
      description = "Divide the load averages by the number of CPUs (when possible)"
    }
  }
  assign where host.vars.os == "linux"
}

This service cannot be represented in the Director.
I have tested the Director every now and then in recent years, because I am highly interested in “automate as much as possible and delegate everything else”. I have used the DSL for over 5 years now, and every time I take a look at the Director, I see a huge gap between it and the DSL and I am not able to use what I have learned over time.

I don’t want to start a new fundamental discussion about DSL and Director. But I think the differences between DSL and Director are too big. Shouldn’t I be able to use the Director without any problems when I have configured Icinga via DSL for so many years and know how Icinga works? Shouldn’t there be some kind of documentation that describes in detail where the DSL and the Director differ? Shouldn’t I be able to identify what kind of configuration fits my needs best without spending hours investigating it myself?
After I was not able to “translate” my two examples into the Director, and after searching for solutions on GitHub, in the docs and in this forum, I stopped testing it and discarded the thought of using it. And to be honest, I am very sad about it. But when it seems to be so complicated or impossible to get these two examples running, how will it end with more complicated stuff like

  states = get_object(User, user).vars.mail_service_states  || [ OK, Warning, Critical, Unknown ]
  types = get_object(User, user).vars.mail_service_types || [ DowntimeStart, DowntimeEnd, DowntimeRemoved, Custom, Acknowledgement, Problem, Recovery, FlappingStart, FlappingEnd ]

?
If anyone has already made the transition from DSL to the Director, I would really like to exchange experiences with them.

Cheers,
Marcus

I did this type of migration in multiple environments, and I can say that I personally have no preference, as both options are quite valid for me. The DSL has the higher flexibility, as there is no additional layer of abstraction. The Director allows giving more people access to the configuration and has built-in import capabilities, which are great as long as there is a good source.

So to address your questions and concerns.

Let’s describe the first one as “Different best practices of DSL and Director”:
Yes, it totally makes sense to limit options in the Director and to be opinionated, as the DSL is not only a configuration format but also provides programming capabilities. The DSL tries to allow as much flexibility as possible, so you can do everything needed for your rule-based configuration. The Director instead wants to provide a consistent web interface that a non-monitoring admin can also use, and to allow automation. I see the conflict between the two and understand it. So for me such a migration is always also about rethinking the configuration, as not every trick in the DSL helps make configuration in the UI easier.

The second one I would call “Limited capabilities of the API”:
Some things the Director cannot query via the API, which is a limitation of Icinga 2, but as the configuration lands on Icinga 2 in the end, it will always work. This is also annoying for the internal checks of Icinga 2. But the Director is the wrong one to blame here.

“Shouldn’t I be able to use the director without any problems, when I configured Icinga via DSL for so many years and know, how Icinga is working?”
Yes, but as I said, there is some abstraction and there are different approaches involved, so the Director is in fact a new tool to learn. You should, however, fully understand what the Director is creating.

“Shouldn’t there be some kind of documentation, which describe in detail, where the differences between the DSL and the Director are? Shouldn’t I be able to identify, what kind of configuration fit my needs best without spending hours of time to investigate it by myself?”
No, there should be no need for such documentation, but there should be more documentation beyond the purely technical kind. Icinga tried to have a technical writer in the past, doing use-case-driven documentation in the form of blog posts, but this unfortunately did not work out. I also recommended the blog posts at Monitoring Archive – UN*XE, but they are more or less outdated now. So I would say user- or use-case-oriented documentation would be the way to go here. Conceptual documentation that explains why something is designed in a specific way, how it should work and so on would also help, because then you would know whether something is just meant to be done differently or whether it is a bug.

And for your last example I do not see any need: I would build it with the defaults on the notification and on the user template and then simply set the user-specific changes on the user. There is no need for any “or” condition, as all of this is built in.
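
A minimal sketch of that idea in DSL terms, assuming the get_object() snippet sits in a mail notification rule. The names "mail-user" and "marcus" and the e-mail address are hypothetical; the filter values are the fallback lists from the question:

```icinga2
// Hypothetical user template: carries the defaults that the
// "|| [ ... ]" fallback supplied in the notification rule.
template User "mail-user" {
  states = [ OK, Warning, Critical, Unknown ]
  types = [ DowntimeStart, DowntimeEnd, DowntimeRemoved, Custom, Acknowledgement, Problem, Recovery, FlappingStart, FlappingEnd ]
}

// Hypothetical user: overrides only what differs from the template.
object User "marcus" {
  import "mail-user"
  email = "marcus@example.com"
  states = [ Critical, Unknown ]
}
```

Icinga 2 evaluates the state and type filters on the User object in addition to those on the Notification object, so the per-user values narrow what that user receives without any custom variables or lookups in the rule.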

So to summarize my opinion:

  • Icinga could do much better with documentation if it focused on the right kind
  • Director and DSL are different concepts, and there is no need for a 1:1 match
  • Both solutions are fine and can be used with success

Hi Dirk,
thanks for your opinion. I agree that DSL and Director are different concepts, and I am willing to learn both. But as I said, getting information other than by trial and error is very hard. The Icinga 2 documentation is very good imho and, as you said, the Director documentation should be better. It would be nice to have documentation that covers everything in the Icinga 2 docs, done with the Director.
