This issue has been migrated from Redmine: https://dev.icinga.com/issues/6871
Created by 9er on 2014-08-07 10:58:06 +00:00
Assignee: _(none)_
Status: _New_ (closed on _2014-11-24 14:16:32 +00:00_)
Target Version: _Backlog_
Last Update: _2016-11-28 12:09:11 +00:00 (in Redmine)_
Problem:
Config could look something like this:
object Dependency "server-routers" {
parent_host_name = [ "router1", "router2" ]
child_host_name = "server"
}
Maybe even add an option to decide, whether all parents must be down ("AND") or just one ("OR"). The "OR" option can then be used to summarize multiple single dependencies.
object Dependency "server-routers" {
parent_host_name = [ "router1", "router2" ]
child_host_name = "server"
operation = "AND"
}
Relations:
Updated by mfriedrich on 2014-11-11 22:37:08 +00:00
Should be easier with the new apply Dependency for rules in 2.2. That way you can generate your dependencies based on custom attribute arrays/dictionaries.
https://github.com/Icinga/icinga2/blob/master/doc/4-monitoring-basics.md#using-apply-for
Updated by mfriedrich on 2014-11-24 14:16:32 +00:00
Updated by mfriedrich on 2015-02-01 10:02:31 +00:00
Updated by mfriedrich on 2015-02-14 22:57:31 +00:00
Updated by deneu on 2015-02-15 08:35:33 +00:00
If dependencies are a logical 'or' combination, why are the children unreachable if one parent goes down?
Updated by mfriedrich on 2015-02-15 10:51:07 +00:00
If one dependency fails, the child becomes unreachable. That's a logical 'or' combination. You instead want an 'and' combination where all parent states are taken into account and the reachability state is calculated from all of them.
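To make the two semantics concrete, here is a minimal Python sketch. It is not Icinga 2 code; the function names are hypothetical, and each parent is reduced to a boolean "parent is up". Note that the terms are easy to mix up: the current behaviour is called 'or' above because the child is unreachable if parent 1 OR parent 2 has failed, which is the same as requiring that all parents are up.

```python
def reachable_current(parents_up):
    """Current behaviour as described above: every dependency must hold,
    so a single failed parent makes the child unreachable."""
    return all(parents_up)


def reachable_requested(parents_up):
    """Requested behaviour: the child is unreachable only when all
    parents have failed. An object without parents stays reachable."""
    return not parents_up or any(parents_up)


# router1 is down, router2 is still up
states = [False, True]
print(reachable_current(states))    # False
print(reachable_requested(states))  # True
```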
Updated by deneu on 2015-02-26 09:35:45 +00:00
dnsmichi wrote:
> If one dependency fails, the child becomes unreachable. That's a logical 'or' combination. You instead want an 'and' combination where all parent states are taken into account and the reachability state is calculated from all of them.

Yes, exactly! How can I change an 'or' to an 'and'?
Updated by mfriedrich on 2015-02-26 09:41:21 +00:00
I don't have an implementation idea currently. It ought to be a sort of business process where you take several other object states into account and then determine the final state, but that's not thought through. On the other hand, there's no clean place where such a configuration item could be put. 1) It does not fit on the host/service itself. Applying Dependency objects with different types and multiple ones - not a clear result set. 2) Put on dependency objects themselves - how to know how many other dependencies will be generated from apply rules?
That being said, if you come up with a better proposal and implementation design, feel free to do so. I'm not entirely sure that your request targets the right solution.
Updated by deneu on 2015-02-26 15:46:43 +00:00
I have no idea how to implement it in Icinga 2, but I think this is necessary, and it was a well-working default function in Icinga/Nagios.
The example in the GitHub documentation does not work as it should, because it is an 'or' condition, as you said. Without this feature, large infrastructures can't be monitored in depth, I think. So why not use a "parent"-like function?
Updated by mfriedrich on 2015-02-26 15:51:58 +00:00
What is a "parent-like function" in your design?
Updated by deneu on 2015-02-27 08:56:54 +00:00
dnsmichi wrote:
> What is a "parent-like function" in your design?

I mean the same function as the "parents" attribute in Icinga 1.x/Nagios.
Updated by mfrosch on 2015-02-27 09:50:12 +00:00
Status changed from _New_ to _Feedback_
object Host "router1" {
import "generic-host"
}
object Host "router2" {
import "generic-host"
}
object Host "foobar" {
import "generic-host"
vars.parents = [ "router1", "router2" ]
}
apply Dependency "routers" for (router in host.vars.parents) to Host {
parent_host_name = router
assign where host.vars.parents
}
You could also do some assignment via any other var that creates a logical dependency.
Does this fulfil your requirements?
Updated by mfriedrich on 2015-02-27 09:55:32 +00:00
Updated by deneu on 2015-02-27 13:39:57 +00:00
lazyfrosch wrote:
> [...]
> You could also do some assignment via any other var that creates a logical dependency.
> Does this fulfil your requirements?

Hey Markus,
sorry, no, it does not fulfil my requirements, because it's only another way to apply the dependency, but it's still "or", right?
Updated by mfriedrich on 2015-03-02 15:26:14 +00:00
deneu wrote:
> dnsmichi wrote:
> > What is a "parent-like function" in your design?
>
> I mean the same function as the "parents" attribute in icinga1x/nagios.
I'm leaving this issue in a 'new' state until you've come up with a detailed design and implementation proposal for Icinga 2.
Updated by nuts on 2015-03-26 00:30:57 +00:00
A careful inquiry: are there any new findings?
Unfortunately, we have this problem too. Is the logical "or" useful in some cases?
Updated by mfriedrich on 2015-04-08 11:24:05 +00:00
Feedback from SIG-NOC meeting:
Updated by mfriedrich on 2015-06-23 13:36:38 +00:00
Updated by klon on 2015-08-02 06:43:34 +00:00
Please clarify: will you implement the "AND" feature?
Or do you need additional feedback from users?
It would be very useful for us too.
Updated by mfriedrich on 2015-08-03 15:40:31 +00:00
There are no resources available for this task currently, therefore we've put this onto "Backlog". We will re-iterate over these issues from time to time after our feature development sprint for 2.4 is finished. On the other hand, one might chime in and sponsor/provide a patch for this feature (similar to other issues remaining on "Backlog" for the time being).
Updated by mfriedrich on 2015-08-31 13:54:24 +00:00
Updated by mfrosch on 2015-09-01 12:51:28 +00:00
Updated by mfriedrich on 2015-09-03 16:23:16 +00:00
Some reference: https://www.mail-archive.com/[email protected]/msg19373.html
Still, I don't have a viable design in mind to solve the problem properly. Even some sort of Dependency grouping would still cause trouble in the way chained and inherited dependencies will work, tree-wise.
Updated by barry.quiel on 2015-09-09 16:38:29 +00:00
I like the idea of this feature, but it may have already been solved indirectly. In the release announcement for 2.3.0 ( https://www.icinga.org/2015/03/10/icinga-2-v2-3-0-released/) there is a sample of a dummy cluster object. That object could serve this purpose. The challenge that I haven't figured out is how to get the nodes added to the cluster dynamically. In the example the nodes of the cluster are added to vars.cluster_nodes of the dummy object. I would like some sort of object name lookup function that uses a regex or match to find the names of the objects.
Something like:
vars.cluster_nodes = find_host_object("cluster-node*")
It seems like I should be able to figure this out, considering that something very similar already exists in apply statements ( match("cluster-host.*", host.name) ), but I can't seem to wrap my head around the right set of functions.
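As a rough illustration of the lookup being asked for, the Python sketch below filters a list of host names with a glob pattern. It only shows the intended behaviour; `find_host_names` is a hypothetical helper and does not correspond to any existing Icinga 2 function.

```python
from fnmatch import fnmatchcase


def find_host_names(pattern, host_names):
    """Return the names that match a glob pattern, keeping input order."""
    return [name for name in host_names if fnmatchcase(name, pattern)]


hosts = ["cluster-node1", "cluster-node2", "db1"]
print(find_host_names("cluster-node*", hosts))
# ['cluster-node1', 'cluster-node2']
```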
Updated by dgoetz on 2015-09-14 09:30:44 +00:00
Just a brief summary to see if my assumptions are correct.
You can create multiple dependencies (either manual or by using apply). This will mean an OR and should now work with the last patches like icinga 1, meaning if only one dependency fails the status is unchanged, if all fail the host / service is unreachable. Not sure about what this means for the attributes of the dependency "disable_notifications" and "disable_checks".
If this works like these are applied when it is unreachable, then I see nearly no case for an AND. Because then AND would be the behavior before the last patches that the host / service needs both and be unreachable if one fails.
If this is also needed I do not like the operator attribute. I would prefer not changing the normal configuration for OR (two or more separate dependencies) and introduce an array for "parent_host_name" and "parent_service_name" for AND. I think this would be easy to configure and understand.
Updated by mfriedrich on 2015-09-14 11:35:45 +00:00
dirk wrote:
> Just a brief summary to see if my assumptions are correct.
> You can create multiple dependencies (either manual or by using apply). This will mean an OR and should now work with the last patches like icinga 1, meaning if only one dependency fails the status is unchanged, if all fail the host / service is unreachable. Not sure about what this means for the attributes of the dependency "disable_notifications" and "disable_checks".

No, Icinga 1.x had the additional "parents" attribute, which does not exist anymore. That one causes all host parents to be combined as "AND". Host dependencies as they are now work like the old Icinga 1.x ones - their logical condition is "OR".
This ticket is all about adding these parent relations again into Icinga 2, somehow related to dependencies. At least the original poster demanded it, though I'm not very happy about the recent suggestions to solve the issue. None of them would fit into the existing design (configuration language, object references, external interfaces, etc).

> If this is also needed I do not like the operator attribute. I would prefer not changing the normal configuration for OR (two or more separate dependencies) and introduce an array for "parent_host_name" and "parent_service_name" for AND. I think this would be easy to configure and understand.
Keep in mind that the Dependency type is not exclusive to a host or service type. The attribute "parent_host_name" without "parent_service_name" would generally work as array, but what happens if you'll define service dependencies by having "parent_service_name" set? Take the following example:
apply Dependency "foo" to Service {
parent_host_name = [ "h1", "h2" ]
parent_service_name = [ "ping4", "ping6" ]
assign where true
}
Now guess which parent host and services will be referenced. There's multiple ways to interpret this configuration, and it makes validation and output even harder. Icinga 1.x had a similar issue with service group members being a list of host-service name tuples.
Probably that approach makes sense, though the array solution is not very elegant and is error-prone.
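The ambiguity can be shown with a short Python sketch. It takes the two arrays from the example above and builds the parent (host, service) pairs in two equally plausible ways; neither interpretation is defined by Icinga 2, both are assumptions for illustration.

```python
from itertools import product

hosts = ["h1", "h2"]
services = ["ping4", "ping6"]

# Interpretation A: pair the arrays element by element.
pairwise = list(zip(hosts, services))

# Interpretation B: combine every host with every service.
all_combinations = list(product(hosts, services))

print(pairwise)
# [('h1', 'ping4'), ('h2', 'ping6')]
print(all_combinations)
# [('h1', 'ping4'), ('h1', 'ping6'), ('h2', 'ping4'), ('h2', 'ping6')]
```

Two parents in one reading and four in the other is exactly the kind of difference that would have to be spelled out before such a syntax could be validated.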
Kind regards,
Michael
Updated by lesinigo on 2016-11-28 12:09:11 +00:00
This "dependencies in AND/OR" thing really hit us in the transition from Nagios to Icinga2. We used to extensively use "parents" in Nagios for use cases that have already been mentioned here, the two most common ones for us are:
Here are my two cents on this topic...
I think a possible approach would be being able to specify a "function" for an object, it would get all dependencies as inputs and provide a reachability status as output.
It could be an actual function (i.e. let the users build strange stuff if they want) or just a parameter for the common ones (at least and and or, but maybe all the usual boolean logic functions? and, or, xor, not...).
This would be more than enough for our use cases, which I strongly suspect are the same as >90% of people having this issue, and it does not break backward compatibility since you can just leave the default at the current logic, which we could call "or".
Actual example:
object Host "Router-A" {
}
object Host "Router-B" {
}
object Host "BigServer" {
// if all deps are DOWN, then this Host is UNREACHABLE
unreachable_condition = "and"
}
object Dependency "Server-to-RouterA" {
parent = "Router-A"
child = "BigServer"
}
object Dependency "Server-to-RouterB" {
parent = "Router-B"
child = "BigServer"
}
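The idea of a per-object "function" could be sketched in Python as follows. This is only an illustration of the proposal, not an existing Icinga 2 mechanism: `UNREACHABLE_CONDITIONS`, `is_unreachable` and `majority_down` are hypothetical names, and each dependency is reduced to a boolean "this parent is DOWN".

```python
# Hypothetical mapping from an `unreachable_condition` name to a function
# that receives one boolean per dependency (True = that parent is DOWN)
# and returns True when the child should be treated as UNREACHABLE.
UNREACHABLE_CONDITIONS = {
    "or": any,   # current logic: one failed parent is enough
    "and": all,  # proposed option: every parent must have failed
}


def majority_down(parents_down):
    """Example of a user-supplied function: more than half are DOWN."""
    return sum(parents_down) > len(parents_down) / 2


def is_unreachable(condition, parents_down):
    """Accept either a named condition or a custom callable."""
    if not parents_down:
        return False  # no dependencies, nothing can make it unreachable
    if callable(condition):
        return bool(condition(parents_down))
    return UNREACHABLE_CONDITIONS[condition](parents_down)


# Router-A is down, Router-B is still up.
print(is_unreachable("or", [True, False]))   # True
print(is_unreachable("and", [True, False]))  # False
```

Leaving the default at "or" would keep existing configurations unchanged, which is the backward-compatibility argument made above.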
This approach would not be enough for more complex scenarios, but AFAIK they all boil down to a single "child" object having multiple, distinct "dependency groups" and some of them actually being dependencies on "at least one of".
If we are talking about strict reachability logic (i.e. "I want to know if I can monitor this, so I can know if it is actually DOWN or if it could be UP but I don't see it") I cannot right now think of any actual, real life scenario where you would have one of those more complex scenarios.
On the other hand, if we are talking about "availability" of a service then there could be multiple dependencies, but IMHO this is wrong to do as dependencies in a monitoring system. It could be useful to have them somewhere else (think Business Processes), but IMHO it is definitely not part of the core, low-level logic, between low-level objects: that logic should just explain how to calculate reachability, to discern "ko" states (down, warning, critical) from "dunno" states (unreachable, unknown).
Side note #1: the example I described could be used as a basis towards supporting more complex scenarios, if you really want: it would probably need a new kind of object, some sort of "intermediate dependency", so a Host/Service could have "unreachable_condition AND" dependencies on some of those intermediate objects and those intermediate objects could have "unreachable_condition OR" dependencies on their parents if needed. And I suspect that often, if you really want this kind of "dependency group" stuff, you can already hack it together using dummy hosts or services, even if it's not a clean solution.
Side note #2: if I have not missed anything, the current documentation does not explicitly talk about multiple dependencies. It should be improved to clearly state what happens when a Host or Service has multiple Dependencies and what logic is used to determine child state when one or more dependencies fail. This applies to both the current situation and to any eventual outcome of this feature request.
Hi all together, I just came from monitoring-portal.com.
I just read everything you posted here and I am wondering what the purpose of the OR operator is here.
I can't think of a situation where I would like to define multiple dependencies for a host where one failing parent would mean the child is unavailable.
This might be good for services where one service depends on all parent services, but for that it would be better to use a business process, right?
So why not always use AND as the default parameter? Do you have any idea when this feature could (somehow) be usable in a stable release?
I definitely need this feature, which is why I am currently thinking about a quick and dirty trick: writing a script combined with DB queries and API calls that does the job for me.
Thank you for any details!
@Duffkess I too can't think of a situation where the "OR unreachability" logic makes sense, but I never asked to simply change it to AND because it is the current Icinga2 functionality and there can be someone out there who is actually using it. A flag to switch between AND and OR _shouldn't_ be too hard to add and you can just set it to _your_ preferred default in something like the generic-host template.
@ icinga devs: I briefly talked about this issue with @gethash during the last Puppetconf, is there anything (as in _"anything but C/C++ code"_) we can do to help this move along?
There isn't any technical concept or specification available. We are not doing anything here, unless someone considers sponsoring this feature.
@dnsmichi The OP provided a concept, and I do not see what is wrong with that. Perhaps you can clarify?
The suggestion was, to the best of my understanding, to save a parameter on the Dependency objects telling what boolean operation (default OR) to use when an object has multiple parents. The configuration could be applied like:
apply Dependency "Parents" for (parent in host.vars.parents) to Host {
parent_host_name = parent
operator = "AND"
assign where host.vars.parents
}
You commented before on the OP's suggestion that:
Applying Dependency objects with different types and multiple ones - not a clear result set. 2) Put on dependency objects themselves - how to know how many other dependencies will be generated from apply rules?
I'm not sure I understand this comment (maybe it no longer applies), but the configured operator would apply for the specified dependency (i.e. "Parents" here).
Summary of the issue below.
The OP proposal won't work for service parents.
parent_host_name = [ "router1", "router2" ]
parent_service_name = [ "port1", "port2" ]
There is no indication which service parent objects could be specified.
Dependencies are objects generated from apply rules. They also may co-exist written on their own.
In the end, everything looks like this
object Dependency "router1-dep1" {
parent_host_name = "router1"
..
}
object Dependency "router2-dep2" {
parent_host_name = "router2"
...
}
You would need to glue them together. Still, where would you do that? Inside "router-dep1" ... or "router-dep2" .. or both? That allows for inconsistent configuration and makes it hard to troubleshoot. Especially if both Dependency objects exist in different file locations.
If you are putting such into an Apply rule (normal, or a for loop), this could somehow "generate" the logical AND or OR relation between generated dependency objects. This is different to everything else known from apply rules, just a specific idea for Dependency logic. Hard to explain, hard to document, and nothing one would like to maintain or support.
Since Dependency objects are just generated from apply rules, one would need to find a way to "glue" them together.
Or, to glue parent objects and their state together. This is what this is all about - if the conditions are fulfilled, dependency logic kicks in as you are used to it currently.
Icinga 2 should know upon reachability calculation, that an object has multiple parents, and how they relate together. Or how to fetch the overall state from the parents objects, and which rules to apply on them.
One thing to keep in mind while designing this: Multiple parents mean 2, 3, or 10 or n parent objects combined with a logical operator.
Should it always be dep1 && dep2 && dep3 or would one like to have dep1 && dep3 || dep 2 for example?
This needs to be considered before adding this feature, to avoid possible feature requests and have yet another hardcoded solution.
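A small Python sketch shows why the grouping question matters. It enumerates all states of three dependencies and lists the cases where two ways of grouping the same mixed expression disagree; the function names are hypothetical and `True` simply means "this dependency holds".

```python
from itertools import product


def and_binds_tighter(dep1, dep2, dep3):
    # dep1 && dep3 || dep2 with the usual precedence
    return (dep1 and dep3) or dep2


def or_grouped_first(dep1, dep2, dep3):
    # the same operands, but read as dep1 && (dep3 || dep2)
    return dep1 and (dep3 or dep2)


differing = [
    combo
    for combo in product([False, True], repeat=3)
    if and_binds_tighter(*combo) != or_grouped_first(*combo)
]
print(differing)
# [(False, True, False), (False, True, True)]
```

As soon as operators are mixed, a single flat operator setting is not enough; the configuration would also have to express how the dependencies are grouped.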
Have a business process configuration which deals with the parent host's overall state. It considers the state from its own defined logic and expression language. This requires the businessprocess module for Icinga Web 2 and its icingacli component. CheckCommands in the ITL do exist.
Define a dummy host or service which calculates the runtime state via Icinga 2 DSL, similar to the cluster check example from the docs.
Most likely a rule based language addition which is evaluated after all configuration objects are loaded, is the best addition here. This requires the implementation of #3520 in a certain way, and some specific sets and rules one can use. This could also involve functions which retrieve parent trees and modify such after.
It may also be reasonable to take virtual hosts/services into account, which combine specific "parent" states required for the described scenarios.
I wouldn't change anything in the "parent_host_name" and "parent_service_name" attributes, as backends like databases rely on the name and relationship between objects. So does anything which queries Icinga 2 data.
The key point is configuration here, and how to handle the algorithm. I think we all agree on this.
That's my 2 cents on the matter. If you agree on the ideas, or have any other proposals which help here, feel free to suggest/sponsor.
Another suggestion, perhaps simpler, though more specific:
Just have a "reachability" field for Dependencies. If it's set to true, then it is considered together with any other applicable dependency that has "reachability" set to true. The dependency only fails when ALL reachability-tagged dependencies fail.
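One possible reading of this suggestion, sketched in Python: dependencies without the flag keep today's behaviour, while flagged ones only count once all of them have failed. The treatment of unflagged dependencies is an assumption, as the suggestion does not spell it out, and `child_unreachable` is a hypothetical name.

```python
def child_unreachable(dependencies):
    """dependencies: list of (failed, reachability_flag) tuples."""
    plain = [failed for failed, flagged in dependencies if not flagged]
    tagged = [failed for failed, flagged in dependencies if flagged]
    # Untagged dependencies keep today's behaviour: any failure counts.
    if any(plain):
        return True
    # Tagged dependencies only count once every one of them has failed.
    return bool(tagged) and all(tagged)


# Two redundant routers, both tagged; only one has failed.
print(child_unreachable([(True, True), (False, True)]))  # False
# Both tagged routers have failed.
print(child_unreachable([(True, True), (True, True)]))   # True
```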
@dnsmichi Thank you for the summary, that's very helpful! A comment on "glueing" the dependencies: That could perhaps be done based on grouping the children (child_service_name and child_host_name) with some other attribute of the dependency object. The reachability of the child is then calculated from the combined state of all parents in the group using the configured operation (with no mixing).
The "other" attribute to group by could be a new parameter or, with some modifications, the name of the dependency (by which I mean the name as given in config files, let's call it short_name).
Let me explain what I mean by modifications and do correct me if I'm wrong, but I believe that currently when you configure a dependency, an object is created as:
child_host_name!short_name
or if child_service_name exists
child_host_name!child_service_name!short_name
In the case of apply rules (at least for Hosts), some magic is performed and the parent_host_name is concatenated with short_name (I assume to avoid duplicate objects). If all dependency objects would be created as
child_host_name!child_service_name!short_name!parent_host_name
this problem would be alleviated and the configuration would (at least to my eyes) be more consistent. parent_host_name is mandatory for a dependency anyway, so perhaps(?) this change would be acceptable.
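The naming and grouping idea can be sketched in Python like this. It builds names in the proposed `child!service!short_name!parent` form and groups them by everything except the trailing parent; the helper names are hypothetical, and the sketch assumes that parent host names contain no `!`.

```python
from collections import defaultdict


def full_name(child_host, short_name, parent_host, child_service=None):
    """Build a dependency name in the proposed scheme."""
    parts = [child_host]
    if child_service:
        parts.append(child_service)
    parts += [short_name, parent_host]
    return "!".join(parts)


def group_key(name):
    """Drop the trailing parent so that all dependencies of one child
    with the same short_name end up in the same group."""
    return name.rsplit("!", 1)[0]


names = [
    full_name("server", "routers", "router1"),
    full_name("server", "routers", "router2"),
    full_name("server", "ldap", "ldap1", child_service="ssh"),
]

groups = defaultdict(list)
for name in names:
    groups[group_key(name)].append(name)

for key, members in groups.items():
    print(key, members)
```

Each group could then be evaluated with a single configured operation, which matches the "no mixing" constraint mentioned above.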
I understand your concern about future feature requests regarding how to combine parent states and I do not have a suggestion here. From previous comments in the thread and my own use case, the most important thing seems to be able to configure a single operation to use for all parents.
How about changing how multiple single dependencies work? Switch from AND to OR?
So, say I have host a, which I can reach via two redundant routers b and c, and I create one dependency so that a depends on b and another dependency so that a depends on c. Current versions will show a as unreachable if either b or c goes down.
Couldn't we just change the behaviour that a is reachable when at least b or c is up?
I know this wouldn't satisfy all needs but I'm quite positive there are a lot more users that want it the other way round from what it is now.
Reading through this I can see why the devs are quite averse to any of the above. The language of the configuration and the way that dependencies can be applied means that a host has no idea what services, dependencies, etc might be applied to it, thus making the host the wrong place to define how the logic for those dependencies should be applied.
Similarly, you can't define AND/OR in the dependency because you don't know how to group the dependencies.
However: I think there's a way to implement the latter that isn't too painful. It does rely on people being sensible with the "priority" field.
Here's a worked example:
If I defined, for example (and I'm doing this very early morning from memory - not had any coffee yet!):
object Host "router-a" {
...
}
object Host "router-b" {
...
}
object Host "host-a" {
...
}
apply Dependency "host-a-on-routers" to Host {
child_host_name = "host-a"
priority = 1
logical_operator = "OR"
assign where match("router-*", host.name)
}
This would generate an 'availability expression' for host-a of (router-a || router-b).
Where "priority" and "logical_operator" are left undefined, "logical_operator" would default to "AND" (as it currently does). Priority would default to (say) 100 to ensure all user defined logic takes precedence. The 'availability expression' would be built with priority 0 first, going all the way up to, say, priority 255.
Under this scheme, if you want to apply OR-based logic you must define priority and the logical_operator to be OR.
How does this sound to the developers? I'm sure I've missed something here - but I can't immediately see it.
If the developers see merit in this as a potential solution, I am happy to try and find some time to start putting together some patches.
I'm not sure I follow what you intend with priority, are you proposing that only the lowest priority is checked? Or what do you mean by user-defined logic takes precedence?
@odeshog - So... Priority would be to do with the order in which groups are ANDed together
Let me extend my (fictional) example:
Host A depends on one of router-a or router-b and one of router-c or router-d being up.
object Host "router-a" {
...
}
object Host "router-b" {
...
}
object Host "router-c" {
...
}
object Host "router-d" {
...
}
object Host "host-a" {
...
}
apply Dependency "host-a-on-min-router-ab" to Host {
child_host_name = "host-a"
priority = 1
logical_operator = "OR"
assign where regex("router-(a|b)", host.name)
}
apply Dependency "host-a-on-min-router-cd" to Host {
child_host_name = "host-a"
priority = 2
logical_operator = "OR"
assign where regex("router-(c|d)", host.name)
}
With what I am proposing, I would expect the availability logic to generate the following boolean expression for reachability of host-a
(router-a || router-b) && (router-c || router-d)
Priority 1 entry was anded with the priority 2 entry (which is then anded with any higher number priority entries).
It allows relatively complex expressions. I think the only one that proves a problem is (w && x) || (y && z) - but I'm fairly sure you could rewrite that a different way.
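The evaluation order described above can be sketched in Python. Dependencies with the same priority form one group, the group's operator combines its parents, and the groups are then ANDed together. This is only an illustration of the proposal; `host_reachable` is a hypothetical name, and the sketch assumes that all dependencies sharing a priority use the same operator.

```python
from collections import defaultdict


def host_reachable(dependencies, parent_up):
    """dependencies: list of (priority, operator, parent_name) tuples.
    parent_up: dict mapping parent_name to True when that parent is up."""
    groups = defaultdict(list)
    operators = {}
    for priority, operator, parent in dependencies:
        groups[priority].append(parent_up[parent])
        operators[priority] = operator

    results = []
    for priority in sorted(groups):
        combine = any if operators[priority] == "OR" else all
        results.append(combine(groups[priority]))
    # The per-priority results are ANDed together.
    return all(results)


deps = [
    (1, "OR", "router-a"), (1, "OR", "router-b"),
    (2, "OR", "router-c"), (2, "OR", "router-d"),
]
up = {"router-a": False, "router-b": True, "router-c": True, "router-d": False}
print(host_reachable(deps, up))  # True

up["router-c"] = False
print(host_reachable(deps, up))  # False
```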
Does that explain what I was on about?
@cjsoftuk thank you, that's very clear now. Functionally this is what I, and pretty much every other user in this thread, want. I would only suggest some more descriptive name instead of priority, dependency_group perhaps.
@odeshog - Agreed - perhaps wrong name. Was about half 7 in the morning when I wrote that having not fully woken up!
I am interested to hear what the core developers think when they get some time to assess this. I definitely have a bunch of use cases where "and" is completely the wrong operator.
After reading/skimming through this thread and reading @dnsmichi's comment https://github.com/Icinga/icinga2/issues/1869#issuecomment-343167646, an idea came to my mind regarding the "virtual parent object".
Though this might just be applicable to host dependencies:
Why not create a host object that uses check_multi/check_multiaddr as check command and checks the IPs of the various hosts you want to have as parents? As long as one of the IPs answers, the dependency is fulfilled.
This is, I think, a valid workaround until the "big issue" gets tackled.
I think there's a rather easy and straightforward solution to the problem.
First: There seems to be some confusion regarding what an _and_ resp. _or_ operator is supposed to mean: Is the dependency OK when A and B are OK, or is the dependency violated only if A and B are not OK? I propose to use the terms _cumulative dependency_ and _redundant dependency_ instead.
The main point is that any boolean expression can be expressed as a single _and_ of several _or_'s of the input variables and their negations (or as a single _or_ of several _and_'s of them).
We already have one of the logical operators as several distinct Dependency Objects are interpreted to be cumulative.
We already have a notion of negation since a Dependency Object's states attribute can be set arbitrarily.
So the only thing missing is a way to express redundant dependencies. If we introduce a syntax to list several parents in one Dependency Object which are then interpreted to be redundant, we're done.
A possible syntax would be an array of dictionaries, i.e.
parents = [
{
host = "host1"
service = "service1"
states = [ OK, Warning ]
}
{
host = "host2"
service = "service2"
states = [ Critical, Unknown ]
}
]
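The evaluation this proposal implies can be sketched in Python: separate Dependency objects are cumulative (ANDed), while the parents listed inside one object are redundant (ORed), and each parent carries its own set of accepted states. The data layout and function names are assumptions made for illustration, and `host3!service3` is a made-up third parent.

```python
def parent_ok(current_state, accepted_states):
    """A parent satisfies the dependency when its state is accepted."""
    return current_state in accepted_states


def dependencies_hold(dependency_objects, current):
    """dependency_objects: list of Dependency objects, each given as a
    list of (parent_name, accepted_states) entries.
    Objects are ANDed (cumulative); parents within one object are ORed
    (redundant)."""
    return all(
        any(parent_ok(current[name], accepted) for name, accepted in parents)
        for parents in dependency_objects
    )


# The parents array from above, written as one Dependency object.
redundant_parents = [
    ("host1!service1", {"OK", "Warning"}),
    ("host2!service2", {"Critical", "Unknown"}),
]
# A second, separate Dependency object with a single parent.
single_parent = [("host3!service3", {"OK"})]

current = {
    "host1!service1": "Critical",
    "host2!service2": "Critical",
    "host3!service3": "OK",
}
print(dependencies_hold([redundant_parents, single_parent], current))  # True

current["host3!service3"] = "Critical"
print(dependencies_hold([redundant_parents, single_parent], current))  # False
```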
You could probably introduce some short-cuts, like
parents = [
{
hosts = [ "host1", "host2" ]
service = "service"
}
]
(and host/services).
One could even take
parents = [
{
hosts = [ "host1", "host2" ]
services = [ "service1", "service2" ]
}
]
to mean all four combinations of hosts and services.
Or you revert to writing down the expression explicitly in a syntax like
parents = "host1!service1[OK,Warning] || ( host2!service2 && host3!service3)"
parents = [
{
hosts = [ "host1", "host2" ]
services = [ "service1", "service2" ]
}
]
brings back the confusion about which objects are actually attached to each other. All? Some? I've mentioned this in https://github.com/Icinga/icinga2/issues/1869#issuecomment-343167646
In terms of a behavioural change, is your desire to control the behaviour of multi-parent dependencies in a global fashion, or on a per object basis? https://github.com/Icinga/icinga2/issues/1869#issuecomment-351703502 from @widhalmt implies global.
I don't quite get why you focus on a possible special case of a proposed possible short-cut instead of commenting the overall concept (distinct Dependency Objects cumulative as-is, additional syntax for redundancy-type parents within a single Dependency Object).
As I wrote, In that special case I would expect all four combinations.
As I wrote, I don't intend to change the global behaviour of multiple distinct dependencies, which must remain cumulative because that's normally what one needs: SSH login is not supposed to work if the server is down; it is also allowed to fail if name resolution doesn't work; it may fail because LDAP is down, etc.
In my application, I normally need the cumulative type. However, in some cases (which reflect redundant services), I need the redundancy type: name resolution is not supposed to work if all configured resolvers are down; ntpd is not supposed to be happy with its peers if all primary ntp servers are down etc.
I'm not in the mood to argue about who said what and why, but look for a possible solution which satisfies the majority.
So to speak, @widhalmt could live with a global configuration option changing the global behaviour. That's fairly easy to implement although I'm not a friend of such config options. @efuss wants to configure it on a per object level, which still needs a better configuration specification.
Is there a common sense amongst others contributing to this issue?
Use the snippet below.
- [ ] global option
- [ ] per object
For me a global option would be enough as I commonly need only the logic for redundant dependencies like two switches and the servers behind should be unreachable if both are down or virtualization hosts and the virtual machines should be unreachable if the hosts are down.
I agree that a global option is a bad idea. It's confusing and I guess nobody can live with all dependencies being interpreted as redundant (you certainly don't want the implicit dependency of a Service on its Host and other explicit dependencies being regarded as redundancy-type).
My typical scenario is that a server process needs, say, LDAP and name resolution to work and I have two LDAP servers and two resolvers. I need to be able to express that. The easiest and most straightforward solution I can think of is to apply some "needs LDAP" dependency to all services requiring LDAP lookups and, in that Dependency Object, write something like
parents = [
{
host = "ldap-primary"
service = "ldap"
}
{
host = "ldap-backup"
service = "ldap"
}
]
where the parents array's elements are interpreted as redundant options.
Add a "needs resolv" dependency for Service Objects dependent on name resolution, have both dependencies interpreted as cumulative (as-is) and, in my opinion, you're done, no?
Do you find that not covering people's needs, being confusing, over-engineered or difficult to implement?
It is a mix of both - confusing and likely users may not understand it, or implement it the wrong way. I know that you could live with your proposal, but you're not the one maintaining all the features and doing support and documentation for them - no offence here.
Probably the array with dictionary notation is the best proposal so far. Still, how would such an array be dealt with - is it AND or is it OR. One should be able to tell just by looking at the config snippet.
For the implementation parts, one needs to consider that such dependencies are not available in the IDO backend, might be hard to traverse that via REST API too. Not so easy, and should be taken into account when creating a new backend, FYI @lippserd
Either way, I'd still like to hear what others in this issue think about
- [ ] global option
- [ ] per object
Perhaps what I forgot to mention is my use of the businessprocess addon for more complicated dependency requirements, which can also solve things like requiring two out of five webservers. Having such things in the core could be helpful, but I am fine with having them somewhere else and a simple solution in the core.
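A "two out of five" requirement is a threshold rather than a plain AND or OR. A minimal Python sketch of such an operator follows; `at_least` is a hypothetical name and the code is not taken from the businessprocess addon.

```python
def at_least(minimum, members_ok):
    """True when at least `minimum` of the given members are OK."""
    return sum(1 for ok in members_ok if ok) >= minimum


webservers_ok = [True, False, True, False, False]
print(at_least(2, webservers_ok))  # True
print(at_least(3, webservers_ok))  # False
```

With n members, `at_least(n, ...)` behaves like AND and `at_least(1, ...)` like OR, so both operators discussed in this thread are special cases of the threshold form.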
A global option would be enough for us but a per-object option would be preferable in the long run.
One implementation "detail" to consider: right now, object relations need to be tracked, e.g. for the REST API joins. Such a thing won't be possible with a parents array, nor could it be dumped to the database backend easily.
https://github.com/Icinga/icinga2/blob/master/lib/icinga/dependency.ti#L70
I'm thinking about a different method, like grouping these dependencies and evaluating them based on a specified operator. This wouldn't change anything with the current Dependency configuration objects, and would only introduce an optional element. It could also re-use the group "assign where" logic and copy the group membership resolving.
Let's see about that. Right now I have some bugs for 2.9 to deal with prior to looking into this again.
@htriem joined the Icinga 2 core team a while ago and is now taking on more maintainer responsibilities.
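One way to picture the grouping idea is a set of named groups, each with an operator that decides how its members are combined. This is a sketch under assumptions, not a design from this thread: the structure of `groups`, the operator names, and the `san1`/`san2` hosts are all hypothetical.

```python
def evaluate_group(operator, member_states):
    """Combine the up/down states of one group's members."""
    if operator == "and":
        return all(member_states)
    if operator == "or":
        return any(member_states)
    raise ValueError(f"unknown operator: {operator!r}")

def child_reachable(groups, up):
    """groups: dict of group name -> (operator, list of parent names).
    up: set of parent names that are currently up.
    The child is reachable only if every group evaluates to True."""
    return all(
        evaluate_group(operator, [member in up for member in members])
        for operator, members in groups.values()
    )

groups = {
    "routers": ("or", ["router1", "router2"]),  # redundant pair
    "storage": ("and", ["san1", "san2"]),       # assumed names, both required
}

print(child_reachable(groups, {"router2", "san1", "san2"}))  # True
print(child_reachable(groups, {"router1", "router2", "san1"}))  # False
```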
During the issue grooming we had some weeks ago, his exercise was to determine the questions and solutions in this issue. A special part of the exercise was to weigh a config option against changing the behavior, without any influence from me, as I no longer have a clear view on the issue.
This is what @htriem achieved (he's on vacation at the moment, therefore I am writing this now):
1) We couldn't find a reliable scenario where the current behavior with the "if one dependency fails, mark this unreachable" would apply.
2) The topic contains 20 thumbs up and responses to change the behavior, with only developers wanting to keep the current behavior. That's a fair point to take into account.
3) The proposed configuration options are either too complicated, or they do not fit the current DSL approach with "there's only one way to do it right" and "keep it simple, stupid".
4) During the grooming session, an actual patch was implemented in #7785 to change the behavior.
A small bug with "no dependencies -> be reachable" existed, which was unveiled with our unit tests yesterday. Already fixed.
I've added some more unit tests in #7785 rendering this change "bullet proof".
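The changed behavior and the "no dependencies" edge case can be modeled in a few lines. This is a simplified sketch and not the code from #7785; whether the actual bug arose this way is not stated here. It only shows why the empty case needs explicit handling: an "at least one parent is OK" check over an empty list is false.

```python
def is_reachable(parent_ok):
    """parent_ok: list of booleans, one per dependency (True = parent OK).

    Reachable if there are no dependencies at all,
    or if at least one dependency's parent is OK."""
    if not parent_ok:
        return True  # no dependencies -> be reachable
    return any(parent_ok)

# A naive any() on its own gets the empty case wrong.
print(any([]))                      # False
print(is_reachable([]))             # True
print(is_reachable([False, True]))  # True
print(is_reachable([False, False])) # False
```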
I'm unhappy with this change because it can lead to unrelated services being regarded as redundant with respect to each other. It can even cause a host to be regarded as redundant with respect to a service.
For example (unfortunately, I didn't think about this in the discussion and stumbled over it only after upgrading to 2.12.0), applying the explicit disable-host-service-checks dependency described in the Monitoring Basics chapter will defeat all other dependencies.
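The effect can be illustrated with a simplified model that compares the two interpretations for a service with two unrelated dependencies. This is a sketch, not Icinga's implementation; the "upstream router" dependency and both function names are assumptions.

```python
def reachable_if_all_ok(parent_ok):
    """Cumulative interpretation: every dependency must hold."""
    return all(parent_ok)

def reachable_if_any_ok(parent_ok):
    """Redundant interpretation: one working dependency is enough."""
    return not parent_ok or any(parent_ok)

# A service with two unrelated dependencies:
# its own host is up, but the router in front of it is down.
dependencies = {
    "own host (disable-host-service-checks)": True,
    "upstream router (assumed)": False,
}
states = list(dependencies.values())

print(reachable_if_all_ok(states))  # False: router failure is honored
print(reachable_if_any_ok(states))  # True: the host dependency masks it
```

As long as the service's own host is up, the second interpretation never reports the service as unreachable, whatever the state of the other parents.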
My original dictionary idea seems to be too complicated.
I then came up with the idea to introduce an essential attribute for Dependency Objects, meaning that a failure of that dependency alone will make the child unreachable. I implemented this, but after that came up with yet another idea.
What about a new redundancy_group attribute for dependencies?
Specifying a redundancy_group would cause a dependency to be regarded as redundant only inside that redundancy group, e.g., "routers".
Dependencies lacking a redundancy_group attribute would be regarded as essential for the child.
This would, with only one additional simple string attribute, allow for both cumulative and redundant dependencies and even a combination (cumulation of redundancies, like SSH depending on both LDAP and DNS to function, while operating redundant LDAP servers as well as redundant DNS resolvers).
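The combination described here (essential dependencies, plus redundancy inside each named group) can be sketched as follows. This is a minimal model of the proposed semantics, not the code from the pull request; the function name and the tuple layout are assumptions.

```python
from collections import defaultdict

def is_reachable(dependencies):
    """dependencies: iterable of (redundancy_group, parent_ok) pairs.

    redundancy_group is a string, or None for an essential dependency.
    Essential dependencies must all hold; within each redundancy group,
    one working parent is enough; all groups must be satisfied."""
    groups = defaultdict(list)
    for group, parent_ok in dependencies:
        if group is None:
            if not parent_ok:
                return False  # an essential dependency failed
        else:
            groups[group].append(parent_ok)
    return all(any(states) for states in groups.values())

# SSH needs LDAP and DNS, each served by a redundant pair.
ssh_deps = [
    (None, True),    # essential, e.g. the service's own host
    ("ldap", False), # first LDAP server down
    ("ldap", True),  # second LDAP server up
    ("dns", True),
    ("dns", False),
]
print(is_reachable(ssh_deps))  # True

# Both resolvers down: the "dns" group fails, so SSH is unreachable.
dns_outage = [(None, True), ("ldap", True), ("dns", False), ("dns", False)]
print(is_reachable(dns_outage))  # False
```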
I've implemented this in https://github.com/Icinga/icinga2/pull/8218 and it appears to work. I can't tell whether the additional std::unordered_map computations in Checkable::IsReachable() are tolerable for huge installations.
I also don't feel comfortable enough with the test framework to integrate unit tests for the proposal. The current tests are, of course, expected to fail with the change.
I received a mail "PR run failed: Packages - Introduce redundancy groups for Dependency Objects (3b10401)" that I don't understand.
In the reports, I see some "Job canceled" messages plus
Error: Transaction test error:
file /usr/include/mysql/mariadb_rpl.h conflicts between attempted installs of mariadb-devel-3:10.3.22-1.fc31.x86_64 and mariadb-connector-c-devel-3.1.9-5.fc31.x86_64
I don't think my patch broke that.
Most helpful comment
How about changing how multiple single dependencies work? Switch from AND to OR?
So, say I have host a, which I can reach via two redundant routers b and c, and I create one dependency so that a depends on b and another so that a depends on c. Current versions will show a as unreachable if either b or c goes down.
Couldn't we just change the behaviour so that a is reachable when at least one of b and c is up?
I know this wouldn't satisfy all needs, but I'm quite positive there are a lot more users who want it the other way round from what it is now.