Availability & events

Availability and event data relate to changes in the state of hosts and services. These data are used in report designs and by the event and availability widgets included in Centreon MBI.

The only prerequisite for creating reports that use this information is a plugin that returns the state of the monitored host or service. This section describes the basic concepts and calculations needed to use and analyze Centreon MBI reports.

Availability

Hosts

A host is considered available when its state is “Up”.

The availability rate is calculated as: “Up” duration / (“Up” + “Down” durations)

Additional rules:

  • Time spent in the “Unreachable” state is not considered in the calculation of availability.

  • Time spent in “Planned Downtime” is not considered in the calculation of availability.

Example: For a report covering one day, if a host is available for 23 hours and unavailable for 1 hour of the 24-hour period, its availability is 23 hours / (23 + 1) hours ≈ 95.8%.
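The host formula and the example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Centreon MBI code: the function name `host_availability` and its parameters are hypothetical, and the durations are assumed to already exclude time spent in “Unreachable” and “Planned Downtime”.

```python
from typing import Optional


def host_availability(up_seconds: float, down_seconds: float) -> Optional[float]:
    """Return the availability rate as a fraction between 0 and 1.

    Hypothetical helper: both durations are assumed to already exclude
    "Unreachable" and "Planned Downtime" periods.
    """
    counted = up_seconds + down_seconds
    if counted == 0:
        # No "Up" or "Down" time in the period: the rate is undefined.
        return None
    return up_seconds / counted


# The one-day example: 23 hours "Up", 1 hour "Down".
rate = host_availability(up_seconds=23 * 3600, down_seconds=1 * 3600)
print(f"{rate:.1%}")  # prints 95.8%
```

Returning `None` when no time is counted is a design choice of this sketch; a report may display such a case differently.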

Services

A service is considered available when its state is “OK” or “Warning”.

The availability rate is calculated as: (“OK” + “Warning” durations) / (“OK” + “Warning” + “Critical” durations)

Additional rules:

  • Time spent in the “Unknown” state is not considered in the calculation of availability.

  • Time spent in “Planned Downtime” is not considered in the calculation of availability.
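As a rough sketch of the service formula, the snippet below treats “OK” and “Warning” time as available and “Critical” time as unavailable. The function name `service_availability` and the sample durations are made up for illustration; “Unknown” and “Planned Downtime” time is assumed to have been left out before the call.

```python
from typing import Optional


def service_availability(ok: float, warning: float, critical: float) -> Optional[float]:
    """Return the service availability rate as a fraction between 0 and 1.

    Hypothetical helper: durations share one unit (for example hours) and
    are assumed to exclude "Unknown" and "Planned Downtime" periods.
    """
    available = ok + warning          # "OK" and "Warning" both count as available
    counted = available + critical    # only these three states enter the ratio
    if counted == 0:
        return None                   # undefined when no time is counted
    return available / counted


# Illustrative values: 20 h "OK", 2 h "Warning", 2 h "Critical".
rate = service_availability(ok=20, warning=2, critical=2)
print(f"{rate:.1%}")  # prints 91.7%
```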

Events

Only validated events, that is, those in the “HARD” state in Centreon, are counted in event statistics.

In the reports, several message types correspond to different states:

  • Exception: Denotes a “Down” state for a host and a “Critical” state for a service.

  • Warning: Denotes a “Warning” state for a service; there is no equivalent for hosts.

  • Information: Any other state.
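The mapping between states and message types can be written as a small lookup. This is a sketch only: the function `message_type` and the `"host"` / `"service"` labels are hypothetical names chosen for the example, not identifiers from Centreon MBI.

```python
def message_type(object_kind: str, state: str) -> str:
    """Map a host or service state to the message type shown in reports.

    Hypothetical helper: object_kind is assumed to be "host" or "service".
    """
    if (object_kind, state) in {("host", "Down"), ("service", "Critical")}:
        return "Exception"
    if object_kind == "service" and state == "Warning":
        return "Warning"
    # Any other state falls into the "Information" type.
    return "Information"


print(message_type("host", "Down"))         # prints Exception
print(message_type("service", "Warning"))   # prints Warning
print(message_type("host", "Unreachable"))  # prints Information
```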

An event on a host or service is characterized by three values:

  • A start date

  • An end date

  • A state

Additional indicators

  • MTRS (Mean Time To Restore Service) pertains to maintainability: the average duration of a failure. This indicator should be as low as possible.

  • MTBF (Mean Time Between Failures) pertains to reliability: the average time between the end of one incident and the beginning of the next. This indicator should be as high as possible.

  • MTBSI (Mean Time Between Service Incidents): the average time between the beginnings of two consecutive incidents. This indicator should be as high as possible.

The diagram below shows the scope of these indicators:

../_images/mtbf_mtbsi_mtrs_explanation.png
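To make the three indicators concrete, the sketch below computes them from a list of incidents, each described by a start date and an end date. It follows the definitions literally (averages over incidents and over consecutive pairs of incidents); the exact computation used by Centreon MBI is not stated here and may differ. The function name `indicators` and the sample dates are hypothetical.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List, Optional, Tuple

Incident = Tuple[datetime, datetime]  # (start, end) of one failure event


def indicators(incidents: List[Incident]) -> Dict[str, Optional[float]]:
    """Return MTRS, MTBF and MTBSI in seconds (None when undefined).

    Hypothetical helper that applies the definitions literally.
    """
    incidents = sorted(incidents)  # order by start date
    pairs = list(zip(incidents, incidents[1:]))  # consecutive incidents

    # MTRS: average duration of a failure.
    durations = [(end - start).total_seconds() for start, end in incidents]
    # MTBF: end of one incident to the beginning of the next.
    gaps = [(nxt[0] - prev[1]).total_seconds() for prev, nxt in pairs]
    # MTBSI: beginning of one incident to the beginning of the next.
    starts = [(nxt[0] - prev[0]).total_seconds() for prev, nxt in pairs]

    return {
        "MTRS": mean(durations) if durations else None,
        "MTBF": mean(gaps) if gaps else None,
        "MTBSI": mean(starts) if starts else None,
    }


# Illustrative incidents on a single day (made-up dates).
sample = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 1, 0)),    # 1 h
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)), # 30 min
    (datetime(2024, 1, 1, 20, 0), datetime(2024, 1, 1, 22, 0)),  # 2 h
]
print(indicators(sample))
# prints {'MTRS': 4200.0, 'MTBF': 33300.0, 'MTBSI': 36000.0}
```

With fewer than two incidents, MTBF and MTBSI have no pair to measure, so this sketch returns `None` for them.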