Concepts
Availability & eventsβ
Availability and event data relate to changes in the state of hosts and services. These data are used in report designs and by the event and availability widgets included in Centreon MBI.
There are no special system prerequisites for creating the reports that use this information, other than a plugin that returns the state. This section describes basic concepts and calculations needed for using and analyzing Centreon MBI reports.
Availabilityβ
Hostsβ
A host is considered available when its state is "Up".
To calculate the availability rate the formula is: "Up" duration / ("Up" + "Down" durations)
Additional rules:
- Time spent in the "Unreachable" state is not considered in the calculation of availability.
- Time spent in "Planned Downtime" is not considered in the calculation of availability.
Example: For a report covering one day, if a host is available 23 hours and unavailable 1 hour out of a 24 hour-period, its availability will be 23 hours / (23 + 1) ~ 95.8%.
Servicesβ
A service is considered available when its state is "OK" or "Warning".
To calculate the availability rate the formula is: ("OK" + "Warning" durations) / ("OK" + "Warning" + "Critical" durations)
Additional rules :
- Time spent in the "Unknown" state is not considered in the calculation of the availability.
- Time spent in "Planned Downtime" is not considered in the calculation of availability.
Eventsβ
Only validated events are considered in the calculation of events. This corresponds to the "HARD" state in Centreon.
In the reports, several message types correspond to different states:
- Exception: Denotes a "Down" state for a host and a "Critical" state for a service.
- Warning: Denotes a "Warning" state for the services, but there is no equivalent for hosts.
- Information: Any other state.
An event on a host or service is characterized by three values:
- A start date
- A end date
- A state.
Additional indicatorsβ
- MTRS (Mean Time To Restore Service) pertains to maintainability: Average duration of the failure. This indicator should be as low as possible.
- MTBF (Mean Time Between Failure) pertains to reliability: Average time between the end of an incident and the beginning of the next. This indicator should be as high as possible.
- MTBSI (Mean Time Between Service Incident): Average time between the beginning of two incidents. This indicator should be as high as possible.
The diagram below shows the scope of these indicators:
Best practicesβ
Best practices for monitoringβ
Quality of plugins, performance and capacity dataβ
To obtain reporting on performance data using default Centreon MBI reports, you should monitor at least some basic performance indicators (metrics):
-
CPU -- Should return a percentage value, using one or more metrics (cpu_total, cpu_sys, cpu_1, etc.), with 100 as the maximum value.
-
Memory should return at least one metric with this information:
-
Memory usage: the value expressed in bytes only
-
Memory usage warning threshold
-
Memory usage critical threshold
-
Total allocated memory in Bytes.
The plugin for monitoring this indicator must return the following output:
status information | metric_name=valueunit;warning_threshold;critical_threshold;min_value;max_value
-
Storage usage -- Two possible kinds of service:
-
Monitoring one partition by service (metrics are often designated as "used" and "size")
-
Monitoring multiple partitions by service and each metric corresponds to a partition name.
In this two cases, the performance data returned by storage plugins have to correspond to this format:
status information | metric_name=valueunit;warning_threshold;critical_threshold;min_value;max_value metric_name_2=value...
-
Traffic -- Standard traffic reports use two metrics in parameters, one for the inbound traffic and one for the outbound. For compatibility, your plugins must return two metrics, although their names do not matter. For each metric, the maximum possible value must be specified. This is the recommended plugin format:
any status information | $inboundTrafic=$value$unit;$warning_threshold;$critical_threshold;$min_value;$max_value $outBoundTrafic=...
Using the Centreon Monitoring Connectors guarantees the quality of your data.
Default unitsβ
Be sure that the data sent by the plugins is expressed in the same units as similar data used with other services. We strongly recommend verifying that the plugins use these units:
- Time: seconds
- Traffic: bits/sec
- Storage: bytes
- Memory/Swap: bytes
Using the Centreon Monitoring Connectors guarantees the quality of your data.
Best practices for configuring Centreon objectsβ
In Centreon MBI, each report design has several parameters that allow you to generate customized documents according to your business requirements.
Parameter types can be:
-
A main object on which the report will be generated, such as:
- A host
- A host group: functional group defined in Centreon to classify hosts by customer, application, business unit, country, etc.
- Several host groups.
-
A time period (or "Business hours" also called "Live service") for which statistics will be calculated.
-
Filters that take into account only specific types of hardware, services and metrics from selected host groups:
-
Host categories: Classifies hosts into technical groups for determining the type or technical function of a host (e.g., Linux servers, Windows servers, Cisco routers, printers).
-
Service categories: Defines the type of service (e.g., CPU, physical memory, storage).
-
Metrics: Indicates performance data collected by the services (monitoring indicators). One monitoring service can collect several metrics. However metric names and units are not standardized. For instance, one CPU-type service can collect only a metric called "cpu_average" defined in percentages, and another CPU-type service can collect a metric by CPU core configured in the hardware. Therefore, when generating a report, it is essential to select the metrics used in the statistic calculation.
Host groups and categoriesβ
The definitions of host groups and categories listed in the previous chapter comply with the best practices established by Centreon.
However, the groups and categories that you create should correspond to your business requirements.
Example:
If you need to report the number of alerts generated by IT field, with a detailed distribution by type of hardware, you will have to define the host groups and categories using this method:
- Host groups: Databases, Applications, Security, Network, Mail, etc.
- Host categories: DB2-Servers, MySQL-Servers, Oracle-Servers, SQL-Servers, etc.
Here is an exemple of statistics that you can obtain using those groups and categories:
The host group is the first analysis axis. The host category allows you to analyze the statistics in subsets.
In the same way, we can analyze the statistics using the following dimensions:
- By country (host group) with a data distribution by type of network hardware (host category)
- By country (host group) with a data distribution by customer (host category)
- By customer (host group) with a data distribution by country (host category)
- By customer (host group) with a data distribution by application server (host category)
There is no standard set of rules for defining host groups and categories. They must correspond to your reporting needs.
Creating categories and groups
- You associate hosts to host groups in the Configuration > Hosts > Host groups menu on the Centreon interface. You can also use the Tab Relations in the host add/modification form.
- You associate hosts and host categories in the menu Configuration > Hosts > Categories. You can also use the Tab Relations in the host add/modification form.
Service categoriesβ
Service categories allow you to organize services (monitoring indicators) into subsets. The most common usage of service categories is for defining categories based on service types, e.g., CPU, physical memory, storage, Process-Oracle, DNS, Process-WebSphere.
This type of configuration lets you:
- Compare the number of alerts generated by each type of service.
- Select the service category that indicates storage usage information when you need to generate a capacity report.
Like the host groups and categories, service categories must be defined according to your reporting needs.
For instance, if you need to analyze the storage space allocated and used by DBMS or an application type, you may need to create several service categories. Instead of using only one service category named "Storage" or "Disk" you could create these service categories:
- "Operating system"
- "Oracle"
- "SQL Server"
Here is an exemple of statistics that you can obtain using these service categories:
You associate services and service categories in the Configuration > Services > Categories menu in the Centreon Interface. You can also use the Relation tab in the add/modification form of a given service.
For managing service categories, we highly recommand that you only use the service templates.
Extract, Transform, Load (ETL)β
Change historyβ
Centreon MBI logs every change affecting the relationships between, hosts, services, groups and categories.
Example:
- Host "H1" is related to host group "G1" in January.
- Host "H1" no longer belongs to group "G1" as of February 1.
- After this change, if a report is generated for group "G1" over the reporting period of January, the statistics of host "H1" will be considered in the statistics of group "G1".
- The statistics of host "H1" will not be considered for group "G1" if the reporting period selected is February.
- If the reporting periods starts on January 15 and end on February 15, the statistics of host "H1" will be considered for the statistics of group "G1" only from January 15 to January 31.
The initial Centreon setup and the relationship between objects must be clearly defined before installing Centreon MBI on a production platform. Any change to the configuration of a host, group or category is considered to be a normal part of their lifecycle.
Execution modesβ
The Centreon MBI reporting database or "data warehouse" is updated every day with aggregated data calculated by the ETL, processing potentially millions of lines of data on your platform. For this reason the ETL and data warehouse play a critical role that should be understood.
The ETL operates in two modes:
-
Daily mode: When up and running, Centreon MBI reporting platform normally functions in this mode. Centreon data is imported daily into the reporting database, but incrementally. More specifically:
- data_bin is imported incrementally and aggregations are calculated only for the previous day.
- the entire hoststatevents and servicestatevents tables are imported but events are calculated incrementally.
The update process can take a matter of seconds up to several minutes depending on the size of your monitored environment. The daily mode is configured in a crontab file located in
/etc/cron.d/centreon-bi-engine
:30 4 * * * root /usr/share/centreon-bi/bin/centreonBIETL --daily >> /var/log/centreon-bi/centreonBIETL.log 2>&1
Warning
Executing this script multiple times on the same day will cause duplication problems.
-
Rebuild mode: After installation of the Centreon MBI platform, this mode if often used in case of data corruption. You can import and calculate statistics over a defined period, or by using the retention parameters.
Example:
/usr/share/centreon-bi/bin/centreonBIETL -r
To obtain acceptable execution times and manage all the data generated by your Centreon platform, the hardware configuration, storage capacity and MariaDB optimizations are three important points to consider when installing Centreon MBI. For recommendations, consult the online documentation in the Architecture & Pre-requisites chapters.
We advise you to monitor the reporting database using the dedicated Monitoring Connector. If the ETL does not work for several days or the raw data is not up to date, you must perform a rebuild for the missing days. The ETL does not automatically reimport and calculate the missing days. Do not hesitate to contact the Centreon support team for assistance.
Execution Optionsβ
Various parameter options can be passed to the script to execute specific actions:
-c Create the reporting database model. -d Daily execution to calculate statistics on yesterday. -r Rebuild mode to calculate statitics on a historical period. Can be used with: Extra arguments for options -d and -r (if none of the following is specified, these one are selected by default: -IDEP): -I Extract data from the monitoring server. Extra arguments for option -I: -C Extract only Centreon configuration database only. Works with option -I. -i Ignore perfdata extraction from monitoring server. -o Extract only perfdata from monitoring server.
-D Calculate dimensions. -E Calculate event and availability statistics. -P Calculate perfdata statistics. Common options for -rIDEP: -s Start date in format YYYY-MM-DD. By default, the program uses the data retention period from Centreon MBI configuration. -e End date in format YYYY-MM-DD. By default, the program uses the data retention period from Centreon MBI configuration. -p Do not empty statistic tables, delete only entries for the processed period. Does not work on raw data tables, only on Centreon MBI statistics tables.
If no "start" or "end" date is given to the ETL script, the start and end date are automatically calculated using the retention parameters configured on the interface in General Option > Data retention Parameter.
Performanceβ
If ETL processing seems too long in daily or rebuild mode, you should consider optimizing your reporting server by:
- Optimizing the MariaDB configuration.
- Storing the database on a high-performance disk (e.g., with no i/o wait time).
- Adding more physical memory (+ optimize configuration).
- Not sharing storage or the database with other applications.
Purgeβ
Data purging can be activated in Centreon MBI General Options to ensure
that the database complies with the retention configuration. You
activate this function through the interface AND in the following cron
on the reporting server: /etc/cron.d/centreon-bi-purge
.
Reporting dimensions (combination of groups/host categories/services/metrics) with no relating data are automatically deleted from the reporting database.
How to apply a new configuration to historical dataβ
This procedure deletes all previously calculated data (and links between objects) and recalculates data based on the retention period in the latest Centreon configuration.
When implementing Centreon reporting, you may expect to re-execute your statistical calculations a number of times if the Centreon configuration changes. When you have finished making changes to groups and categories, you can run the commands below on the REPORTING server, preferably in the morning (due to several hours of potential processing time). This procedure does not include the importing of logs or raw data. Make sure all data imported from Centreon is up to date on your reporting server by running the following command:
/usr/share/centreon-bi/etl/centreonbiMonitoring.pl --db-content
And make sure "ETL OK - Database is up to date" appears OR that the following tables are not listed:
- data_bin
- hoststatevents
- servicestateevents
If you run the following set of ETL commands without any start and end parameters, the calculations will be based on the retention parameter defined in the Centreon MBI > General Option > Data retention tab menu. If you are currently installing or testing the product, you may consider reducing the retention time BEFORE processing and restore the default value (365 days) AFTER processing. This will help speed up calculation time.
Import the latest Centreon configurationβ
/usr/share/centreon-bi/etl/importData.pl -r --centreon-only
Calculate reporting dimensionsβ
This command will erase all previous changes tracked by the reporting mecanism and include only the latest. If you want to include former changes, replace the -r by -d::
/usr/share/centreon-bi/etl/dimensionsBuilder.pl -r
Aggregate events and availabilityβ
nohup /usr/share/centreon-bi/etl/eventStatisticsBuilder.pl -r > /var/log/centreon-bi/rebuildAllEvents.log &
Aggreggate performance data (storage, traffic, etc.)β
nohup /usr/share/centreon-bi/etl/perfdataStatisticsBuilder.pl -r > /var/log/centreon-bi/rebuildAllPerf.log &
How to rebuild missing reporting dataβ
You may require this procedure when the monitoring plugin of your reporting server returns a state other than "OK". This may appear during daily processing (e.g., data is not up to date, there is insufficient space on the reporting server, or processing was manually interrupted).
The plugin may return a message that your database is not up to date, as in the following example: :
/usr/share/centreon-bi/etl/centreonbiMonitoring.pl --db-content
[Table mod_bam_reporting, last entry: 2020-07-01 00:00:00] [Table mod_bi_ba_incidents, last entry: 2020-07-01 00:00:00] [Table hoststateevents, last entry: 2020-07-01 00:00:00]
[Table servicestateevents, last entry: 2020-07-01 00:00:00] [Table mod_bi_hoststateevents, last entry: 2020-07-01 00:00:00]
[Table mod_bi_servicestateevents, last entry: 2020-07-01 00:00:00] [Table mod_bi_hostavailability, last entry: 2020-07-01 00:00:00]
[Table mod_bi_serviceavailability, last entry: 2020-07-01 00:00:00] [Table data_bin, last entry: 2020-08-01 00:00:00] [Table mod_bi_metricdailyvalue, last entry: 2020-08-01 00:00:00]
[Table mod_bi_metrichourlyvalue, last entry: 2020-08-01 23:00:00]
-
When only the
mod_bi
tables appear, there is an incident with aggregated data and not the Centreon data.In this case, skip the "Import Missing data" section below.
-
If the following tables appear, a problem has occurred with the raw data imported from Centreon:
-
hoststatevents
-
servicestateevents
-
All the mod_bam_reporting* tables
-
data_bin.
First resolve any incidents on the Centreon side before executing the procedure below.
Prerequisitesβ
Before running the commands in the procedure below, check that:
- The Centreon platform is up and running, and data is up to date.
- The daily cron centreonBIETL is not enabled (it should be commented out) on the reporting server in the file /etc/cron.d/centreon-bi-engine. It must be enabled at the end of the procedure.
- The script dataRetentionManager.pl is not enabled (also commented out) on the reporting server in the file /etc/cron.d/centreon-bi-purge. It must be enabled at the end of the procedure.
- Retention is enabled on the interface.
- Retention is configured for no more than 1024 days.
- The scripts in
/etc/cron.d/centreon-bi-backup-reporting-server
are not enabled (commented out). They must be enabled at the end of the procedure.
For the following commands, we advise you to use "screen" or "nohup" to prevent disconnection due to timeout. And you have to manually replace the following elements:
- $date_start$ should be replaced according to the data you want to retrieve (based on retention or starting point of missing data)
- $date_end$ most of the time corresponds to the "today" date
Import the missing dataβ
- Import the data, without the performance data (data_bin table),
from a specific date according to the Availability retention period
you defined in
Centreon MBI > Generation Option > Data Retention Parameters
:
nohup /usr/share/centreon-bi/etl/importData.pl -r -s $date_start$-e $date_end$ --ignore-databin > /var/log/centreon-bi/rebuild_importDataEvents.log &
Execution time: fast (minutes)
- Import the data from data_bin, starting from the date the last data was present in the database. You will find that date next to the data_bin table, returned by the plugin: :
nohup /usr/share/centreon-bi/etl/importData.pl -r --no-purge --databin-only -s $date_start$ -e $date_end$ > /var/log/centreon-bi/rebuild_importDataBin.log &
Execution time: fast (minutes), depending on the number of days imported.
Update reporting dimensionsβ
- Update the dimensions. Using the "-d" option keeps the history of changes made in the configuration. Avoid using the "-r" option or you will have to rebuild all statistics: :
nohup /usr/share/centreon-bi/etl/dimensionsBuilder.pl -d > /var/log/centreon-bi/rebuild_dimensions.log &
Execution time: fast (seconds or minutes)
Rebuild missing events and availability dataβ
- Rebuild events from a specific date according to the retention period defined in Centreon MBI > Generation Option > Data Retention Parameters: :
nohup /usr/share/centreon-bi/etl/eventStatisticsBuilder.pl -r --events-only > /var/log/centreon-bi/rebuild_events.log &
Execution time: Depending on the monitoring perimeter and the number of events: several hours but normally not be longer than 24 hours. In excess of this limit, please contact the Centreon support team.
- Rebuild the availability tables, starting from the day where the last data was present. Check the mod_bi_hostavailability and mod_bi_serviceavailability date returned by the plugin for the lastest build data:
nohup /usr/share/centreon-bi/etl/eventStatisticsBuilder.pl -r --no-purge --availability-only -s $date_start$ -e $date_end$ > /var/log/centreon-bi/rebuild_availability.log &
Execution time: From a few minutes to several hours, depending on the number of days of rebuild.
Rebuild the missing performance dataβ
- Rebuild the missing performance statistics. Check the earliest date next to the mod_bi_metrichourlyvalue and mod_bi_metricdailyvalue tables returned by the plugin for the last data calculated: :
nohup /usr/share/centreon-bi/etl/perfdataStatisticsBuilder.pl -r --no-purge -s $date_start$ -e $date_end$ > /var/log/centreon-bi/rebuild_perfData.log &
Execution time: From a few minutes to several hours, depending on the number of days to calculate. If the number of days of rebuild is greater than the hourly retention setting, the amount of data generated may be voluminous and the rebuild time long.
What to do after executing the scriptsβ
-
Case 1 : The rebuild is performed on the same day
Uncomment the lines in
/etc/cron.d/centreon-bi-engine
and/etc/cron.d/centreon-bi-purge
and restart the cron service:systemctl restart crond restart
-
Case 2 : The rebuild finishes the next day
-
Uncomment the lines in
/etc/cron.d/centreon-bi-engine
and/etc/cron.d/centreon-bi-purge
and restart the cron service:systemctl restart crond restart
-
Manually execute the daily script:
/usr/share/centreon-bi/bin/centreonBIETL -d
-
-
Case 3 : In other cases: Follow the procedure of partial rebuild for the missing days.
Example: The rebuild took 4 days, from January 1 to January 4: You need to follow the procedure from the beginning and use date_start = 01-01 and date_end = 04/01 The procedure is over, the output of the BI monitoring plugin should be βETL execution OK, database is up-to-dateβ.
Centreon BAM statisticsβ
-
Follow this procedure if you rebuilt the Centreon BAM statistics.
- Reimport the BAM data from the central server to the reporting server by running the following command:
/usr/share/centreon-bi/etl/importData.pl -r --bam-only
This will import all Centreon BAM reporting tables.
-
If statistics are not up to date, follow this procedure:
- You need first to execute the following command to rebuild statistics on the central server:
/usr/share/centreon/www/modules/centreon-bam-server/engine/centreon-bam-rebuild-events --all
- Then import them using this command on the reporting server:
/usr/share/centreon-bi/etl/importData.pl -r --bam-only
How to rebuild only Centile statisticsβ
To use the "Monthly Network Percentile" report you must activate centile calculation and storage. Go to the Reporting > Business Intelligence > General Options | ETL Tab page and configure the subsection "Centile parameters" as described below to create a relevant centile/timeperiod combination(s). If this report is not required, simple leave the default values.
Parameter | Value |
---|---|
Calculate centile aggregation | Monthly (minimum) |
Select service categories to aggregate centile on | Select at least one traffic service category |
Set first day of the week | Monday (default) |
Create the required centile-time period combination(s) | Create at least one combination, e.g. 99.0000 - 24x7 |
See example in the screenshot below:
Only service categories selected in "Reporting perimeter selection" will appear in the list of service categories available for centile statistics.
You can create as many centile-time period combinations as you like, but be advised that this may increase calculation time. Start with a small number of parameter combinations to determine the impact on calculation time.
On the reporting server, run the following command to import the configuration data:
/usr/share/centreon-bi/bin/centreonBIETL -rIC
Then, run the following command to update the centile configuration in the datawarehouse:
/usr/share/centreon-bi/etl/dimensionsBuilder.pl -d
Finally, run the following command to calculate only the centile statistics:
/usr/share/centreon-bi/etl/perfdataStatisticsBuilder.pl -r --centile-only