Version: 23.10

Operating guide

Unless otherwise stated, all commands in this document must be passed as root.

In this document, we will refer to characteristics that are bound to change from one platform to another (such as IP addresses and host names) by the macros defined here.

Cluster Management​

The following set of commands can be run from any member of the cluster.

Display cluster status​

To view the general state of the cluster, run this command:

crm_mon

Check the "Failed actions" on the resources and troubleshoot them using the troubleshooting guide.
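For scripting or quick checks, crm_mon also supports a one-shot mode (crm_mon -1). The sketch below is a coarse, hypothetical helper (has_failed_actions is not part of the cluster tooling) that only scans the text it is given for failed actions:

```shell
# Sketch: flag "Failed" entries in a one-shot status dump.
# The function inspects its stdin only; run it on the output
# of "crm_mon -1" (one-shot mode) on a cluster node.
has_failed_actions() {
    if grep -q -i 'failed'; then
        echo "failures detected"
    else
        echo "no failures"
    fi
}
# Usage, on a cluster node:
#   crm_mon -1 | has_failed_actions
```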

View the status of a resource​

To find out the status of a specific resource, run this command:

pcs resource show <resource_name>

For example, to find out the status of the centengine resource, run this command:

pcs resource show centengine

View cluster configuration​

To view the cluster configuration, run this command:

pcs config show

Test the configuration​

To test the cluster configuration, run this command:

crm_verify -L -V

Save & import configuration​

Export/import in XML format​

To save the cluster configuration in XML format, run this command:

cibadmin --query > /tmp/cluster_configuration.xml

The following commands make significant changes to the cluster's configuration and can break it. Use them with caution.

After modifying the XML configuration file, reimport it:

cibadmin --replace --xml-file /tmp/cluster_configuration.xml
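Before replacing the live CIB, the edited file can be checked first. A sketch, assuming crm_verify's -x/--xml-file option (available in recent Pacemaker releases):

```shell
# Sketch: validate the edited XML with crm_verify before importing
# it, so a malformed file never reaches the live cluster.
if crm_verify --xml-file /tmp/cluster_configuration.xml -V; then
    cibadmin --replace --xml-file /tmp/cluster_configuration.xml
else
    echo "validation failed; live configuration left untouched" >&2
fi
```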

To completely reset your cluster's configuration, run this command:

cibadmin --force --erase

Export/import in binary format​

The cluster's configuration can be backed up to a binary file:

pcs config backup export

This backup can then be re-imported:

pcs config restore export.tar.bz2

Check the "switchability" of a resource​

To simulate the ability to toggle a resource from one node to another, run this command:

crm_simulate -L -s

Then check the scores displayed.

Resource management​

Switch a resource from one node to another​

To move a resource from the node where it is currently running to the other, run this command:

pcs resource move <resource_name>

Warning: the pcs resource move command adds a constraint that prevents the resource from moving back to the node where it was previously running.

Once the resource is done moving, run this command:

pcs resource clear <resource_name>
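The two steps can be combined in a small wrapper. A sketch, assuming crm_resource's --wait option (which blocks until the cluster has no pending actions); the move_resource name is purely illustrative:

```shell
# Sketch: move a resource, wait for the cluster to settle, then
# drop the temporary move constraint in one step.
move_resource() {
    pcs resource move "$1" \
        && crm_resource --wait \
        && pcs resource clear "$1"
}
# Usage, on any cluster node:
#   move_resource centengine
```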

Remove an error displayed in the cluster status​

Once the cause of the error has been identified and fixed (see the troubleshooting guide), you must manually clear the error message:

pcs resource cleanup

Or, if you want to remove only the errors linked to one resource:

pcs resource cleanup <resource_name>

View cluster logs​

The cluster logs are located in /var/log/cluster/corosync.log:

tail -f /var/log/cluster/corosync.log

Useful logs can also be found in /var/log/messages.

Change the cluster log verbosity level​

To change the verbosity level of the cluster logs, edit the following files:

  • /etc/sysconfig/pacemaker
  • /etc/rsyslog.d/centreon-cluster.conf
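As an illustration, debug logging is typically controlled from the Pacemaker sysconfig file. A sketch only; the exact variables depend on your Pacemaker version:

```shell
# /etc/sysconfig/pacemaker -- excerpt (sketch)
# PCMK_debug enables debug logging for all daemons ("yes") or for a
# comma-separated list of specific daemons.
PCMK_debug=yes
# PCMK_debug=pacemaker-controld,pacemaker-fenced
```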

Management of the MariaDB resource​

This chapter discusses the operating procedures for the ms_mysql resource. The procedures are to be performed on the @CENTRAL_MASTER_NAME@ and @CENTRAL_SLAVE_NAME@ servers.

Check the status of MariaDB replication​

Run the following command on one of the database servers:

/usr/share/centreon-ha/bin/mysql-check-status.sh
Connection Status '@CENTRAL_MASTER_NAME@' [OK]
Connection Status '@CENTRAL_SLAVE_NAME@' [OK]
Slave Thread Status [OK]
Position Status [OK]

If errors are displayed on the third or fourth line, the database replication has been broken for some reason. The procedure below explains how to re-enable MariaDB replication manually.
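This check can also be scripted. A minimal sketch that counts the non-[OK] lines in the script's output; the replication_errors helper is hypothetical:

```shell
# Sketch: count lines that do not end in "[OK]" in the output of
# /usr/share/centreon-ha/bin/mysql-check-status.sh, read on stdin.
replication_errors() {
    # grep -c prints the number of matching lines but exits non-zero
    # when that number is 0, so mask the exit status for scripting.
    grep -c -v '\[OK\]$' || true
}
# Usage, on a database server:
#   /usr/share/centreon-ha/bin/mysql-check-status.sh | replication_errors
```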

Restore MariaDB master-slave replication​

Apply this procedure in the event of a broken replication thread between the MariaDB databases, or after a server crash, if the situation cannot be recovered by running pcs resource cleanup ms_mysql or pcs resource restart ms_mysql.

Prevent the cluster from managing the MariaDB resource during the operation (to be run from any node):

pcs resource unmanage ms_mysql

Connect to the MariaDB slave server and shut down the MariaDB service:

mysqladmin -p shutdown

Connect to the MariaDB master server and run the following command to overwrite the slave's data with the master's:

/usr/share/centreon-ha/bin/mysql-sync-bigdb.sh

Re-enable the cluster to manage the MariaDB resource:

pcs resource manage ms_mysql

Run the following command on one of the database servers to make sure that the replication has been successfully restored:

/usr/share/centreon-ha/bin/mysql-check-status.sh
Connection Status '@CENTRAL_MASTER_NAME@' [OK]
Connection Status '@CENTRAL_SLAVE_NAME@' [OK]
Slave Thread Status [OK]
Position Status [OK]

Reverse the direction of the MariaDB master-slave replication​

Before performing this operation, make sure that the MariaDB replication thread is running correctly.

Warning: On a two-node cluster installed following this procedure, this operation will also move the centreon resource group, because it must run on the node that holds the ms_mysql-master meta attribute.

To make the resource move from one node to the other, run this command:

pcs resource move ms_mysql-master

This command sets an "-Inf" constraint on the node hosting the resource. As a result, the resource switches to another node.

Wait until all the resources have switched to the other node and then clear the constraint:

pcs resource clear ms_mysql-master

Managing the Centreon resource group​

Toggle the centreon resource group​

Warning: As in the previous chapter, this operation on a two-node cluster installed following this procedure will also switch the MariaDB master, because it must run on the node that holds the ms_mysql-master meta attribute.

Move the resource group to the other node:

pcs resource move centreon

This command sets an "-Inf" constraint on the node hosting the resource, so the resource group switches to the other node. Once the move is complete, clear the constraint:

pcs resource clear centreon

Delete a Pacemaker resource group​

Warning: These commands will prevent your Centreon cluster from working. Only do this if you know what you are doing.

Connect to a cluster node and run the following commands:

pcs resource delete centreon \
    cbd_central_broker \
    gorgone \
    snmptrapd \
    centreontrapd \
    http \
    centreon_central_sync \
    vip

If that does not work, it is probably because a resource is in a failed state. Run the following commands to delete it:

crm_resource --resource <resource_name> -D -t primitive -C
pcs resource cleanup centreon

To create the resources again, follow the installation procedure from this point.

Monitoring a Centreon-HA cluster​

A high-availability platform is basically a LAMP platform (Linux, Apache, MariaDB, PHP) managed by the ClusterLabs tools. Its monitoring must therefore include the same indicators as any Centreon platform, plus some cluster-specific ones. The cluster must be monitored from an external poller.

System indicators and processes​

The easiest part consists in monitoring the basic system indicators, mostly via the SNMP protocol, which the Linux Monitoring Connector makes quite simple.

  • System metrics
    • Load average
    • CPU usage
    • Memory usage
    • Swap usage
    • File system usage
    • Network traffic
    • NTP synchronization with the reference time server
  • Processes
    • System processes crond, ntpd, rsyslogd
    • Centreon processes gorgoned, centengine, centreontrapd, httpd24-httpd, snmptrapd, mysqld, php-fpm
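Process presence can be sketched with pgrep; the missing_processes helper below is hypothetical and simply prints the names of listed processes that are not running (adjust the names to your distribution):

```shell
# Sketch: report which expected daemons are not running, using
# pgrep -x (exact process-name match). Prints one missing name
# per line; prints nothing when everything is up.
missing_processes() {
    for p in "$@"; do
        pgrep -x "$p" >/dev/null 2>&1 || echo "$p"
    done
}
# Usage:
#   missing_processes crond rsyslogd gorgoned centengine mysqld
```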

Application monitoring​

Cluster monitoring​

The cluster-specific health checks can be monitored using the Pacemaker Monitoring Connector:

  • Resource constraints: only for ms_mysql and centreon resources
  • Failed actions

Note: a Monitoring Connector dedicated to Centreon-HA might be released in the future to make this easier.