Operating guide
Unless otherwise stated, all commands in this document must be run as root.
In this document, we will refer to characteristics that are bound to change from one platform to another (such as IP addresses and host names) by the macros defined here.
Cluster Management
The following set of commands can be run from any member of the cluster.
Display cluster status
To view the general state of the cluster, run this command:
crm_mon
Check the "Failed actions" on the resources and troubleshoot them using the troubleshooting guide.
View the status of a resource
To find out the status of a specific resource, run this command:
pcs resource show <resource_name>
For example, to find out the status of the centengine resource, run this command:
pcs resource show centengine
View cluster configuration
To view the cluster configuration, run this command:
pcs config show
Test the configuration
To test the cluster configuration, run this command:
crm_verify -L -V
Save & import configuration
Export/import in XML format
To save the cluster configuration in XML format, run this command:
cibadmin --query > /tmp/cluster_configuration.xml
The following commands perform important modifications to the cluster's configuration and might break it. Use them wisely.
After modifying the XML configuration file, reimport it:
cibadmin --replace --xml-file /tmp/cluster_configuration.xml
To completely reset your cluster's configuration, run this command:
cibadmin --force --erase
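Before replacing or erasing the configuration, it can help to keep a dated copy of the current one. The sketch below wraps the cibadmin --query export shown above. The BACKUP_DIR variable, the function names, and the file-name pattern are assumptions made for this example and are not part of Centreon-HA.

```shell
#!/usr/bin/env bash
# Sketch: save the current cluster configuration to a timestamped XML file.
# BACKUP_DIR and the file-name pattern are assumptions for this example.
BACKUP_DIR="${BACKUP_DIR:-/tmp}"

# Build a path such as /tmp/cluster_configuration.20240131-101500.xml
cib_backup_path() {
    printf '%s/cluster_configuration.%s.xml\n' "$BACKUP_DIR" "$(date +%Y%m%d-%H%M%S)"
}

# Export the configuration with the same command as above and print the path.
backup_cib() {
    local target
    target="$(cib_backup_path)"
    if ! cibadmin --query > "$target"; then
        # Do not leave an empty or partial file behind if the export failed.
        rm -f "$target"
        return 1
    fi
    echo "$target"
}
```

Running backup_cib before cibadmin --replace or cibadmin --erase leaves a file that can be reimported with the --replace command shown above if the change goes wrong.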
Export/import in binary format
The cluster's configuration can be backed up to a binary file:
pcs config backup export
This backup can then be re-imported:
pcs config restore export.tar.bz2
Check the "switchability" of a resource
To simulate the ability to toggle a resource from one node to another, run this command:
crm_simulate -L -s
Then check the scores displayed.
Resource management
Switch a resource from one node to another
To move a resource from the node where it is currently running to the other, run this command:
pcs resource move <resource_name>
Warning: the pcs resource move command adds a constraint that prevents the resource from moving back to the node where it was running.
Once the resource is done moving, run this command:
pcs resource clear <resource_name>
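The two steps above can be chained in a small helper so that the constraint is not forgotten. This is a minimal sketch: the function name and the PCS variable are hypothetical, and the pause is a manual confirmation, because the operator still has to check in crm_mon that the resource has actually moved.

```shell
#!/usr/bin/env bash
# Sketch: move a resource, wait for the operator, then clear the constraint.
# PCS is a hypothetical override for the pcs binary, introduced for this example.
PCS="${PCS:-pcs}"

move_then_clear() {
    local resource="$1" reply
    if [ -z "$resource" ]; then
        echo "usage: move_then_clear <resource_name>" >&2
        return 2
    fi
    # Step 1: move the resource (this adds the constraint described above).
    "$PCS" resource move "$resource" || return 1
    # Step 2: let the operator confirm the move, for example by watching crm_mon.
    echo "Press Enter once $resource is running on the other node." >&2
    read -r reply || true
    # Step 3: remove the constraint so the resource may move back later.
    "$PCS" resource clear "$resource"
}
```

For example, move_then_clear centengine runs the move, waits for Enter, and then runs the clear.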
Remove an error displayed in the cluster status
Once the cause of the error has been identified and fixed (troubleshooting guide), you must manually delete the error message:
pcs resource cleanup
Or, if you want to remove only the errors linked to one resource:
pcs resource cleanup <resource_name>
View cluster logs
The cluster logs are located in /var/log/cluster/corosync.log:
tailf /var/log/cluster/corosync.log
Useful logs can also be found in /var/log/messages.
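When looking for a past problem rather than following the log live, filtering the file can be quicker. The sketch below is one possible filter. The function name and the keyword list are assumptions for this example, and the keywords may need adjusting to the messages your cluster actually logs.

```shell
#!/usr/bin/env bash
# Sketch: print only the lines of a cluster log that look like problems.
# The default path is the one given above; the keyword list is an assumption.
cluster_log_errors() {
    local logfile="${1:-/var/log/cluster/corosync.log}"
    # -i: ignore case, -E: extended regular expression with alternatives.
    grep -iE 'error|warning|failed' "$logfile"
}
```

The same function can be pointed at another file, for example cluster_log_errors /var/log/messages.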
Change the cluster log verbosity level
To change the verbosity level of the cluster logs, edit the following files:
/etc/sysconfig/pacemaker
/etc/rsyslog.d/centreon-cluster.conf
Management of the MariaDB resource
This chapter discusses the operating procedures for the ms_mysql resource. The procedures are to be performed on the @CENTRAL_MASTER_NAME@ and @CENTRAL_SLAVE_NAME@ servers.
Check the status of MariaDB replication
Run the following command on one of the database servers:
/usr/share/centreon-ha/bin/mysql-check-status.sh
Connection Status '@CENTRAL_MASTER_NAME@' [OK]
Connection Status '@CENTRAL_SLAVE_NAME@' [OK]
Slave Thread Status [OK]
Position Status [OK]
If errors are displayed on the third or fourth line, it means that the database replication has been broken for some reason. The procedure below explains how to manually re-enable MariaDB replication.
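The check above can also be scripted. The sketch below reads the script's output on standard input and reports every line that does not end with [OK]. The function name is hypothetical, and the code assumes the output keeps the four-line format shown above.

```shell
#!/usr/bin/env bash
# Sketch: flag any status line that does not end with [OK].
# Assumes the output format shown above; the function name is hypothetical.
check_replication_status() {
    local line failed=0
    while IFS= read -r line; do
        # Strip trailing whitespace, then skip empty lines.
        line="${line%"${line##*[![:space:]]}"}"
        [ -z "$line" ] && continue
        case "$line" in
            *"[OK]") ;;
            *) echo "NOT OK: $line" >&2; failed=1 ;;
        esac
    done
    # 0 when every line ended with [OK], 1 otherwise.
    return "$failed"
}
```

A possible use is /usr/share/centreon-ha/bin/mysql-check-status.sh | check_replication_status, which returns a non-zero exit status as soon as one line is not [OK].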
Restore MariaDB master-slave replication
This procedure should be applied if the MariaDB replication thread breaks down or a server crashes, and the situation cannot be recovered by running pcs resource cleanup ms_mysql or pcs resource restart ms_mysql.
Prevent the cluster from managing the MariaDB resource during the operation (to be run from any node):
pcs resource unmanage ms_mysql
Connect to the MariaDB slave server and shut down the MariaDB service:
mysqladmin -p shutdown
Connect to the MariaDB master server and run the following command to overwrite the slave's data with the master's:
/usr/share/centreon-ha/bin/mysql-sync-bigdb.sh
Re-enable the cluster to manage the MariaDB resource:
pcs resource manage ms_mysql
Run the following command on one of the database servers to make sure that the replication has been successfully restored:
/usr/share/centreon-ha/bin/mysql-check-status.sh
Connection Status '@CENTRAL_MASTER_NAME@' [OK]
Connection Status '@CENTRAL_SLAVE_NAME@' [OK]
Slave Thread Status [OK]
Position Status [OK]
Reverse the direction of the MariaDB master-slave replication
Before performing this operation, it is mandatory to make sure that the MariaDB replication thread is running well.
Warning: On a two-node cluster installed using this procedure, following these steps will move the centreon resource group as well, because it must run on the node that has the ms_mysql-master meta attribute.
To make the resource move from one node to the other, run this command:
pcs resource move ms_mysql-master
This command sets an "-Inf" constraint on the node hosting the resource. As a result, the resource switches to another node.
Wait until all the resources have switched to the other node and then clear the constraint:
pcs resource clear ms_mysql-master
Managing the Centreon resource group
Toggle the centreon resource group
Warning: As with reversing the MariaDB replication direction, on a two-node cluster installed using this procedure, following these steps will switch the MariaDB master as well, because it must run on the node that has the ms_mysql-master meta attribute.
Move the resource group to the other node:
pcs resource move centreon
This command sets an "-Inf" constraint on the node hosting the resource. As a result, the resource group switches to another node. Once the group has switched, clear the constraint:
pcs resource clear centreon
Delete a Pacemaker resource group
Warning: These commands will prevent your Centreon cluster from working. Only do this if you know what you are doing.
Connect to a cluster node and run the following commands:
pcs resource delete centreon \
cbd_central_broker \
gorgone \
snmptrapd \
centreontrapd \
http \
centreon_central_sync \
vip
If that does not work, it is probably due to a resource in a failed state. Run the following commands to delete the resource:
crm_resource --resource [resource] -D -t primitive -C
pcs resource cleanup centreon
To create the resources again, follow the installation procedure from this point.
Monitoring a Centreon-HA cluster
A high-availability platform is basically a LAMP platform (Linux Apache MariaDB PHP) managed by the ClusterLabs tools. The platform's monitoring must therefore include the same indicators as with any Centreon platform, and some cluster-specific ones. The monitoring of the cluster must be performed from an external poller.
System indicators and processes
The easiest part consists in monitoring the basic system indicators, mostly using SNMP Protocol, which is made quite simple thanks to the Linux Monitoring Connector.
- System metrics
  - LOAD Average
  - CPU usage
  - Memory usage
  - SWAP usage
  - File systems usage
  - Networking traffic
  - NTP synchronization with the reference time server
- Processes
  - System processes: crond, ntpd, rsyslogd
  - Centreon processes: gorgoned, centengine, centreontrapd, httpd24-httpd, snmptrapd, mysqld, php-fpm
Application monitoring
- Check access to the URL http://@VIP_IPADDR@/centreon using the HTTP Protocol Monitoring Connector
- MariaDB, using the MySQL/MariaDB Database Monitoring Connector
  - MariaDB Server Connection Control
  - MariaDB / InnoDB buffers and caches
  - Index usage
  - MariaDB replication
Cluster monitoring
The cluster-specific health checks can be monitored using the Pacemaker Monitoring Connector:
- Resource constraints: only for ms_mysql and centreon resources
- Failed actions
Note: a Monitoring Connector dedicated to Centreon-HA might be released in the future to make this easier.