Centreon documentation

Kubernetes API

Overview

Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management.

This Pack monitors both the infrastructure layer (nodes) and the cluster services (deployments, daemonsets, etc.).

Pack assets

The Kubernetes API Pack offers several ways to organize the monitoring of a cluster.

There are three main ways:

  • Gather all metrics on a single Centreon host, with one service per Kubernetes unit type (i.e. deployments, daemonsets, etc.) - apply the manual creation procedure,
  • Gather all metrics on a single Centreon host, with one service for each instance of each Kubernetes unit - apply the manual creation and service discovery procedures,
  • Collect infrastructure metrics (master and worker nodes) with one Centreon host per Kubernetes node, and keep orchestration/application metrics on a single host (using one of the two previous scenarios) - apply the host discovery procedure.

All of these scenarios rely on discovery and classic templating.

You only need to choose how to reach the cluster: through the RestAPI exposed by the Kubernetes cluster, or through the CLI tool kubectl, which communicates with the cluster's control plane.

Discovery

The Kubernetes API Pack comes with several discovery providers and rules.

Here is the list of the Host Discovery providers:

Provider                     Description
Kubernetes Nodes (RestAPI)   Discover Kubernetes nodes by requesting Kubernetes RestAPI
Kubernetes Nodes (Kubectl)   Discover Kubernetes nodes by requesting Kubernetes cluster using kubectl

Both providers search for Kubernetes nodes and link them to a minimal host template that monitors node usage in terms of pod allocation and CPU and memory requests/limits.

In parallel to this discovery, unitary services can be created thanks to the Service Discovery rules:

Rule                                                 Description
Cloud-Kubernetes-Api-CronJobs-Status                 Discover Kubernetes CronJobs to monitor their status
Cloud-Kubernetes-Api-Daemonsets-Status               Discover Kubernetes DaemonSets to monitor their status
Cloud-Kubernetes-Api-Deployments-Status              Discover Kubernetes Deployments to monitor their status
Cloud-Kubernetes-Api-Nodes-Status                    Discover Kubernetes Nodes to monitor their status
Cloud-Kubernetes-Api-Nodes-Usage                     Discover Kubernetes Nodes to monitor their usage
Cloud-Kubernetes-Api-PersistentVolumes-Status        Discover Kubernetes PersistentVolumes to monitor their status
Cloud-Kubernetes-Api-Pods-Status                     Discover Kubernetes Pods to monitor their status
Cloud-Kubernetes-Api-ReplicaSets-Status              Discover Kubernetes ReplicaSets to monitor their status
Cloud-Kubernetes-Api-ReplicationControllers-Status   Discover Kubernetes ReplicationControllers to monitor their status
Cloud-Kubernetes-Api-StatefulSets-Status             Discover Kubernetes StatefulSets to monitor their status

Templates

The Kubernetes API Pack provides two host templates, to be used depending on the scenarios mentioned earlier:

  • All in one host template that will gather checks and metrics with a service per Kubernetes unit:

    Cloud-Kubernetes-Api
    Cluster Events
    CronJob Status
    DaemonSet Status
    Deployment Status
    Node Status
    Node Usage
    PersistentVolume Status
    Pod Status
    ReplicatSet Status
    ReplicationController Status
    StatefulSet Status
  • A minimal host template that will only collect metrics for the Kubernetes nodes:

    Cloud-Kubernetes-Node-Api
    Node Usage
    Node Status

Monitored metrics and indicators

Cluster events

This indicator lets you watch the number of events occurring on the cluster, similar to the output of kubectl get events:

NAMESPACE   LAST SEEN   TYPE      REASON      OBJECT           MESSAGE
graphite    26m         Warning   Unhealthy   pod/graphite-0   Liveness probe failed: Get "http://10.244.2.10:8080/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The resulting output in Centreon could look like:

Event 'Warning' for object 'Pod/graphite-0' with message 'Liveness probe failed: Get "http://10.244.2.10:8080/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)', Count: 1, First seen: 26m 21s ago (2021-03-11T12:26:23Z), Last seen: 26m 21s ago (2021-03-11T12:26:23Z)

The collected metrics will be:

Metric
events.type.warning.count
events.type.normal.count

It is then possible to place thresholds using the following special variables:

  • %{type}
  • %{object}
  • %{message}
  • %{count}
  • %{first_seen}
  • %{last_seen}
  • %{name}
  • %{namespace}

The default values are the following:

Threshold   Value                   Description
Warning     %{type} =~ /warning/i   Will raise a warning alert if there are warning events
Critical    %{type} =~ /error/i     Will raise a critical alert if there are error events

Refer to the official documentation for more information about collected metrics and how to fine-tune your thresholds.
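To make the default thresholds above concrete, here is a minimal Python sketch of how they classify events and how the two counters are derived. It is only an illustration: the plugin evaluates the %{...} expressions itself, and the event dictionaries and the event_state helper are hypothetical names introduced for this example.

```python
import re
from collections import Counter

# Hypothetical events, shaped like rows of `kubectl get events`.
events = [
    {"type": "Warning", "object": "Pod/graphite-0", "message": "Liveness probe failed"},
    {"type": "Normal", "object": "Pod/hello-1", "message": "Started container hello"},
]

WARNING_RE = re.compile(r"warning", re.IGNORECASE)  # mirrors %{type} =~ /warning/i
CRITICAL_RE = re.compile(r"error", re.IGNORECASE)   # mirrors %{type} =~ /error/i


def event_state(event):
    """Return the state one event would get under the default thresholds."""
    if CRITICAL_RE.search(event["type"]):
        return "CRITICAL"
    if WARNING_RE.search(event["type"]):
        return "WARNING"
    return "OK"


states = [event_state(e) for e in events]

# Count events per type to obtain the two documented metrics.
counts = Counter(e["type"].lower() for e in events)
metrics = {
    "events.type.warning.count": counts["warning"],
    "events.type.normal.count": counts["normal"],
}
print(states, metrics)
```

With the sample data, the first event is classified as WARNING and each counter equals 1.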

CronJobs status

This indicator lets you check that CronJobs run as scheduled, similar to the output of kubectl get cronjobs:

NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     1        6s              2d1h

The resulting output in Centreon could look like:

CronJob 'hello' Jobs Active: 1, Last schedule time: 6s ago (2021-03-11T12:31:00Z)

The collected metric for each CronJob will be:

Metric                      Kubernetes metric
cronjob.jobs.active.count   active

If the service collects metrics for several CronJobs (depending on the chosen scenario), the CronJob's name is prepended to the metric name, separated by a # character:

Metric
hello#cronjob.jobs.active.count
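When such labels are processed outside Centreon, the instance name and the metric name can be separated at the # character. The following Python sketch assumes that the instance name itself contains no #; split_metric is a hypothetical helper, not part of the Pack.

```python
def split_metric(label):
    """Split 'instance#metric' into (instance, metric).

    Labels without a '#' belong to a single-instance service,
    so the instance part is returned as None.
    """
    instance, separator, metric = label.partition("#")
    if not separator:
        return None, label
    return instance, metric


print(split_metric("hello#cronjob.jobs.active.count"))
print(split_metric("cronjob.jobs.active.count"))
```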

It is then possible to place thresholds using the following special variables:

  • %{active}
  • %{last_schedule}
  • %{name}
  • %{namespace}

There are no default thresholds. A useful one could be %{last_schedule} > x, where x is the duration in seconds beyond which the CronJob is considered not to be running as scheduled.

Refer to the official documentation for more information about collected metrics and how to fine-tune your thresholds.
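As an illustration of the suggested threshold, the Python sketch below computes the time elapsed since the last schedule and compares it with a limit. It assumes that %{last_schedule} holds that elapsed time in seconds, as the text above implies; the function names and the 120-second limit are hypothetical choices for a CronJob scheduled every minute.

```python
from datetime import datetime, timezone


def seconds_since(last_schedule_iso, now):
    """Seconds elapsed between an ISO 8601 UTC timestamp ('...Z') and `now`."""
    last = datetime.strptime(last_schedule_iso, "%Y-%m-%dT%H:%M:%SZ")
    last = last.replace(tzinfo=timezone.utc)
    return (now - last).total_seconds()


def is_late(last_schedule_iso, now, max_age_seconds):
    """Mirror of the suggested threshold %{last_schedule} > x."""
    return seconds_since(last_schedule_iso, now) > max_age_seconds


# Timestamp taken from the sample output above, checked 6 seconds later.
now = datetime(2021, 3, 11, 12, 31, 6, tzinfo=timezone.utc)
age = seconds_since("2021-03-11T12:31:00Z", now)
print(age, is_late("2021-03-11T12:31:00Z", now, 120))
```

Choosing x slightly above the schedule interval (here two intervals) avoids alerts caused by a run that starts a few seconds late.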

DaemonSets status

This indicator ensures that DaemonSets are within defined bounds by comparing the number of available and/or up-to-date pods with the desired count, similar to the output of kubectl get daemonsets:

NAMESPACE     NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                   AGE
kube-system   kube-flannel-ds-amd64   3         3         3       3            3           beta.kubernetes.io/arch=amd64   624d
kube-system   kube-proxy              3         3         3       3            3           kubernetes.io/os=linux          624d

The resulting output in Centreon could look like:

Daemonset 'kube-flannel-ds-amd64' Pods Desired: 3, Current: 3, Available: 3, Up-to-date: 3, Ready: 3, Misscheduled: 0
Daemonset 'kube-proxy' Pods Desired: 3, Current: 3, Available: 3, Up-to-date: 3, Ready: 3, Misscheduled: 0

The collected metrics for each DaemonSet will be:

Metric                              Kubernetes metric
daemonset.pods.desired.count        desiredNumberScheduled
daemonset.pods.current.count        currentNumberScheduled
daemonset.pods.available.count      numberAvailable
daemonset.pods.uptodate.count       updatedNumberScheduled
daemonset.pods.ready.count          numberReady
daemonset.pods.misscheduled.count   numberMisscheduled

If the service collects metrics for several DaemonSets (depending on the chosen scenario), the DaemonSet's name is prepended to the metric name, separated by a # character:

Metric
kube-proxy#daemonset.pods.desired.count

It is then possible to place thresholds using the following special variables:

  • %{desired}
  • %{current}
  • %{available}
  • %{up_to_date}
  • %{ready}
  • %{misscheduled}
  • %{name}
  • %{namespace}

The default values are the following:

Threshold   Value                        Description
Warning     %{up_to_date} < %{desired}   Will raise a warning alert if the number of up-to-date pods is lower than the desired number
Critical    %{available} < %{desired}    Will raise a critical alert if the number of available pods is lower than the desired number

Refer to the official documentation for more information about collected metrics and how to fine-tune your thresholds.
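The default DaemonSet thresholds can be sketched in Python as follows. This is an illustration only, assuming the status fields listed in the metrics table above and treating a missing field as 0; daemonset_state is a hypothetical helper, and the plugin evaluates the expressions itself.

```python
def daemonset_state(status):
    """Apply the default thresholds to a DaemonSet status dictionary."""
    desired = status.get("desiredNumberScheduled", 0)
    available = status.get("numberAvailable", 0)
    up_to_date = status.get("updatedNumberScheduled", 0)

    if available < desired:      # critical: %{available} < %{desired}
        return "CRITICAL"
    if up_to_date < desired:     # warning: %{up_to_date} < %{desired}
        return "WARNING"
    return "OK"


healthy = {"desiredNumberScheduled": 3, "numberAvailable": 3, "updatedNumberScheduled": 3}
rolling = {"desiredNumberScheduled": 3, "numberAvailable": 3, "updatedNumberScheduled": 2}
degraded = {"desiredNumberScheduled": 3, "numberAvailable": 1, "updatedNumberScheduled": 3}

for name, status in [("healthy", healthy), ("rolling", rolling), ("degraded", degraded)]:
    print(name, daemonset_state(status))
```

The critical condition is checked first so that a DaemonSet that is both out of date and unavailable reports the more severe state.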

Deployments status

This indicator ensures that Deployments are within defined bounds by comparing the number of available and/or up-to-date replicas with the desired count, similar to the output of kubectl get deployments:

NAMESPACE              NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
kube-system            coredns                     2/2     2            2           624d
kube-system            tiller-deploy               1/1     1            1           624d
kubernetes-dashboard   dashboard-metrics-scraper   1/1     1            1           37d
kubernetes-dashboard   kubernetes-dashboard        1/1     1            1           37d

The resulting output in Centreon could look like:

Deployment 'coredns' Replicas Desired: 2, Current: 2, Available: 2, Ready: 2, Up-to-date: 2
Deployment 'tiller-deploy' Replicas Desired: 1, Current: 1, Available: 1, Ready: 1, Up-to-date: 1
Deployment 'dashboard-metrics-scraper' Replicas Desired: 1, Current: 1, Available: 1, Ready: 1, Up-to-date: 1
Deployment 'kubernetes-dashboard' Replicas Desired: 1, Current: 1, Available: 1, Ready: 1, Up-to-date: 1

The collected metrics for each Deployment will be:

Metric                                Kubernetes metric
deployment.replicas.desired.count     replicas (in spec entry)
deployment.replicas.current.count     replicas (in status entry)
deployment.replicas.available.count   availableReplicas
deployment.replicas.ready.count       readyReplicas
deployment.replicas.uptodate.count    updatedReplicas

If the service collects metrics for several Deployments (depending on the chosen scenario), the Deployment's name is prepended to the metric name, separated by a # character:

Metric
tiller-deploy#deployment.replicas.desired.count

It is then possible to place thresholds using the following special variables:

  • %{desired}
  • %{current}
  • %{available}
  • %{ready}
  • %{up_to_date}
  • %{name}
  • %{namespace}

The default values are the following:

Threshold   Value                        Description
Warning     %{up_to_date} < %{desired}   Will raise a warning alert if the number of up-to-date replicas is lower than the desired number
Critical    %{available} < %{desired}    Will raise a critical alert if the number of available replicas is lower than the desired number

Refer to the official documentation for more information about collected metrics and how to fine-tune your thresholds.

Nodes status

This indicator ensures that Nodes are running well by looking at the status of their conditions, as listed by kubectl describe nodes:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 11 Mar 2021 14:20:25 +0100   Tue, 26 Jan 2021 09:38:11 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 11 Mar 2021 14:20:25 +0100   Wed, 17 Feb 2021 09:37:40 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 11 Mar 2021 14:20:25 +0100   Tue, 26 Jan 2021 09:38:11 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 11 Mar 2021 14:20:25 +0100   Tue, 26 Jan 2021 17:26:36 +0100   KubeletReady                 kubelet is posting ready status

The resulting output in Centreon could look like:

Condition 'DiskPressure' Status is 'False', Reason: 'KubeletHasNoDiskPressure', Message: 'kubelet has no disk pressure'
Condition 'MemoryPressure' Status is 'False', Reason: 'KubeletHasSufficientMemory', Message: 'kubelet has sufficient memory available'
Condition 'PIDPressure' Status is 'False', Reason: 'KubeletHasSufficientPID', Message: 'kubelet has sufficient PID available'
Condition 'Ready' Status is 'True', Reason: 'KubeletReady', Message: 'kubelet is posting ready status'

No metrics are collected.

It is possible to place thresholds using the following special variables:

  • %{type}
  • %{status}
  • %{reason}
  • %{message}

The default values are the following:

Threshold   Value                                                                                                  Description
Critical    (%{type} =~ /Ready/i && %{status} !~ /True/i) || (%{type} =~ /.*Pressure/i && %{status} !~ /False/i)   Will raise a critical alert if the status of the Ready condition is not True or if the status of a Pressure condition is not False

Refer to the official documentation for more information about statuses and how to fine-tune your thresholds.
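The default critical expression for node conditions combines two regular-expression tests. The Python sketch below reproduces that logic for illustration, using the condition values from the sample output above; condition_is_critical is a hypothetical helper, and the plugin evaluates the expression itself.

```python
import re


def condition_is_critical(condition):
    """Mirror of the default critical threshold for one node condition."""
    ctype, status = condition["type"], condition["status"]
    # %{type} =~ /Ready/i && %{status} !~ /True/i
    ready_bad = re.search(r"Ready", ctype, re.I) and not re.search(r"True", status, re.I)
    # %{type} =~ /.*Pressure/i && %{status} !~ /False/i
    pressure_bad = re.search(r".*Pressure", ctype, re.I) and not re.search(r"False", status, re.I)
    return bool(ready_bad or pressure_bad)


conditions = [
    {"type": "MemoryPressure", "status": "False"},
    {"type": "DiskPressure", "status": "False"},
    {"type": "PIDPressure", "status": "False"},
    {"type": "Ready", "status": "True"},
]
print([condition_is_critical(c) for c in conditions])
```

Note that a status of Unknown fails both the "is True" and the "is False" tests, so it is critical for the Ready condition as well as for any Pressure condition.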

Nodes usage

This indicator will gather metrics about Nodes usage like pods allocation, requests for CPU and memory made by those pods, and limits for CPU and memory allowed to those same pods.

Using Kubernetes command line tool, it could look like the following:

  • Nodes capacity:

    kubectl get nodes -o=custom-columns="NODE:.metadata.name,PODS ALLOCATABLE:.status.allocatable.pods,CPU ALLOCATABLE:.status.allocatable.cpu,MEMORY ALLOCATABLE:.status.allocatable.memory"
    NODE          PODS ALLOCATABLE   CPU ALLOCATABLE   MEMORY ALLOCATABLE
    master-node   110                2                 3778172Ki
    worker-node   110                2                 3778184Ki
    
  • Running pods:

    kubectl get pods -o=custom-columns="NODE:.spec.nodeName,POD:.metadata.name,CPU REQUESTS:.spec.containers[*].resources.requests.cpu,CPU LIMITS:.spec.containers[*].resources.limits.cpu,MEMORY REQUESTS:.spec.containers[*].resources.requests.memory,MEMORY LIMITS:.spec.containers[*].resources.limits.memory"
    NODE          POD                                     CPU REQUESTS   CPU LIMITS   MEMORY REQUESTS   MEMORY LIMITS
    worker-node   coredns-74ff55c5b-g4hmt                 100m           <none>       70Mi              170Mi
    master-node   etcd-master-node                        100m           <none>       100Mi             <none>
    master-node   kube-apiserver-master-node              250m           <none>       <none>            <none>
    master-node   kube-controller-manager-master-node     200m           <none>       <none>            <none>
    master-node   kube-flannel-ds-amd64-fk59g             100m           100m         50Mi              50Mi
    worker-node   kube-flannel-ds-amd64-jwzms             100m           100m         50Mi              50Mi
    master-node   kube-proxy-kkwmb                        <none>         <none>       <none>            <none>
    worker-node   kube-proxy-vprs8                        <none>         <none>       <none>            <none>
    master-node   kube-scheduler-master-node              100m           <none>       <none>            <none>
    master-node   kubernetes-dashboard-7d75c474bb-7zc5j   <none>         <none>       <none>            <none>
    

From the Kubernetes dashboard, the metrics can be found in the Cluster > Nodes menu:

  • Listing from Cluster > Nodes:

    Cluster nodes listing

  • Allocation detail for a node:

    Node allocation detail

The resulting output in Centreon could look like:

Node 'master-node' CPU requests: 37.50% (0.75/2), CPU limits: 5.00% (0.1/2), Memory requests: 3.96% (150.00MB/3.70GB), Memory limits: 1.32% (50.00MB/3.70GB), Pods allocation: 7.27% (8/110)
Node 'worker-node' CPU requests: 35.00% (0.7/2), CPU limits: 115.00% (2.3/2), Memory requests: 31.51% (1.17GB/3.70GB), Memory limits: 115.21% (4.26GB/3.70GB), Pods allocation: 9.09% (10/110)

The collected metrics for each Node will be:

Metric
cpu.requests.percentage
cpu.limits.percentage
memory.requests.percentage
memory.limits.percentage
pods.allocation.percentage

If the service collects metrics for several Nodes (depending on the chosen scenario), the Node's name is prepended to the metric name, separated by a # character:

Metric
worker-node#pods.allocation.percentage

Warning and critical thresholds, expressed as percentages, can be set on every metric.

PersistentVolumes status

This indicator ensures that PersistentVolumes are operating correctly by looking at the phase they are in, as reported by kubectl get pv:

NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                   STORAGECLASS   REASON   AGE
pv-nfs-kubestorage-001   5Gi        RWO            Retain           Available                                                   630d
pv-nfs-kubestorage-002   5Gi        RWO            Retain           Bound       tick/data-influxdb                              630d
pv-nfs-kubestorage-003   5Gi        RWO            Retain           Released    graphite/graphite-pvc                           630d

The resulting output in Centreon could look like:

Persistent Volume 'pv-nfs-kubestorage-001' Phase is 'Available'
Persistent Volume 'pv-nfs-kubestorage-002' Phase is 'Bound'
Persistent Volume 'pv-nfs-kubestorage-003' Phase is 'Released'

No metrics are collected.

It is possible to place thresholds using the following special variables:

  • %{phase}
  • %{name}

The default values are the following:

Threshold   Value                                       Description
Critical    %{phase} !~ /Bound|Available|Released/i     Will raise a critical alert if the phase is not Bound, Available or Released
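To make the default threshold concrete, here is a minimal shell emulation of it, assuming the expression is a case-insensitive match against the three phase names listed in the description. The Plugin evaluates the expression itself; phase_is_critical is a hypothetical helper for illustration only.

```shell
# Emulates `%{phase} !~ /Bound|Available|Released/i`:
# succeeds (exit 0) when the phase matches none of the three names.
phase_is_critical() {
  ! printf '%s\n' "$1" | grep -Eqi 'Bound|Available|Released'
}

for phase in Available Bound Released Failed Pending; do
  if phase_is_critical "$phase"; then
    echo "$phase: CRITICAL"
  else
    echo "$phase: OK"
  fi
done
```

With this default, only the Failed and Pending phases raise an alert. To also be alerted on Released volumes, the same idea applies with that name removed from the alternation.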

Refer to the official documentation for more information about statuses and how to fine tune your thresholds.

Pods status

This indicator ensures that Pods and their containers are within defined bounds by comparing the number of ready containers to the desired count, as reported by kubectl get pods:

NAMESPACE              NAME                                                     READY   STATUS        RESTARTS   AGE
kube-system            kube-proxy-65zhn                                         1/1     Running       0          37d
kube-system            kube-proxy-kkwmb                                         1/1     Running       0          37d
kube-system            kube-proxy-vprs8                                         1/1     Running       0          37d
kube-system            tiller-deploy-7bf78cdbf7-z5n24                           1/1     Running       5          550d
kubernetes-dashboard   dashboard-metrics-scraper-79c5968bdc-vncxc               1/1     Running       0          37d
kubernetes-dashboard   kubernetes-dashboard-7448ffc97b-42rps                    1/1     Running       0          37d

The resulting output in Centreon could look like:

Checking pod 'kube-proxy-65zhn'
    Containers Ready: 1/1 (100.00%), Status is 'Running', Restarts: 0
    Container 'kube-proxy' Status is 'running', State is 'ready', Restarts: 0
Checking pod 'kube-proxy-kkwmb'
    Containers Ready: 1/1 (100.00%), Status is 'Running', Restarts: 0
    Container 'kube-proxy' Status is 'running', State is 'ready', Restarts: 0
Checking pod 'kube-proxy-vprs8'
    Containers Ready: 1/1 (100.00%), Status is 'Running', Restarts: 0
    Container 'kube-proxy' Status is 'running', State is 'ready', Restarts: 0
Checking pod 'tiller-deploy-7bf78cdbf7-z5n24'
    Containers Ready: 1/1 (100.00%), Status is 'Running', Restarts: 5
    Container 'tiller' Status is 'running', State is 'ready', Restarts: 5
Checking pod 'dashboard-metrics-scraper-79c5968bdc-vncxc'
    Containers Ready: 1/1 (100.00%), Status is 'Running', Restarts: 0
    Container 'dashboard-metrics-scraper' Status is 'running', State is 'ready', Restarts: 0
Checking pod 'kubernetes-dashboard-7448ffc97b-42rps'
    Containers Ready: 1/1 (100.00%), Status is 'Running', Restarts: 0
    Container 'kubernetes-dashboard' Status is 'running', State is 'ready', Restarts: 0

The collected metrics for each Pod will be:

Metric
containers.ready.count
restarts.total.count
containers.restarts.count

If the service collects metrics for several Pods (depending on the chosen scenario), the Pod's name and the container's name will be prefixed to the metric name:

Metric
coredns-74ff55c5b-g4hmt#containers.ready.count
coredns-74ff55c5b-g4hmt#restarts.total.count
coredns-74ff55c5b-g4hmt_coredns#containers.restarts.count

It is then possible to place thresholds using the following special variables:

  • %{status}
  • %{state} (containers only)
  • %{name}
  • %{namespace} (Pods only)

The default values are the following:

Threshold              Value                                                Description
Critical (pod)         %{status} !~ /running/i                              Will raise a critical alert if a pod is not in a running status
Critical (container)   %{status} !~ /running/i || %{state} !~ /^ready$/     Will raise a critical alert if a container is not in a running status or not in a ready state

Refer to the official documentation for more information about collected metrics and how to fine tune your thresholds.

ReplicaSets status

This indicator ensures that ReplicaSets are within defined bounds by comparing the number of ready replicas to the desired count, as reported by kubectl get replicasets:

NAMESPACE              NAME                                   DESIRED   CURRENT   READY   AGE
kube-system            coredns-74ff55c5b                      2         2         2       44d
kube-system            tiller-deploy-7bf78cdbf7               1         1         1       630d
kubernetes-dashboard   dashboard-metrics-scraper-79c5968bdc   1         1         1       44d
kubernetes-dashboard   kubernetes-dashboard-7448ffc97b        1         1         1       44d

The resulting output in Centreon could look like:

ReplicaSet 'coredns-74ff55c5b' Replicas Desired: 2, Current: 2, Ready: 2
ReplicaSet 'tiller-deploy-7bf78cdbf7' Replicas Desired: 1, Current: 1, Ready: 1
ReplicaSet 'dashboard-metrics-scraper-79c5968bdc' Replicas Desired: 1, Current: 1, Ready: 1
ReplicaSet 'kubernetes-dashboard-7448ffc97b' Replicas Desired: 1, Current: 1, Ready: 1

The collected metrics for each ReplicaSet will be:

Metric                              Kubernetes metric
replicaset.replicas.desired.count   replicas (in spec entry)
replicaset.replicas.current.count   replicas (in status entry)
replicaset.replicas.ready.count     readyReplicas

If the service collects metrics for several ReplicaSets (depending on the chosen scenario), the ReplicaSet's name will be prefixed to the metric name:

Metric
tiller-deploy-7bf78cdbf7#replicaset.replicas.desired.count

It is then possible to place thresholds using the following special variables:

  • %{desired}
  • %{current}
  • %{ready}
  • %{name}
  • %{namespace}

The default values are the following:

Threshold   Value                   Description
Critical    %{ready} < %{desired}   Will raise a critical alert if the number of ready replicas is lower than the desired number

Refer to the official documentation for more information about collected metrics and how to fine tune your thresholds.

ReplicationControllers status

This indicator ensures that ReplicationControllers are within defined bounds by comparing the number of ready replicas to the desired count, as reported by kubectl get rc:

NAMESPACE   NAME    DESIRED   CURRENT   READY   AGE
elk         nginx   3         3         3       2d19h

The resulting output in Centreon could look like:

ReplicationController 'nginx' Replicas Desired: 3, Current: 3, Ready: 3

The collected metrics for each ReplicationController will be:

Metric                                         Kubernetes metric
replicationcontroller.replicas.desired.count   replicas (in spec entry)
replicationcontroller.replicas.current.count   replicas (in status entry)
replicationcontroller.replicas.ready.count     readyReplicas

If the service collects metrics for several ReplicationControllers (depending on the chosen scenario), the ReplicationController's name will be prefixed to the metric name:

Metric
nginx#replicationcontroller.replicas.desired.count

It is then possible to place thresholds using the following special variables:

  • %{desired}
  • %{current}
  • %{ready}
  • %{name}
  • %{namespace}

The default values are the following:

Threshold   Value                   Description
Critical    %{ready} < %{desired}   Will raise a critical alert if the number of ready replicas is lower than the desired number

Refer to the official documentation for more information about collected metrics and how to fine tune your thresholds.

StatefulSets status

This indicator ensures that StatefulSets are within defined bounds by comparing the number of ready and up-to-date replicas to the desired count, as reported by kubectl get statefulsets:

NAMESPACE    NAME                                        READY   AGE
elk          elasticsearch-master                        2/2     44d
graphite     graphite                                    1/1     3d
prometheus   prometheus-prometheus-operator-prometheus   1/1     619d

The resulting output in Centreon could look like:

StatefulSet 'elasticsearch-master' Replicas Desired: 2, Current: 2, Up-to-date: 2, Ready: 2
StatefulSet 'graphite' Replicas Desired: 1, Current: 1, Up-to-date: 1, Ready: 1
StatefulSet 'prometheus-prometheus-operator-prometheus' Replicas Desired: 1, Current: 1, Up-to-date: 1, Ready: 1

The collected metrics for each StatefulSet will be:

Metric                                Kubernetes metric
statefulset.replicas.desired.count    replicas (in spec entry)
statefulset.replicas.current.count    currentReplicas
statefulset.replicas.ready.count      readyReplicas
statefulset.replicas.uptodate.count   updatedReplicas

If the service collects metrics for several StatefulSets (depending on the chosen scenario), the StatefulSet's name will be prefixed to the metric name:

Metric
graphite#statefulset.replicas.desired.count

It is then possible to place thresholds using the following special variables:

  • %{desired}
  • %{current}
  • %{ready}
  • %{up_to_date}
  • %{name}
  • %{namespace}

The default values are the following:

Threshold   Value                        Description
Warning     %{up_to_date} < %{desired}   Will raise a warning alert if the number of up-to-date replicas is lower than the desired number
Critical    %{ready} < %{desired}        Will raise a critical alert if the number of ready replicas is lower than the desired number
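The way the two default thresholds combine can be sketched as follows. This is only an illustration, under the assumption that a critical condition takes precedence over a warning one, as is usual for monitoring checks. The statefulset_state function is a hypothetical name, not something provided by the Plugin.

```shell
# statefulset_state DESIRED UP_TO_DATE READY
# Prints the state the default thresholds would produce.
statefulset_state() {
  desired=$1; up_to_date=$2; ready=$3
  if [ "$ready" -lt "$desired" ]; then
    echo CRITICAL        # %{ready} < %{desired}
  elif [ "$up_to_date" -lt "$desired" ]; then
    echo WARNING         # %{up_to_date} < %{desired}
  else
    echo OK
  fi
}

statefulset_state 2 2 2   # all replicas ready and up to date
statefulset_state 2 1 2   # one replica not yet updated
statefulset_state 2 2 1   # one replica not ready
```

The three calls print OK, WARNING and CRITICAL respectively.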

Refer to the official documentation for more information about collected metrics and how to fine tune your thresholds.

Prerequisites

Centreon Plugin

Install this Plugin on each needed Poller:

yum install centreon-plugin-Cloud-Kubernetes-Api

Kubernetes

As mentioned in the introduction, two communication methods are available:

  • the RestAPI exposed by the Kubernetes cluster,
  • the CLI tool kubectl to communicate with the cluster's control plane.

For better performance, we recommend using the RestAPI.

Create a service account

Both flavors can use a service account with sufficient rights to access Kubernetes API.

Create a dedicated service account centreon-service-account in the kube-system namespace to access the API:

kubectl create serviceaccount centreon-service-account --namespace kube-system

Create a cluster role api-access with needed privileges for the Plugin, and bind it to the newly created service account:

cat <<EOF | kubectl create -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: api-access
rules:
  - apiGroups:
      - ""
      - apps
      - batch
    resources:
      - cronjobs
      - daemonsets
      - deployments
      - events
      - namespaces
      - nodes
      - persistentvolumes
      - pods
      - replicasets
      - replicationcontrollers
      - statefulsets
    verbs:
      - get
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: api-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: api-access
subjects:
- kind: ServiceAccount
  name: centreon-service-account
  namespace: kube-system
EOF

Refer to the official documentation for details about service account creation and the concept of secrets.
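Once the role is bound, it can be useful to confirm that the service account really has the expected rights before configuring Centreon. The sketch below relies on kubectl auth can-i with impersonation (--as). The two function names are hypothetical, the list of resources is only a sample of those in the cluster role, and the user running it needs the right to impersonate, which a cluster administrator normally has.

```shell
# Build the user name Kubernetes assigns to a service account.
# $1 = namespace, $2 = service account name.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether the service account may list a few resources.
# Prints one "resource: yes" or "resource: no" line per resource.
check_sa_access() {
  subject=$(sa_subject "$1" "$2")
  for resource in nodes pods deployments cronjobs persistentvolumes; do
    printf '%s: ' "$resource"
    kubectl auth can-i list "$resource" --as="$subject"
  done
}
```

Running check_sa_access kube-system centreon-service-account on the master node should answer yes on every line. A no points at the same RBAC problem that the Plugin reports as a 403 error.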

Using RestAPI

If you chose to communicate with your Kubernetes platform's RestAPI, the following prerequisites must be met:

  • Expose the API with TLS,
  • Retrieve token from service account.

Expose the API

As the API is using HTTPS, you will need a certificate.

You can generate a self-signed key/certificate pair with the following command:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/ssl/private/kubernetesapi.key -out /etc/ssl/certs/kubernetesapi.crt
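The command above prompts for the certificate subject. For scripted setups, a non-interactive variant can pass the subject with -subj, as in the sketch below. The make_self_signed helper and the temporary output directory are assumptions made for this example; the common name reuses the host name from the ingress definition further down.

```shell
# Hypothetical helper: generate a self-signed key/certificate pair without prompts.
# $1 = output directory, $2 = common name (the host name clients will use).
make_self_signed() {
  openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -subj "/CN=$2" \
    -keyout "$1/kubernetesapi.key" -out "$1/kubernetesapi.crt" 2>/dev/null
}

workdir=$(mktemp -d)
make_self_signed "$workdir" kubernetesapi.local.domain

# Print the subject and validity dates of the generated certificate.
openssl x509 -in "$workdir/kubernetesapi.crt" -noout -subject -dates
```

The files can then be moved to the paths used in the rest of this section.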

Then load it as api-certificate into the cluster, from the master node:

kubectl create secret tls api-certificate --key /etc/ssl/private/kubernetesapi.key --cert /etc/ssl/certs/kubernetesapi.crt

The ingress can now be created:

cat <<EOF | kubectl create -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubernetesapi-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
spec:
  tls:
    - hosts:
      - kubernetesapi.local.domain
      secretName: api-certificate
  rules:
  - host: kubernetesapi.local.domain
    http:
      paths:
      - backend:
          serviceName: kubernetes
          servicePort: 443
        path: /
EOF

Adapt the host entry to your needs.

Refer to the official documentation for ingresses management.
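The extensions/v1beta1 Ingress API used above was removed in Kubernetes 1.22, and networking.k8s.io/v1 has been available since 1.19. On recent clusters, an equivalent definition could look like the sketch below. It keeps the names from the example above (kubernetesapi-ingress, api-certificate, the nginx ingress class annotation). The render_ingress function is a hypothetical wrapper that only prints the manifest.

```shell
# render_ingress HOST: print an Ingress manifest using the networking.k8s.io/v1 schema.
render_ingress() {
  host=$1
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubernetesapi-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
spec:
  tls:
    - hosts:
        - ${host}
      secretName: api-certificate
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kubernetes
                port:
                  number: 443
EOF
}
```

The output can be piped to the cluster in the same way as above, for example render_ingress kubernetesapi.local.domain | kubectl create -f -.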

Retrieve token from service account

Retrieve the secret name from the previously created service account:

kubectl get serviceaccount centreon-service-account --namespace kube-system --output jsonpath='{.secrets[].name}'

Then retrieve the token from the service account secret:

kubectl get secrets centreon-service-account-token-xqw7m --namespace kube-system --output jsonpath='{.data.token}' | base64 --decode

This token will be used later for Centreon host configuration.
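The two commands can be chained so that the secret name does not have to be copied by hand. The sketch below wraps them in a hypothetical get_sa_token function. Note that on Kubernetes 1.24 and later, service accounts no longer receive a token secret automatically, so the secrets list may be empty. There, kubectl create token centreon-service-account --namespace kube-system is the usual way to obtain a token, which is time-limited.

```shell
# get_sa_token ACCOUNT NAMESPACE: print the decoded token of a service account.
get_sa_token() {
  account=$1; namespace=$2
  # Name of the secret attached to the service account.
  secret=$(kubectl get serviceaccount "$account" --namespace "$namespace" \
    --output jsonpath='{.secrets[].name}')
  if [ -z "$secret" ]; then
    echo "no token secret attached to $account" >&2
    return 1
  fi
  # The token is stored base64-encoded in the secret.
  kubectl get secret "$secret" --namespace "$namespace" \
    --output jsonpath='{.data.token}' | base64 --decode
}
```

For example, token=$(get_sa_token centreon-service-account kube-system) stores the token in a shell variable.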

Using kubectl

If you chose to communicate with the cluster's control plane with kubectl, the following prerequisites must be met:

  • Install the kubectl tool,
  • Create a kubectl configuration.

Those actions are needed on all Pollers that will do Kubernetes monitoring.

Install kubectl

Download the latest release with the following command:

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

Be sure to download a version within one minor version difference of your cluster. To download a specific version, replace the embedded curl command with the version number, for example v1.20.0.

Install the tool in the binaries directory:

sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

Refer to the official documentation for more details.

Create a kubectl configuration

To access the cluster, kubectl needs a configuration file with all needed information.

Here is an example of how to create a configuration file based on a service account (created in the previous section).

Fill in the following information and execute the commands on the master node:

ip=<master node ip>
port=<api port>
account=centreon-service-account
namespace=kube-system
clustername=my-kube-cluster
context=my-kube-cluster
secret=$(kubectl get serviceaccount $account --namespace $namespace --output jsonpath='{.secrets[].name}')
ca=$(kubectl get secret $secret --namespace $namespace --output jsonpath='{.data.ca\.crt}')
token=$(kubectl get secret $secret --namespace $namespace --output jsonpath='{.data.token}' | base64 --decode)

The account name and namespace must match the account created earlier. All other values need to be adapted.

Then execute this command to generate the config file:

cat <<EOF >> config
apiVersion: v1
kind: Config
clusters:
- name: ${clustername}
  cluster:
    certificate-authority-data: ${ca}
    server: https://${ip}:${port}
contexts:
- name: ${context}
  context:
    cluster: ${clustername}
    namespace: ${namespace}
    user: ${account}
current-context: ${context}
users:
- name: ${account}
  user:
    token: ${token}
EOF

This will create a config file. This file must be copied to the Engine user's home directory on each Poller, usually in a .kube directory (e.g. /var/lib/centreon-engine/.kube/config).

This path will be used later in Centreon host configuration.

You may also want to copy the configuration to Gorgone user's home if using Host Discovery.
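The copy step can be sketched as below. The install_kubeconfig function is a hypothetical helper that creates the .kube directory and restricts the file to its owner, since the file contains a token. Changing the owner to the Engine user still has to be done as root, and the user name depends on your installation.

```shell
# install_kubeconfig SRC HOME_DIR: copy SRC to HOME_DIR/.kube/config,
# readable and writable by its owner only.
install_kubeconfig() {
  src=$1; home_dir=$2
  mkdir -p "$home_dir/.kube" &&
    install -m 0600 "$src" "$home_dir/.kube/config"
}
```

For example, install_kubeconfig config /var/lib/centreon-engine on a Poller, followed by a chown of the .kube directory to the Engine user. A file the Engine user cannot read leads to the permission denied error listed in the troubleshooting section.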

Refer to the official documentation for more details.

Monitoring configuration

Manual creation

Add a host from the Configuration > Hosts menu and choose either the Cloud-Kubernetes-Api template (global monitoring, scenarios 1 and 2) or the Cloud-Kubernetes-Node-Api template (unitary monitoring, scenario 3) from the list.

In both cases, fill the following fields:

Field            Description
Host name        Name of the host
Alias            Host description
IP               Host IP Address
Monitored from   Monitoring Poller to use

The IP address can be either the Kubernetes IP (master node) or, if choosing the Cloud-Kubernetes-Node-Api template, the IP of each node.

Then set the values of the needed macros:

  • If using RestAPI:

    Macro                     Description                                         Example value
    KUBERNETESAPICUSTOMMODE   Plugin custom mode                                  api
    KUBERNETESAPIHOSTNAME     Hostname or address of the Kubernetes API service   kubernetesapi.local.domain
    KUBERNETESAPIPORT         Port of the API                                     443
    KUBERNETESAPIPROTO        Protocol used by API                                https
    KUBERNETESAPITOKEN        Token retrieved from service account                eyJhbG...KEw

  • If using kubectl:

    Macro                     Description                      Example value
    KUBERNETESAPICUSTOMMODE   Plugin custom mode               kubectl
    KUBECTLCONFIGFILE         Path to the configuration file   ~/.kube/config

Optional macro values can be set:

Macro          Description                                 Default value
PROXYURL       URL of the proxy (if needed)                none
TIMEOUT        Time in seconds before the query times out  10
EXTRAOPTIONS   Extra options (if needed)                   none

If choosing Cloud-Kubernetes-Api, the host will be added with all services needed to check each Kubernetes unit (scenario 1).

If choosing Cloud-Kubernetes-Node-Api, the host will be added with only one service to check node usage (scenario 3).

Click on the Save button and you're good to push the configuration to the Engines.

Automatic discovery

Host discovery

Add a job from Configuration > Discovery menu and choose a provider between Kubernetes Nodes (RestAPI) and Kubernetes Nodes (Kubectl) from the list.

Set credentials to access the Kubernetes API depending on the chosen flavor:

  • If using RestAPI: set the token retrieved earlier from the service account,
  • If using kubectl: set the path to the created configuration file (prefer a path relative to the user's home so that it works for both discovery and monitoring, e.g. ~/.kube/config).

For RestAPI: hostname/address, port and protocol are needed to access the Kubernetes API.

By default, discovery adds hosts with a minimal host template that only collects metrics for Kubernetes node usage. It then adds a special KUBERNETESNODENAME macro with the node name as its value (scenario 3).

If scheduled, the job will automatically add nodes that are later added to the cluster.

Service discovery

In addition to manual creation, it is possible to add a service for each instance of each Kubernetes unit (scenario 2). In that case, it is recommended to disable the services that were created when the host was added.

Launch a scan on the added host from the Configuration > Service > Scan menu, and add all wanted services.

If enabled, the service discovery rule will automatically add new instances created in the cluster.

Troubleshooting

Here are some common errors and their description. You will often want to use the --debug option to get the root error.

Each error is followed by its description:

  • UNKNOWN: Cannot decode json response: Can't connect to <hostname>:<port> (certificate verify failed)

    This error may appear if the TLS certificate is self-signed. Use the option --ssl-opt="SSL_verify_mode => SSL_VERIFY_NONE" to skip certificate validation.

  • UNKNOWN: API return error code '401' (add --debug option for detailed message)

    With the --debug option, the API response message says Unauthorized. It generally means that the provided token is not valid.

  • UNKNOWN: API return error code '403' (add --debug option for detailed message)

    With the --debug option, the API response message says nodes is forbidden: User "system:serviceaccount:<namespace>:<account>" cannot list resource "nodes" in API group "" at the cluster scope. It means that the cluster role bound to the service account does not have the necessary privileges.

  • UNKNOWN: CLI return error code '1' (add --debug option for detailed message)

    With the --debug option, the CLI response message says error: stat ~/.kube/config:: no such file or directory. The provided configuration file cannot be found.

  • UNKNOWN: CLI return error code '1' (add --debug option for detailed message)

    With the --debug option, the CLI response message says error: error loading config file "/root/.kube/config": open /root/.kube/config: permission denied. The provided configuration file cannot be read by the current user.

  • UNKNOWN: CLI return error code '1' (add --debug option for detailed message)

    With the --debug option, the CLI response message says error: error loading config file "/root/.kube/config": v1.Config.AuthInfos: []v1.NamedAuthInfo: v1.NamedAuthInfo.AuthInfo: v1.AuthInfo.ClientKeyData: decode base64: illegal base64.... The provided configuration file is not valid.

  • UNKNOWN: CLI return error code '1' (add --debug option for detailed message)

    With the --debug option, the CLI response message says The connection to the server <hostname>:<port> was refused - did you specify the right host or port?. The provided configuration file is not valid.
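When troubleshooting the RestAPI flavor, it can help to query the API directly from the Poller, outside of the Plugin, to tell a token problem (401) from a privileges problem (403). The sketch below uses curl against the /api/v1/nodes endpoint. The api_url and api_status functions are hypothetical helpers, and the host, port and token are the values set in the host macros.

```shell
# api_url PROTO HOST PORT PATH: assemble the URL of an API endpoint.
api_url() {
  printf '%s://%s:%s%s\n' "$1" "$2" "$3" "$4"
}

# api_status URL TOKEN: print only the HTTP status code returned by the API.
# --insecure skips certificate verification, which suits a self-signed certificate.
api_status() {
  curl --silent --insecure --output /dev/null --write-out '%{http_code}\n' \
    --header "Authorization: Bearer $2" "$1"
}
```

For example, api_status "$(api_url https kubernetesapi.local.domain 443 /api/v1/nodes)" "$token" prints 200 when the token and the cluster role are both correct.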
Copyright © 2005 - 2021 Centreon