Ceph by Zabbix agent 2
Overview
This template monitors a Ceph cluster with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Ceph by Zabbix agent 2 collects metrics by polling zabbix-agent2.
Requirements
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- Ceph 14.2
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
- Set up and configure zabbix-agent2 compiled with the Ceph monitoring plugin.
- Set {$CEPH.CONNSTRING} to a connection string such as <protocol(host:port)> or to a named session.
- Set the user name and password in the host macros ({$CEPH.USER}, {$CEPH.API.KEY}) if you want to override the parameters from the Zabbix agent configuration file.
Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
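When running the availability test by hand, note that zabbix_get does not resolve user macros, so literal values have to be put into the key. The following is a minimal Python sketch of how such a key and command line could be assembled. The helper names ceph_item_key and zabbix_get_argv are hypothetical, and the values used are the macro defaults listed in this document together with the ceph-host name from the test command.

```python
import shlex


def ceph_item_key(metric, connstring, user, api_key):
    """Build an agent item key such as ceph.ping["...","...","..."].

    Each parameter is wrapped in double quotes; embedded double quotes
    are backslash-escaped, as Zabbix expects for quoted key parameters.
    """
    quoted = ['"{}"'.format(p.replace('"', '\\"')) for p in (connstring, user, api_key)]
    return "{}[{}]".format(metric, ",".join(quoted))


def zabbix_get_argv(host, key):
    """Return the argument list for zabbix_get (nothing is executed here)."""
    return ["zabbix_get", "-s", host, "-k", key]


# Values below are the template defaults; replace them with your own.
key = ceph_item_key("ceph.ping", "https://localhost:8003", "zabbix", "zabbix_pass")
argv = zabbix_get_argv("ceph-host", key)

# shlex.join adds the shell quoting needed for a copy-and-paste command line.
print(shlex.join(argv))
```

Passing argv to subprocess.run would execute the check; the sketch only prints the command so that it can be reviewed first.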
Macros used
Name | Description | Default |
---|---|---|
{$CEPH.USER} | | zabbix |
{$CEPH.API.KEY} | | zabbix_pass |
{$CEPH.CONNSTRING} | | https://localhost:8003 |
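To illustrate how these macros end up in the item keys, here is a small Python sketch that substitutes macro values into a key, letting host-level values take precedence over the template defaults. This only mimics the idea; the real resolution is done by Zabbix itself. The function name expand_macros and the host name ceph-mgr.example are assumptions made for the example.

```python
import re

# Template defaults, as listed in the "Macros used" table.
DEFAULT_MACROS = {
    "{$CEPH.USER}": "zabbix",
    "{$CEPH.API.KEY}": "zabbix_pass",
    "{$CEPH.CONNSTRING}": "https://localhost:8003",
}

# Matches user macros of the form {$NAME}.
MACRO_RE = re.compile(r"\{\$[A-Z0-9_.]+\}")


def expand_macros(text, host_macros=None):
    """Replace user macros in text; host-level values override defaults.

    Unknown macros are left untouched.
    """
    values = dict(DEFAULT_MACROS)
    values.update(host_macros or {})
    return MACRO_RE.sub(lambda m: values.get(m.group(0), m.group(0)), text)


template_key = 'ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]'
# Hypothetical host-level override of the connection string.
resolved = expand_macros(template_key, {"{$CEPH.CONNSTRING}": "https://ceph-mgr.example:8003"})
print(resolved)
```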
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Get overall cluster status | | Zabbix agent | ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Get OSD stats | | Zabbix agent | ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Get OSD dump | | Zabbix agent | ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Get df | | Zabbix agent | ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Ping | | Zabbix agent | ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Number of Monitors | The number of Monitors configured in a Ceph cluster. | Dependent item | ceph.num_mon |
Overall cluster status | The overall Ceph cluster status, e.g. 0 - HEALTH_OK, 1 - HEALTH_WARN or 2 - HEALTH_ERR. | Dependent item | ceph.overall_status |
Minimum Mon release version | min_mon_release_name | Dependent item | ceph.min_mon_release_name |
Ceph Read bandwidth | The global read bytes per second. | Dependent item | ceph.rd_bytes.rate |
Ceph Write bandwidth | The global write bytes per second. | Dependent item | ceph.wr_bytes.rate |
Ceph Read operations per sec | The global read operations per second. | Dependent item | ceph.rd_ops.rate |
Ceph Write operations per sec | The global write operations per second. | Dependent item | ceph.wr_ops.rate |
Total bytes available | The total bytes available in a Ceph cluster. | Dependent item | ceph.total_avail_bytes |
Total bytes | The total (RAW) capacity of a Ceph cluster in bytes. | Dependent item | ceph.total_bytes |
Total bytes used | The total bytes used in a Ceph cluster. | Dependent item | ceph.total_used_bytes |
Total number of objects | The total number of objects in a Ceph cluster. | Dependent item | ceph.total_objects |
Number of Placement Groups | The total number of Placement Groups in a Ceph cluster. | Dependent item | ceph.num_pg |
Number of Placement Groups in Temporary state | The total number of Placement Groups in a pg_temp state. | Dependent item | ceph.num_pg_temp |
Number of Placement Groups in Active state | The total number of Placement Groups in an active state. | Dependent item | ceph.pg_states.active |
Number of Placement Groups in Clean state | The total number of Placement Groups in a clean state. | Dependent item | ceph.pg_states.clean |
Number of Placement Groups in Peering state | The total number of Placement Groups in a peering state. | Dependent item | ceph.pg_states.peering |
Number of Placement Groups in Scrubbing state | The total number of Placement Groups in a scrubbing state. | Dependent item | ceph.pg_states.scrubbing |
Number of Placement Groups in Undersized state | The total number of Placement Groups in an undersized state. | Dependent item | ceph.pg_states.undersized |
Number of Placement Groups in Backfilling state | The total number of Placement Groups in a backfilling state. | Dependent item | ceph.pg_states.backfilling |
Number of Placement Groups in degraded state | The total number of Placement Groups in a degraded state. | Dependent item | ceph.pg_states.degraded |
Number of Placement Groups in inconsistent state | The total number of Placement Groups in an inconsistent state. | Dependent item | ceph.pg_states.inconsistent |
Number of Placement Groups in Unknown state | The total number of Placement Groups in an unknown state. | Dependent item | ceph.pg_states.unknown |
Number of Placement Groups in remapped state | The total number of Placement Groups in a remapped state. | Dependent item | ceph.pg_states.remapped |
Number of Placement Groups in recovering state | The total number of Placement Groups in a recovering state. | Dependent item | ceph.pg_states.recovering |
Number of Placement Groups in backfill_toofull state | The total number of Placement Groups in a backfill_toofull state. | Dependent item | ceph.pg_states.backfill_toofull |
Number of Placement Groups in backfill_wait state | The total number of Placement Groups in a backfill_wait state. | Dependent item | ceph.pg_states.backfill_wait |
Number of Placement Groups in recovery_wait state | The total number of Placement Groups in a recovery_wait state. | Dependent item | ceph.pg_states.recovery_wait |
Number of Pools | The total number of pools in a Ceph cluster. | Dependent item | ceph.num_pools |
Number of OSDs | The number of the known storage daemons in a Ceph cluster. | Dependent item | ceph.num_osd |
Number of OSDs in state: UP | The total number of the online storage daemons in a Ceph cluster. | Dependent item | ceph.num_osd_up |
Number of OSDs in state: IN | The total number of the participating storage daemons in a Ceph cluster. | Dependent item | ceph.num_osd_in |
Ceph OSD avg fill | The average fill of OSDs. | Dependent item | ceph.osd_fill.avg |
Ceph OSD max fill | The percentage fill of the most filled OSD. | Dependent item | ceph.osd_fill.max |
Ceph OSD min fill | The percentage fill of the least filled OSD. | Dependent item | ceph.osd_fill.min |
Ceph OSD max PGs | The maximum number of Placement Groups on OSDs. | Dependent item | ceph.osd_pgs.max |
Ceph OSD min PGs | The minimum number of Placement Groups on OSDs. | Dependent item | ceph.osd_pgs.min |
Ceph OSD avg PGs | The average number of Placement Groups on OSDs. | Dependent item | ceph.osd_pgs.avg |
Ceph OSD Apply latency Avg | The average apply latency of OSDs. | Dependent item | ceph.osd_latency_apply.avg |
Ceph OSD Apply latency Max | The maximum apply latency of OSDs. | Dependent item | ceph.osd_latency_apply.max |
Ceph OSD Apply latency Min | The minimum apply latency of OSDs. | Dependent item | ceph.osd_latency_apply.min |
Ceph OSD Commit latency Avg | The average commit latency of OSDs. | Dependent item | ceph.osd_latency_commit.avg |
Ceph OSD Commit latency Max | The maximum commit latency of OSDs. | Dependent item | ceph.osd_latency_commit.max |
Ceph OSD Commit latency Min | The minimum commit latency of OSDs. | Dependent item | ceph.osd_latency_commit.min |
Ceph backfill full ratio | The backfill full ratio setting of the Ceph cluster as configured on OSDMap. | Dependent item | ceph.osd_backfillfull_ratio |
Ceph full ratio | The full ratio setting of the Ceph cluster as configured on OSDMap. | Dependent item | ceph.osd_full_ratio |
Ceph nearfull ratio | The near full ratio setting of the Ceph cluster as configured on OSDMap. | Dependent item | ceph.osd_nearfull_ratio |
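Two of the derived values in the table lend themselves to a short illustration: the numeric encoding of the overall cluster status and the min/max/avg summaries over per-OSD values. The Python sketch below shows the arithmetic only. It does not reproduce how the agent plugin or the template preprocessing actually computes these items, and the sample fill percentages and OSD names are made up.

```python
from statistics import mean

# Numeric codes as described for the "Overall cluster status" item.
HEALTH_CODES = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}


def overall_status_code(health):
    """Map a Ceph health string to the numeric status code."""
    return HEALTH_CODES[health]


def osd_summary(values):
    """Return the min, max and average of per-OSD values (fill, PGs, latency)."""
    values = list(values)
    return {"min": min(values), "max": max(values), "avg": mean(values)}


# Hypothetical per-OSD fill percentages keyed by OSD name.
fills = {"0": 41.5, "1": 58.5, "2": 50.0}
summary = osd_summary(fills.values())

print(overall_status_code("HEALTH_WARN"))
print(summary)
```

The same summary function applies equally to PG counts and to apply or commit latencies, which is why it takes a plain sequence of numbers.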
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: Can not connect to cluster | The connection to the Ceph RESTful module is broken (if there is any error presented including AUTH and the configuration issues). | last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 | Average | |
Ceph: Cluster in ERROR state | | last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 | Average | Manual close: Yes |
Ceph: Cluster in WARNING state | | last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 | Warning | Manual close: Yes. Depends on: |
Ceph: Minimum monitor release version has changed | A Ceph version has changed. Acknowledge to close the problem manually. | last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 | Info | Manual close: Yes |
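The release-version trigger combines two conditions: the latest value differs from the previous one, and the latest value is not empty. A rough Python equivalent of that expression is sketched below, assuming a hypothetical newest-first list of collected values (so index 0 corresponds to last(...,#1) and index 1 to last(...,#2)). The release names in the usage lines are examples only.

```python
def release_changed(history):
    """Evaluate the 'minimum monitor release version has changed' condition.

    history holds values of ceph.min_mon_release_name, newest first.
    The condition holds when the two most recent values differ and the
    newest value is a non-empty string.
    """
    if len(history) < 2:
        return False  # not enough data to compare
    latest, previous = history[0], history[1]
    return latest != previous and len(latest) > 0


print(release_changed(["octopus", "nautilus"]))
print(release_changed(["nautilus", "nautilus"]))
```

The non-empty check keeps the trigger quiet when a poll returns an empty value instead of a real release name.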
LLD rule OSD
Name | Description | Type | Key and additional info |
---|---|---|---|
OSD | | Zabbix agent | ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Item prototypes for OSD
Name | Description | Type | Key and additional info |
---|---|---|---|
[osd.{#OSDNAME}] OSD in | | Dependent item | ceph.osd[{#OSDNAME},in] |
[osd.{#OSDNAME}] OSD up | | Dependent item | ceph.osd[{#OSDNAME},up] |
[osd.{#OSDNAME}] OSD PGs | | Dependent item | ceph.osd[{#OSDNAME},num_pgs] |
[osd.{#OSDNAME}] OSD fill | | Dependent item | ceph.osd[{#OSDNAME},fill] |
[osd.{#OSDNAME}] OSD latency apply | The time taken to flush an update to disks. | Dependent item | ceph.osd[{#OSDNAME},latency_apply] |
[osd.{#OSDNAME}] OSD latency commit | The time taken to commit an operation to the journal. | Dependent item | ceph.osd[{#OSDNAME},latency_commit] |
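Low-level discovery returns a list of objects whose fields are LLD macros, and every item prototype is instantiated once per discovered object. The Python sketch below illustrates that expansion for two of the OSD prototypes. The discovered OSD names "0" and "1" and the function name expand_prototypes are hypothetical; Zabbix performs this step internally.

```python
def expand_prototypes(discovery_rows, key_prototypes):
    """Instantiate each key prototype once per discovered entity.

    discovery_rows is a list of dicts mapping LLD macros to values,
    for example {"{#OSDNAME}": "0"}.
    """
    keys = []
    for row in discovery_rows:
        for prototype in key_prototypes:
            key = prototype
            for macro, value in row.items():
                key = key.replace(macro, str(value))
            keys.append(key)
    return keys


# Hypothetical discovery result with two OSDs.
rows = [{"{#OSDNAME}": "0"}, {"{#OSDNAME}": "1"}]
prototypes = ["ceph.osd[{#OSDNAME},up]", "ceph.osd[{#OSDNAME},fill]"]

item_keys = expand_prototypes(rows, prototypes)
print(item_keys)
```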
Trigger prototypes for OSD
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: OSD osd.{#OSDNAME} is down | OSD osd.{#OSDNAME} is marked "down" in the osdmap. | last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 | Average | |
Ceph: OSD osd.{#OSDNAME} is full | | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 | Average | |
Ceph: Ceph OSD osd.{#OSDNAME} is near full | | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 | Warning | Depends on: |
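The full and near full expressions compare the lowest fill percentage seen in the last 15 minutes with the corresponding cluster ratio multiplied by 100, so a single short spike does not fire the trigger. A minimal Python sketch of that comparison follows. The function name osd_fill_alerts is hypothetical, and the ratios 0.95 and 0.85 are example values rather than values taken from this template.

```python
def osd_fill_alerts(fill_window, full_ratio, nearfull_ratio):
    """Evaluate the OSD full / near full conditions.

    fill_window: fill percentages (0-100) collected over the window.
    full_ratio, nearfull_ratio: cluster ratios expressed as fractions (0-1).
    The minimum over the window is used, so the fill has to stay above
    the threshold for the whole window before a condition holds.
    """
    lowest = min(fill_window)
    return {
        "full": lowest > full_ratio * 100,
        "near_full": lowest > nearfull_ratio * 100,
    }


# Example: fill stayed between 90% and 97% for the whole window.
print(osd_fill_alerts([90.0, 96.0, 97.0], full_ratio=0.95, nearfull_ratio=0.85))
```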
LLD rule Pool
Name | Description | Type | Key and additional info |
---|---|---|---|
Pool | | Zabbix agent | ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Item prototypes for Pool
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#POOLNAME}] Pool Used | The total bytes used in a pool. | Dependent item | ceph.pool["{#POOLNAME}",bytes_used] |
[{#POOLNAME}] Max available | The maximum available space in the given pool. | Dependent item | ceph.pool["{#POOLNAME}",max_avail] |
[{#POOLNAME}] Pool RAW Used | Bytes used in pool including the copies made. | Dependent item | ceph.pool["{#POOLNAME}",stored_raw] |
[{#POOLNAME}] Pool Percent Used | The percentage of the storage used per pool. | Dependent item | ceph.pool["{#POOLNAME}",percent_used] |
[{#POOLNAME}] Pool objects | The number of objects in the pool. | Dependent item | ceph.pool["{#POOLNAME}",objects] |
[{#POOLNAME}] Pool Read bandwidth | The read rate per pool (bytes per second). | Dependent item | ceph.pool["{#POOLNAME}",rd_bytes.rate] |
[{#POOLNAME}] Pool Write bandwidth | The write rate per pool (bytes per second). | Dependent item | ceph.pool["{#POOLNAME}",wr_bytes.rate] |
[{#POOLNAME}] Pool Read operations | The read rate per pool (operations per second). | Dependent item | ceph.pool["{#POOLNAME}",rd_ops.rate] |
[{#POOLNAME}] Pool Write operations | The write rate per pool (operations per second). | Dependent item | ceph.pool["{#POOLNAME}",wr_ops.rate] |
Feedback
Please report any issues with the template at
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums