Ceph

Ceph is a free-software storage platform that implements object storage on a single distributed computer cluster and provides interfaces for object-, block-, and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.

This template is for Zabbix version: 7.2

Source: Ceph by Zabbix agent 2

Overview

This template monitors a Ceph cluster through Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template Ceph by Zabbix agent 2 collects metrics by polling zabbix-agent2.

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

  • Ceph 14.2

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

  1. Set up and configure zabbix-agent2 compiled with the Ceph monitoring plugin.
  2. Set the {$CEPH.CONNSTRING} macro to a URI such as <protocol(host:port)> or to a named session.
  3. Set the user name and password in the host macros ({$CEPH.USER}, {$CEPH.API.KEY}) if you want to override parameters from the Zabbix agent configuration file (see the configuration sketch below).

Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
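
If you use a named session, the connection details can live in the agent configuration file instead of host macros. A minimal sketch of the relevant zabbix_agent2.conf lines, assuming the Ceph plugin's named-session parameters and an arbitrary session name ceph1:

  Plugins.Ceph.Sessions.ceph1.Uri=https://localhost:8003
  Plugins.Ceph.Sessions.ceph1.User=zabbix
  Plugins.Ceph.Sessions.ceph1.ApiKey=<your API key>

With such a session defined, {$CEPH.CONNSTRING} can simply be set to ceph1.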

Macros used

Name                  Description    Default
{$CEPH.USER}                         zabbix
{$CEPH.API.KEY}                      zabbix_pass
{$CEPH.CONNSTRING}                   https://localhost:8003
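
The defaults assume the Ceph restful manager module listening on port 8003. If the module is not yet enabled, a setup along these lines on a Ceph admin node creates the API key for {$CEPH.API.KEY} (the key name zabbix here is just an example matching {$CEPH.USER}):

  ceph mgr module enable restful
  ceph restful create-self-signed-cert
  ceph restful create-key zabbix

The create-key command prints the generated key to use as {$CEPH.API.KEY}.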

Items

Name Description Type Key and additional info
Get overall cluster status    Zabbix agent    ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
Get OSD stats                 Zabbix agent    ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
Get OSD dump                  Zabbix agent    ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
Get df                        Zabbix agent    ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
Ping                          Zabbix agent    ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]

Preprocessing

  • Discard unchanged with heartbeat: 30m

Number of Monitors

The number of Monitors configured in a Ceph cluster.

Dependent item ceph.num_mon

Preprocessing

  • JSON Path: $.num_mon

  • Discard unchanged with heartbeat: 30m
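
Each dependent item in this list extracts a single field from one of the master items above using the JSON Path shown in its preprocessing. As an illustration (not part of the template), the same field can be inspected by hand with the macros replaced by real values; jq is used here purely for demonstration:

  zabbix_get -s ceph-host -k 'ceph.status["https://localhost:8003","zabbix","zabbix_pass"]' | jq '.num_mon'

This mirrors the JSON Path $.num_mon applied by the Number of Monitors item.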

Overall cluster status

The overall Ceph cluster status: 0 - HEALTH_OK, 1 - HEALTH_WARN, or 2 - HEALTH_ERR.

Dependent item ceph.overall_status

Preprocessing

  • JSON Path: $.overall_status

  • Discard unchanged with heartbeat: 10m

Minimum Mon release version

The minimum monitor release version (min_mon_release_name).

Dependent item ceph.min_mon_release_name

Preprocessing

  • JSON Path: $.min_mon_release_name

  • Discard unchanged with heartbeat: 1h

Ceph Read bandwidth

The global read bytes per second.

Dependent item ceph.rd_bytes.rate

Preprocessing

  • JSON Path: $.rd_bytes

  • Change per second

Ceph Write bandwidth

The global write bytes per second.

Dependent item ceph.wr_bytes.rate

Preprocessing

  • JSON Path: $.wr_bytes

  • Change per second

Ceph Read operations per sec

The global read operations per second.

Dependent item ceph.rd_ops.rate

Preprocessing

  • JSON Path: $.rd_ops

  • Change per second

Ceph Write operations per sec

The global write operations per second.

Dependent item ceph.wr_ops.rate

Preprocessing

  • JSON Path: $.wr_ops

  • Change per second
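
Note: for the four rate items above, the Change per second step turns the raw, monotonically growing counters into rates, roughly (current value - previous value) / (seconds between the two collections).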

Total bytes available

The total bytes available in a Ceph cluster.

Dependent item ceph.total_avail_bytes

Preprocessing

  • JSON Path: $.total_avail_bytes

Total bytes

The total (RAW) capacity of a Ceph cluster in bytes.

Dependent item ceph.total_bytes

Preprocessing

  • JSON Path: $.total_bytes

Total bytes used

The total bytes used in a Ceph cluster.

Dependent item ceph.total_used_bytes

Preprocessing

  • JSON Path: $.total_used_bytes

Total number of objects

The total number of objects in a Ceph cluster.

Dependent item ceph.total_objects

Preprocessing

  • JSON Path: $.total_objects

Number of Placement Groups

The total number of Placement Groups in a Ceph cluster.

Dependent item ceph.num_pg

Preprocessing

  • JSON Path: $.num_pg

  • Discard unchanged with heartbeat: 10m

Number of Placement Groups in Temporary state

The total number of Placement Groups in a pg_temp state.

Dependent item ceph.num_pg_temp

Preprocessing

  • JSON Path: $.num_pg_temp

Number of Placement Groups in Active state

The total number of Placement Groups in an active state.

Dependent item ceph.pg_states.active

Preprocessing

  • JSON Path: $.pg_states.active

Number of Placement Groups in Clean state

The total number of Placement Groups in a clean state.

Dependent item ceph.pg_states.clean

Preprocessing

  • JSON Path: $.pg_states.clean

Number of Placement Groups in Peering state

The total number of Placement Groups in a peering state.

Dependent item ceph.pg_states.peering

Preprocessing

  • JSON Path: $.pg_states.peering

Number of Placement Groups in Scrubbing state

The total number of Placement Groups in a scrubbing state.

Dependent item ceph.pg_states.scrubbing

Preprocessing

  • JSON Path: $.pg_states.scrubbing

Number of Placement Groups in Undersized state

The total number of Placement Groups in an undersized state.

Dependent item ceph.pg_states.undersized

Preprocessing

  • JSON Path: $.pg_states.undersized

Number of Placement Groups in Backfilling state

The total number of Placement Groups in a backfill state.

Dependent item ceph.pg_states.backfilling

Preprocessing

  • JSON Path: $.pg_states.backfilling

Number of Placement Groups in degraded state

The total number of Placement Groups in a degraded state.

Dependent item ceph.pg_states.degraded

Preprocessing

  • JSON Path: $.pg_states.degraded

Number of Placement Groups in inconsistent state

The total number of Placement Groups in an inconsistent state.

Dependent item ceph.pg_states.inconsistent

Preprocessing

  • JSON Path: $.pg_states.inconsistent

Number of Placement Groups in Unknown state

The total number of Placement Groups in an unknown state.

Dependent item ceph.pg_states.unknown

Preprocessing

  • JSON Path: $.pg_states.unknown

Number of Placement Groups in remapped state

The total number of Placement Groups in a remapped state.

Dependent item ceph.pg_states.remapped

Preprocessing

  • JSON Path: $.pg_states.remapped

Number of Placement Groups in recovering state

The total number of Placement Groups in a recovering state.

Dependent item ceph.pg_states.recovering

Preprocessing

  • JSON Path: $.pg_states.recovering

Number of Placement Groups in backfill_toofull state

The total number of Placement Groups in a backfill_toofull state.

Dependent item ceph.pg_states.backfill_toofull

Preprocessing

  • JSON Path: $.pg_states.backfill_toofull

Number of Placement Groups in backfill_wait state

The total number of Placement Groups in a backfill_wait state.

Dependent item ceph.pg_states.backfill_wait

Preprocessing

  • JSON Path: $.pg_states.backfill_wait

Number of Placement Groups in recovery_wait state

The total number of Placement Groups in a recovery_wait state.

Dependent item ceph.pg_states.recovery_wait

Preprocessing

  • JSON Path: $.pg_states.recovery_wait

Number of Pools

The total number of pools in a Ceph cluster.

Dependent item ceph.num_pools

Preprocessing

  • JSON Path: $.num_pools

Number of OSDs

The number of known storage daemons in a Ceph cluster.

Dependent item ceph.num_osd

Preprocessing

  • JSON Path: $.num_osd

  • Discard unchanged with heartbeat: 10m

Number of OSDs in state: UP

The total number of online storage daemons in a Ceph cluster.

Dependent item ceph.num_osd_up

Preprocessing

  • JSON Path: $.num_osd_up

  • Discard unchanged with heartbeat: 10m

Number of OSDs in state: IN

The total number of participating storage daemons in a Ceph cluster.

Dependent item ceph.num_osd_in

Preprocessing

  • JSON Path: $.num_osd_in

  • Discard unchanged with heartbeat: 10m

Ceph OSD avg fill

The average fill of OSDs.

Dependent item ceph.osd_fill.avg

Preprocessing

  • JSON Path: $.osd_fill.avg

Ceph OSD max fill

The fill percentage of the most-filled OSD.

Dependent item ceph.osd_fill.max

Preprocessing

  • JSON Path: $.osd_fill.max

Ceph OSD min fill

The fill percentage of the least-filled OSD.

Dependent item ceph.osd_fill.min

Preprocessing

  • JSON Path: $.osd_fill.min

Ceph OSD max PGs

The maximum number of Placement Groups on an OSD.

Dependent item ceph.osd_pgs.max

Preprocessing

  • JSON Path: $.osd_pgs.max

Ceph OSD min PGs

The minimum number of Placement Groups on an OSD.

Dependent item ceph.osd_pgs.min

Preprocessing

  • JSON Path: $.osd_pgs.min

Ceph OSD avg PGs

The average number of Placement Groups on OSDs.

Dependent item ceph.osd_pgs.avg

Preprocessing

  • JSON Path: $.osd_pgs.avg

Ceph OSD Apply latency Avg

The average apply latency of OSDs.

Dependent item ceph.osd_latency_apply.avg

Preprocessing

  • JSON Path: $.osd_latency_apply.avg

Ceph OSD Apply latency Max

The maximum apply latency of OSDs.

Dependent item ceph.osd_latency_apply.max

Preprocessing

  • JSON Path: $.osd_latency_apply.max

Ceph OSD Apply latency Min

The minimum apply latency of OSDs.

Dependent item ceph.osd_latency_apply.min

Preprocessing

  • JSON Path: $.osd_latency_apply.min

Ceph OSD Commit latency Avg

The average commit latency of OSDs.

Dependent item ceph.osd_latency_commit.avg

Preprocessing

  • JSON Path: $.osd_latency_commit.avg

Ceph OSD Commit latency Max

The maximum commit latency of OSDs.

Dependent item ceph.osd_latency_commit.max

Preprocessing

  • JSON Path: $.osd_latency_commit.max

Ceph OSD Commit latency Min

The minimum commit latency of OSDs.

Dependent item ceph.osd_latency_commit.min

Preprocessing

  • JSON Path: $.osd_latency_commit.min

Ceph backfill full ratio

The backfill full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_backfillfull_ratio

Preprocessing

  • JSON Path: $.osd_backfillfull_ratio

  • Discard unchanged with heartbeat: 10m

Ceph full ratio

The full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_full_ratio

Preprocessing

  • JSON Path: $.osd_full_ratio

  • Discard unchanged with heartbeat: 10m

Ceph nearfull ratio

The near full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_nearfull_ratio

Preprocessing

  • JSON Path: $.osd_nearfull_ratio

  • Discard unchanged with heartbeat: 10m

Triggers

Name Description Expression Severity Dependencies and additional info
Ceph: Can not connect to cluster

The connection to the Ceph RESTful module is broken. This covers any error, including AUTH and configuration issues.

last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 Average

Ceph: Cluster in ERROR state

last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 Average Manual close: Yes

Ceph: Cluster in WARNING state

last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 Warning Manual close: Yes
Depends on:
  • Ceph: Cluster in ERROR state

Ceph: Minimum monitor release version has changed

A Ceph version has changed. Acknowledge to close the problem manually.

last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 Info Manual close: Yes

LLD rule OSD

Name Description Type Key and additional info
OSD    Zabbix agent    ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]

Item prototypes for OSD

Name Description Type Key and additional info
[osd.{#OSDNAME}] OSD in Dependent item ceph.osd[{#OSDNAME},in]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.in

  • Discard unchanged with heartbeat: 10m

[osd.{#OSDNAME}] OSD up Dependent item ceph.osd[{#OSDNAME},up]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.up

  • Discard unchanged with heartbeat: 10m

[osd.{#OSDNAME}] OSD PGs Dependent item ceph.osd[{#OSDNAME},num_pgs]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.num_pgs

    Custom on fail: Discard value

[osd.{#OSDNAME}] OSD fill Dependent item ceph.osd[{#OSDNAME},fill]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_fill

    Custom on fail: Discard value

[osd.{#OSDNAME}] OSD latency apply

The time taken to flush an update to disks.

Dependent item ceph.osd[{#OSDNAME},latency_apply]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_latency_apply

    Custom on fail: Discard value

[osd.{#OSDNAME}] OSD latency commit

The time taken to commit an operation to the journal.

Dependent item ceph.osd[{#OSDNAME},latency_commit]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_latency_commit

    Custom on fail: Discard value

Trigger prototypes for OSD

Name Description Expression Severity Dependencies and additional info
Ceph: OSD osd.{#OSDNAME} is down

OSD osd.{#OSDNAME} is marked "down" in the osdmap.
The OSD daemon may have been stopped, or peer OSDs may be unable to reach the OSD over the network.

last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 Average

Ceph: OSD osd.{#OSDNAME} is full

min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 Average

Ceph: Ceph OSD osd.{#OSDNAME} is near full

min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 Warning
Depends on:
  • Ceph: OSD osd.{#OSDNAME} is full
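
Note: the *100 in the two fill expressions reflects that the OSD fill items are percentages, while the full/nearfull ratios from the OSDMap are fractions (for example, 0.95).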

LLD rule Pool

Name Description Type Key and additional info
Pool    Zabbix agent    ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]

Item prototypes for Pool

Name Description Type Key and additional info
[{#POOLNAME}] Pool Used

The total bytes used in a pool.

Dependent item ceph.pool["{#POOLNAME}",bytes_used]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].bytes_used

[{#POOLNAME}] Max available

The maximum available space in the given pool.

Dependent item ceph.pool["{#POOLNAME}",max_avail]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].max_avail

[{#POOLNAME}] Pool RAW Used

The bytes used in the pool, including copies.

Dependent item ceph.pool["{#POOLNAME}",stored_raw]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].stored_raw

[{#POOLNAME}] Pool Percent Used

The percentage of the storage used per pool.

Dependent item ceph.pool["{#POOLNAME}",percent_used]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].percent_used

[{#POOLNAME}] Pool objects

The number of objects in the pool.

Dependent item ceph.pool["{#POOLNAME}",objects]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].objects

[{#POOLNAME}] Pool Read bandwidth

The read rate per pool (bytes per second).

Dependent item ceph.pool["{#POOLNAME}",rd_bytes.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].rd_bytes

  • Change per second

[{#POOLNAME}] Pool Write bandwidth

The write rate per pool (bytes per second).

Dependent item ceph.pool["{#POOLNAME}",wr_bytes.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].wr_bytes

  • Change per second

[{#POOLNAME}] Pool Read operations

The read rate per pool (operations per second).

Dependent item ceph.pool["{#POOLNAME}",rd_ops.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].rd_ops

  • Change per second

[{#POOLNAME}] Pool Write operations

The write rate per pool (operations per second).

Dependent item ceph.pool["{#POOLNAME}",wr_ops.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].wr_ops

  • Change per second

Feedback

Please report any issues with the template at

You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums.
