
CockroachDB

CockroachDB is a cloud-native distributed SQL database designed to build, scale, and manage modern, data-intensive applications.

This template is for Zabbix version: 7.2
Also available for: 7.0 6.4 6.2 6.0

Source:

CockroachDB by HTTP

Overview

This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most metrics are collected in one go, thanks to Zabbix bulk data collection.

The template collects metrics with the HTTP agent from the Prometheus endpoint and the health endpoints.

Internal node metrics are collected from the Prometheus endpoint /_status/vars. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template does not require a session token.
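To check outside Zabbix that a node exposes the metrics the template expects, the same request can be reproduced by hand. The Python sketch below assumes the endpoint needs no authentication (the template uses no session token); it builds the URL from the values you would put in {$COCKROACHDB.API.HOST}, {$COCKROACHDB.API.PORT} and {$COCKROACHDB.API.SCHEME}, fetches /_status/vars and reads one sample from the Prometheus text format. The helper names are hypothetical and the parser is deliberately simplified; it is not how Zabbix's Prometheus pattern step is implemented.

```python
import re
import urllib.request
from typing import Dict, Optional

# Sample line: name{label="value",...} value [timestamp]; the label block is optional.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def vars_url(host: str, port: int = 8080, scheme: str = "http") -> str:
    """Mirror {$COCKROACHDB.API.SCHEME}://{$COCKROACHDB.API.HOST}:{$COCKROACHDB.API.PORT}."""
    return f"{scheme}://{host}:{port}/_status/vars"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the Prometheus text exposition (no authentication assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_value(text: str, name: str, labels: Optional[Dict[str, str]] = None) -> Optional[float]:
    """Return the first sample of `name` whose labels contain all of `labels`.

    Simplified parser: label values containing escaped quotes are not supported.
    """
    wanted = labels or {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or m.group(1) != name:
            continue
        found = dict(LABEL_RE.findall(m.group(2) or ""))
        if all(found.get(k) == v for k, v in wanted.items()):
            return float(m.group(3))
    return None
```

For example, `metric_value(fetch_metrics(vars_url("crdb-node1")), "sys_uptime")` should return the node's uptime in seconds; `crdb-node1` is a placeholder host name.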

Note that some metrics may not be collected, depending on your CockroachDB version and configuration.

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

  • CockroachDB 21.2.8

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Set the hostname or IP address of the CockroachDB node host in the {$COCKROACHDB.API.HOST} macro. You can also change the port in the {$COCKROACHDB.API.PORT} macro and the scheme in the {$COCKROACHDB.API.SCHEME} macro if necessary.

Also, see the Macros used section for a list of macros used to set trigger thresholds.

Macros used

| Name | Description | Default |
|---|---|---|
| {$COCKROACHDB.API.HOST} | The hostname or IP address of the CockroachDB host. | <SET COCKROACHDB HOST> |
| {$COCKROACHDB.API.PORT} | The port of the CockroachDB API and Prometheus endpoint. | 8080 |
| {$COCKROACHDB.API.SCHEME} | Request scheme: http or https. | http |
| {$COCKROACHDB.STORE.USED.MIN.WARN} | Warning threshold for available disk space, in percent. | 20 |
| {$COCKROACHDB.STORE.USED.MIN.CRIT} | Critical threshold for available disk space, in percent. | 10 |
| {$COCKROACHDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 80 |
| {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Warning threshold: number of days until the node certificate expires. | 30 |
| {$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Warning threshold: number of days until the CA certificate expires. | 90 |
| {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} | Maximum clock offset of the node against the rest of the cluster, in milliseconds, used in the trigger expression. | 300 |
| {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Maximum number of SQL statement errors per second, used in the trigger expression. | 2 |

Items

Name Description Type Key and additional info
Get metrics

Get raw metrics from the Prometheus endpoint.

HTTP agent cockroachdb.get_metrics

Preprocessing

  • Check for not supported value: any error

    Custom on fail: Discard value

Get health

Get node /health endpoint

HTTP agent cockroachdb.get_health

Preprocessing

  • Check for not supported value: any error

    Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h
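The Regular expression step keeps only the numeric HTTP status code (output \1), which the triggers later compare with 500 and 503. A minimal Python sketch of that step, assuming the item value is the raw response header block starting with the status line; the same step is applied to the readiness item. The function name is hypothetical.

```python
import re
from typing import Optional

# Same pattern as the template's Regular expression step; the output "\1" is group 1.
STATUS_RE = re.compile(r"HTTP.*\s(\d+)")


def status_code(headers: str) -> Optional[int]:
    """Return the HTTP status code found in a raw header block, or None.

    The greedy `.*` cannot cross a newline, so on a standard status line such as
    "HTTP/1.1 503 Service Unavailable" the captured group is the status code.
    """
    m = STATUS_RE.search(headers)
    return int(m.group(1)) if m else None
```

Because only the number is stored, an expression such as `last(/CockroachDB by HTTP/cockroachdb.get_health) = 500` compares plain integers.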

Get readiness

Get node /health?ready=1 endpoint

HTTP agent cockroachdb.get_readiness

Preprocessing

  • Check for not supported value: any error

    Custom on fail: Discard value

  • Regular expression: HTTP.*\s(\d+) \1

  • Discard unchanged with heartbeat: 3h
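Health and readiness codes rarely change, so both items use Discard unchanged with heartbeat: a value is stored when it differs from the previous stored value, or when the heartbeat period (3h here) has elapsed since the last stored value. The Python class below is a simplified, hypothetical model of that behaviour for illustration only; it is not Zabbix's implementation, and the exact boundary handling at the heartbeat is an assumption.

```python
from typing import Any, Optional


class DiscardUnchangedWithHeartbeat:
    """Simplified model of the "Discard unchanged with heartbeat" step.

    A value is kept if it differs from the last kept value, or if at least
    `heartbeat` seconds have passed since the last kept value.
    """

    def __init__(self, heartbeat: float) -> None:
        self.heartbeat = heartbeat
        self.last_value: Any = None
        self.last_kept_at: Optional[float] = None

    def accept(self, value: Any, now: float) -> bool:
        """Return True if `value` would be stored, False if it would be discarded."""
        keep = (
            self.last_kept_at is None  # the first value is always kept
            or value != self.last_value  # the value changed
            or now - self.last_kept_at >= self.heartbeat  # the heartbeat elapsed
        )
        if keep:
            self.last_value = value
            self.last_kept_at = now
        return keep
```

With a 3h heartbeat, a node that stays healthy yields one stored value every three hours, while a switch from 200 to 503 is stored on the very next poll.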

Service ping

Check if HTTP/HTTPS service accepts TCP connections.

Simple check net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]

Preprocessing

  • Discard unchanged with heartbeat: 10m

Clock offset

Mean clock offset of the node against the rest of the cluster.

Dependent item cockroachdb.clock.offset

Preprocessing

  • Prometheus pattern: VALUE(clock_offset_meannanos)

  • Custom multiplier: 0.000000001

Version

Build information.

Dependent item cockroachdb.version

Preprocessing

  • Prometheus pattern: build_timestamp label tag

  • Discard unchanged with heartbeat: 3h

CPU: System time

System CPU time.

Dependent item cockroachdb.cpu.system_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_sys_ns)

  • Change per second
  • Custom multiplier: 0.000000001
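The CPU time items chain two preprocessing steps: Change per second turns the cumulative nanosecond counter into a rate, and the multiplier 0.000000001 converts nanoseconds to seconds, so the stored value is CPU seconds spent per wall-clock second. A Python sketch of that arithmetic with hypothetical function names; returning None for a counter that went backwards mirrors Zabbix, which discards a negative difference and waits for the next value.

```python
from typing import Optional

NS_TO_S = 0.000000001  # the template's Custom multiplier for *_ns counters


def change_per_second(prev_value: float, prev_ts: float, value: float, ts: float) -> Optional[float]:
    """(value - prev_value) / (ts - prev_ts) for a cumulative counter.

    Returns None when the counter went backwards (for example after a node restart).
    """
    if ts <= prev_ts:
        raise ValueError("timestamps must be strictly increasing")
    if value < prev_value:
        return None
    return (value - prev_value) / (ts - prev_ts)


def cpu_seconds_per_second(prev_ns: float, prev_ts: float, ns: float, ts: float) -> Optional[float]:
    """Apply Change per second, then convert nanoseconds to seconds."""
    rate = change_per_second(prev_ns, prev_ts, ns, ts)
    return None if rate is None else rate * NS_TO_S
```

For instance, a sys_cpu_sys_ns increase of 30,000,000,000 ns over 60 s gives 0.5, that is, half a CPU core's worth of system time during the interval. The same arithmetic applies to CPU: User time and GC: Pause time.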

CPU: User time

User CPU time.

Dependent item cockroachdb.cpu.user_time

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_user_ns)

  • Change per second
  • Custom multiplier: 0.000000001

CPU: Utilization

The CPU utilization expressed in %.

Dependent item cockroachdb.cpu.util

Preprocessing

  • Prometheus pattern: VALUE(sys_cpu_combined_percent_normalized)

  • Custom multiplier: 100

Disk: IOPS in progress, rate

Number of disk IO operations currently in progress on this host.

Dependent item cockroachdb.disk.iops.in_progress.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_iopsinprogress)

  • Change per second

Disk: Reads, rate

Bytes read from all disks per second since this process started.

Dependent item cockroachdb.disk.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_bytes)

  • Change per second

Disk: Read IOPS, rate

Number of disk read operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.read.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_read_count)

  • Change per second

Disk: Writes, rate

Bytes written to all disks per second since this process started.

Dependent item cockroachdb.disk.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_bytes)

  • Change per second

Disk: Write IOPS, rate

Disk write operations per second across all disks since this process started.

Dependent item cockroachdb.disk.iops.write.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_disk_write_count)

  • Change per second

File descriptors: Limit

Soft limit on the number of open file descriptors for the process.

Dependent item cockroachdb.descriptors.limit

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_softlimit)

  • Discard unchanged with heartbeat: 3h

File descriptors: Open

The number of open file descriptors.

Dependent item cockroachdb.descriptors.open

Preprocessing

  • Prometheus pattern: VALUE(sys_fd_open)

GC: Pause time

The amount of processor time used by Go's garbage collector on this node. During garbage collection, application code execution is paused.

Dependent item cockroachdb.gc.pause_time

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_pause_ns)

  • Change per second
  • Custom multiplier: 0.000000001

GC: Runs, rate

The number of times per second that Go's garbage collector was invoked on this node.

Dependent item cockroachdb.gc.runs.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_gc_count)

  • Change per second

Go: Goroutines count

Current number of goroutines. This count should rise and fall based on load.

Dependent item cockroachdb.go.goroutines.count

Preprocessing

  • Prometheus pattern: VALUE(sys_goroutines)

KV transactions: Aborted, rate

Number of aborted KV transactions per second.

Dependent item cockroachdb.kv.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_aborts)

  • Change per second

KV transactions: Committed, rate

Number of KV transactions (including 1PC) committed per second.

Dependent item cockroachdb.kv.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(txn_commits)

  • Change per second

Live nodes count

The number of live nodes in the cluster (will be 0 if this node is not itself live).

Dependent item cockroachdb.live_count

Preprocessing

  • Prometheus pattern: VALUE(liveness_livenodes)

  • Discard unchanged with heartbeat: 3h

Liveness heartbeats, rate

Number of successful node liveness heartbeats per second from this node.

Dependent item cockroachdb.heartbeaths.success.rate

Preprocessing

  • Prometheus pattern: VALUE(liveness_heartbeatsuccesses)

  • Change per second

Memory: Allocated by Cgo

Current bytes of memory allocated by the C layer.

Dependent item cockroachdb.memory.cgo.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_allocbytes)

Memory: Allocated by Go

Current bytes of memory allocated by the Go layer.

Dependent item cockroachdb.memory.go.allocated

Preprocessing

  • Prometheus pattern: VALUE(sys_go_allocbytes)

Memory: Managed by Cgo

Total bytes of memory managed by the C layer.

Dependent item cockroachdb.memory.cgo.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_cgo_totalbytes)

Memory: Managed by Go

Total bytes of memory managed by the Go layer.

Dependent item cockroachdb.memory.go.managed

Preprocessing

  • Prometheus pattern: VALUE(sys_go_totalbytes)

Memory: Total usage

Resident set size (RSS) of memory in use by the node.

Dependent item cockroachdb.memory.total

Preprocessing

  • Prometheus pattern: VALUE(sys_rss)

Network: Bytes received, rate

Bytes received per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_recv_bytes)

  • Change per second

Network: Bytes sent, rate

Bytes sent per second on all network interfaces since this process started.

Dependent item cockroachdb.network.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sys_host_net_send_bytes)

  • Change per second

Time series: Sample errors, rate

The number of errors encountered while attempting to write metrics to disk, per second.

Dependent item cockroachdb.ts.samples.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_errors)

  • Change per second

Time series: Samples written, rate

The number of successfully written metric samples per second.

Dependent item cockroachdb.ts.samples.written.rate

Preprocessing

  • Prometheus pattern: VALUE(timeseries_write_samples)

  • Change per second

Slow requests: DistSender RPCs

Number of RPCs stuck or retrying for a long time.

Dependent item cockroachdb.slow_requests.rpc

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_distsender)

SQL: Bytes received, rate

Total amount of incoming SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.received.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesin)

  • Change per second

SQL: Bytes sent, rate

Total amount of outgoing SQL client network traffic in bytes per second.

Dependent item cockroachdb.sql.bytes.sent.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_bytesout)

  • Change per second

Memory: Allocated by SQL

Current SQL statement memory usage for root.

Dependent item cockroachdb.memory.sql

Preprocessing

  • Prometheus pattern: VALUE(sql_mem_root_current)

SQL: Schema changes, rate

Total number of SQL DDL statements successfully executed per second.

Dependent item cockroachdb.sql.schema_changes.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_ddl_count)

  • Change per second

SQL sessions: Open

Total number of open SQL sessions.

Dependent item cockroachdb.sql.sessions

Preprocessing

  • Prometheus pattern: VALUE(sql_conns)

SQL statements: Active

Total number of SQL statements currently active.

Dependent item cockroachdb.sql.statements.active

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_queries_active)

SQL statements: DELETE, rate

Number of DELETE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.delete.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_delete_count)

  • Change per second

SQL statements: Executed, rate

Number of SQL queries executed per second.

Dependent item cockroachdb.sql.statements.executed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_query_count)

  • Change per second

SQL statements: Denials, rate

The number of statements denied per second by a feature flag.

Dependent item cockroachdb.sql.statements.denials.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_feature_flag_denial)

  • Change per second

SQL statements: Active flows distributed, rate

The number of distributed SQL flows currently active per second.

Dependent item cockroachdb.sql.statements.flows.active.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_flows_active)

  • Change per second

SQL statements: INSERT, rate

Number of INSERT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.insert.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_insert_count)

  • Change per second

SQL statements: SELECT, rate

Number of SELECT statements successfully executed per second.

Dependent item cockroachdb.sql.statements.select.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_select_count)

  • Change per second

SQL statements: UPDATE, rate

Number of UPDATE statements successfully executed per second.

Dependent item cockroachdb.sql.statements.update.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_update_count)

  • Change per second

SQL statements: Contention, rate

Total number of SQL statements that experienced contention per second.

Dependent item cockroachdb.sql.statements.contention.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_distsql_contended_queries_count)

  • Change per second

SQL statements: Errors, rate

Total number of statements which returned a planning or runtime error per second.

Dependent item cockroachdb.sql.statements.errors.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_failure_count)

  • Change per second

SQL transactions: Open

Total number of currently open SQL transactions.

Dependent item cockroachdb.sql.transactions.open

Preprocessing

  • Prometheus pattern: VALUE(sql_txns_open)

SQL transactions: Aborted, rate

Total number of SQL transaction abort errors per second.

Dependent item cockroachdb.sql.transactions.aborted.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_abort_count)

  • Change per second

SQL transactions: Committed, rate

Total number of SQL transaction COMMIT statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.committed.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_commit_count)

  • Change per second

SQL transactions: Initiated, rate

Total number of SQL transaction BEGIN statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.initiated.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_begin_count)

  • Change per second

SQL transactions: Rolled back, rate

Total number of SQL transaction ROLLBACK statements successfully executed per second.

Dependent item cockroachdb.sql.transactions.rollbacks.rate

Preprocessing

  • Prometheus pattern: VALUE(sql_txn_rollback_count)

  • Change per second

Uptime

Process uptime, in seconds.

Dependent item cockroachdb.uptime

Preprocessing

  • Prometheus pattern: VALUE(sys_uptime)

Node certificate expiration date

Expiration date of the node certificate, as a Unix timestamp.

Dependent item cockroachdb.cert.expire_date.node

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_node)

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

CA certificate expiration date

Expiration date of the CA certificate, as a Unix timestamp.

Dependent item cockroachdb.cert.expire_date.ca

Preprocessing

  • Prometheus pattern: VALUE(security_certificate_expiration_ca)

    Custom on fail: Discard value

  • Discard unchanged with heartbeat: 6h

Triggers

Name Description Expression Severity Dependencies and additional info
CockroachDB: Node is unhealthy

The node's /health endpoint has returned HTTP 500 Internal Server Error, which indicates that the node is unhealthy.

last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 Average Depends on:
  • CockroachDB: Service is down

CockroachDB: Node is not ready

The node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons:
- the node is in the wait phase of the node shutdown sequence;
- the node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.

last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m Average Depends on:
  • CockroachDB: Service is down

CockroachDB: Service is down

last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]) = 0 Average

CockroachDB: Clock offset is too high

The clock offset measured by CockroachDB is nearing the limit (by default, a node shuts itself down when its clock offset reaches 400 ms from the mean).

min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 Warning
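The trigger mixes two units: the item stores seconds (clock_offset_meannanos times 0.000000001), while {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} is in milliseconds, hence the factor 0.001 in the expression. A Python sketch of the evaluation, assuming the list holds the raw nanosecond samples from the last 5 minutes; the function name is hypothetical, and treating an empty window as "not firing" is a simplification.

```python
from typing import Sequence

NS_TO_S = 0.000000001  # item: clock_offset_meannanos * 0.000000001 -> seconds
MS_TO_S = 0.001        # trigger: {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} is in milliseconds


def clock_offset_too_high(offsets_ns: Sequence[float], max_warn_ms: float = 300) -> bool:
    """Evaluate min(cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001."""
    if not offsets_ns:
        return False  # simplification: no samples in the window, nothing to compare
    offsets_s = [v * NS_TO_S for v in offsets_ns]
    return min(offsets_s) > max_warn_ms * MS_TO_S
```

Because of min() over 5 minutes, every sample in the window must exceed 300 ms; a single sample below the threshold keeps the trigger from firing.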
CockroachDB: Version has changed

last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 Info

CockroachDB: Current number of open files is too high

The number of open file descriptors is getting close to the limit.

min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} Warning
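The expression divides the lowest open-descriptor count of the last 10 minutes by the current soft limit and compares the percentage with {$COCKROACHDB.OPEN.FDS.MAX.WARN}. A Python sketch of that arithmetic; the function name is hypothetical, and returning False for missing data or a non-positive limit is a simplification.

```python
from typing import Sequence


def open_files_too_high(open_fds_10m: Sequence[float], soft_limit: float, max_warn_pct: float = 80) -> bool:
    """Evaluate min(descriptors.open,10m) / last(descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN}."""
    if not open_fds_10m or soft_limit <= 0:
        return False  # simplification: nothing meaningful to compare
    used_pct = min(open_fds_10m) / soft_limit * 100
    return used_pct > max_warn_pct
```

With a soft limit of 1024 and the default 80 %, the lowest count in the 10-minute window has to exceed 819.2, that is, at least 820 open descriptors.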
CockroachDB: Node is not executing SQL

Node is not executing SQL despite having connections.

last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 Warning

CockroachDB: SQL statements errors rate is too high

min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} Warning

CockroachDB: Node has been restarted

Uptime is less than 10 minutes.

last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m Info

CockroachDB: Failed to fetch node data

Zabbix has not received data for items for the last 5 minutes.

nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 Warning Depends on:
  • CockroachDB: Service is down

CockroachDB: Node certificate expires soon

The node certificate expires in less than {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} days.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} Warning

CockroachDB: CA certificate expires soon

The CA certificate expires in less than {$COCKROACHDB.CERT.CA.EXPIRY.WARN} days.

(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} Warning
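Both certificate triggers subtract the current time from the stored expiration date and divide by 86400 to get the remaining days. This implies the metric is a Unix timestamp in seconds, which the sketch below assumes. The function names are hypothetical; `now` is injectable so the arithmetic can be checked without the system clock.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(expire_ts: float, now: Optional[float] = None) -> float:
    """(last(cockroachdb.cert.expire_date.*) - now()) / 86400, in days."""
    current = time.time() if now is None else now
    return (expire_ts - current) / SECONDS_PER_DAY


def expires_soon(expire_ts: float, warn_days: float, now: Optional[float] = None) -> bool:
    """True when fewer than `warn_days` days remain (30 for node, 90 for CA by default)."""
    return days_until_expiry(expire_ts, now) < warn_days
```

A certificate with 60 days left therefore raises the CA trigger (threshold 90) but not the node trigger (threshold 30).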

LLD rule Storage metrics discovery

Name Description Type Key and additional info
Storage metrics discovery

Discovers per-store metrics.

Dependent item cockroachdb.store.discovery

Preprocessing

  • Prometheus to JSON: capacity

  • Discard unchanged with heartbeat: 3h
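The discovery rule keeps only the capacity samples (Prometheus to JSON: capacity), one per store, and each discovered store fills the {#STORE} macro used by the item prototypes. The Python sketch below approximates that result: it does not reproduce Zabbix's Prometheus-to-JSON output format, and mapping {#STORE} to the sample's store label is an assumption consistent with prototype patterns such as capacity{store="{#STORE}"}. The function name is hypothetical.

```python
import json
import re

CAPACITY_RE = re.compile(r"^capacity\{([^}]*)\}\s")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def discover_stores(metrics_text: str) -> str:
    """Return LLD-style JSON with one {#STORE} row per distinct `store` label on `capacity`."""
    stores = []
    for line in metrics_text.splitlines():
        m = CAPACITY_RE.match(line)
        if m is None:
            continue  # other metrics, including capacity_used and capacity_available
        store = dict(LABEL_RE.findall(m.group(1))).get("store")
        if store is not None and store not in stores:
            stores.append(store)
    return json.dumps([{"{#STORE}": s} for s in stores])
```

A node with two stores would thus get two copies of every item prototype, one per {#STORE} value.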

Item prototypes for Storage metrics discovery

Name Description Type Key and additional info
Storage [{#STORE}]: Bytes: Live

Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},live]

Preprocessing

  • Prometheus pattern: VALUE(livebytes{store="{#STORE}"})

Storage [{#STORE}]: Bytes: System

Number of physical bytes stored in system key-value pairs.

Dependent item cockroachdb.storage.bytes.[{#STORE},system]

Preprocessing

  • Prometheus pattern: VALUE(sysbytes{store="{#STORE}"})

Storage [{#STORE}]: Capacity available

Available storage capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},available]

Preprocessing

  • Prometheus pattern: VALUE(capacity_available{store="{#STORE}"})

Storage [{#STORE}]: Capacity total

Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.

Dependent item cockroachdb.storage.capacity.[{#STORE},total]

Preprocessing

  • Prometheus pattern: VALUE(capacity{store="{#STORE}"})

  • Discard unchanged with heartbeat: 3h

Storage [{#STORE}]: Capacity used

Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.

Dependent item cockroachdb.storage.capacity.[{#STORE},used]

Preprocessing

  • Prometheus pattern: VALUE(capacity_used{store="{#STORE}"})

Storage [{#STORE}]: Capacity available in %

Available storage capacity in %.

Calculated cockroachdb.storage.capacity.[{#STORE},available_percent]

Storage [{#STORE}]: Replication: Lease holders

Number of lease holders.

Dependent item cockroachdb.replication.[{#STORE},lease_holders]

Preprocessing

  • Prometheus pattern: VALUE(replicas_leaseholders{store="{#STORE}"})

Storage [{#STORE}]: Bytes: Logical

Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.

Dependent item cockroachdb.storage.bytes.[{#STORE},logical]

Preprocessing

  • Prometheus pattern: VALUE(totalbytes{store="{#STORE}"})

Storage [{#STORE}]: Rebalancing: Average queries, rate

Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.queries.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_queriespersecond{store="{#STORE}"})

Storage [{#STORE}]: Rebalancing: Average writes, rate

Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.

Dependent item cockroachdb.rebalancing.writes.average.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rebalancing_writespersecond{store="{#STORE}"})

Storage [{#STORE}]: Queue processing failures: Consistency, rate

Number of replicas which failed processing in the consistency checker queue per second.

Dependent item cockroachdb.queue.processing_failures.consistency.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_consistency_process_failure{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: Queue processing failures: GC, rate

Number of replicas which failed processing in the GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_gc_process_failure{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: Queue processing failures: Raft log, rate

Number of replicas which failed processing in the Raft log queue per second.

Dependent item cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftlog_process_failure{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate

Number of replicas which failed processing in the Raft repair queue per second.

Dependent item cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: Queue processing failures: Replica GC, rate

Number of replicas which failed processing in the replica GC queue per second.

Dependent item cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicagc_process_failure{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: Queue processing failures: Replicate, rate

Number of replicas which failed processing in the replicate queue per second.

Dependent item cockroachdb.queue.processing_failures.replicate.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_replicate_process_failure{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: Queue processing failures: Split, rate

Number of replicas which failed processing in the split queue per second.

Dependent item cockroachdb.queue.processing_failures.split.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_split_process_failure{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate

Number of replicas which failed processing in the time series maintenance queue per second.

Dependent item cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: Ranges count

Number of ranges.

Dependent item cockroachdb.ranges.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(ranges{store="{#STORE}"})

Storage [{#STORE}]: Ranges unavailable

Number of ranges with fewer live replicas than needed for quorum.

Dependent item cockroachdb.ranges.[{#STORE},unavailable]

Preprocessing

  • Prometheus pattern: VALUE(ranges_unavailable{store="{#STORE}"})

Storage [{#STORE}]: Ranges underreplicated

Number of ranges with fewer live replicas than the replication target.

Dependent item cockroachdb.ranges.[{#STORE},underreplicated]

Preprocessing

  • Prometheus pattern: VALUE(ranges_underreplicated{store="{#STORE}"})

Storage [{#STORE}]: RocksDB read amplification

The average number of real read operations executed per logical read operation.

Dependent item cockroachdb.rocksdb.[{#STORE},read_amp]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_read_amplification{store="{#STORE}"})

Storage [{#STORE}]: RocksDB cache hits, rate

Count of block cache hits per second.

Dependent item cockroachdb.rocksdb.cache.hits.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_hits{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: RocksDB cache misses, rate

Count of block cache misses per second.

Dependent item cockroachdb.rocksdb.cache.misses.[{#STORE},rate]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_block_cache_misses{store="{#STORE}"})

  • Change per second

Storage [{#STORE}]: RocksDB cache hit ratio

Block cache hit ratio in %.

Calculated cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]
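The hit ratio is a calculated item derived from the cache hit and miss rates. Its formula is not listed here; the Python sketch below assumes the usual definition, hits / (hits + misses) * 100, and returns None when there were no cache lookups in the interval. The function name is hypothetical.

```python
from typing import Optional


def cache_hit_ratio(hits_per_s: float, misses_per_s: float) -> Optional[float]:
    """Block cache hit ratio in %, assuming hits / (hits + misses) * 100."""
    lookups = hits_per_s + misses_per_s
    if lookups == 0:
        return None  # no lookups: the ratio is undefined
    return hits_per_s / lookups * 100
```

For example, 90 hits/s and 10 misses/s give a ratio of 90 %.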
Storage [{#STORE}]: Replication: Replicas

Number of replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},count]

Preprocessing

  • Prometheus pattern: VALUE(replicas{store="{#STORE}"})

Storage [{#STORE}]: Replication: Replicas quiesced

Number of quiesced replicas.

Dependent item cockroachdb.replication.replicas.[{#STORE},quiesced]

Preprocessing

  • Prometheus pattern: VALUE(replicas_quiescent{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Latch acquisitions

Number of requests that have been stuck for a long time acquiring latches.

Dependent item cockroachdb.slow_requests.[{#STORE},latch_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_latch{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Lease acquisitions

Number of requests that have been stuck for a long time acquiring a lease.

Dependent item cockroachdb.slow_requests.[{#STORE},lease_acquisitions]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_lease{store="{#STORE}"})

Storage [{#STORE}]: Slow requests: Raft proposals

Number of requests that have been stuck for a long time in raft.

Dependent item cockroachdb.slow_requests.[{#STORE},raft_proposals]

Preprocessing

  • Prometheus pattern: VALUE(requests_slow_raft{store="{#STORE}"})

Storage [{#STORE}]: RocksDB SSTables

The number of SSTables in use.

Dependent item cockroachdb.rocksdb.[{#STORE},sstables]

Preprocessing

  • Prometheus pattern: VALUE(rocksdb_num_sstables{store="{#STORE}"})

Trigger prototypes for Storage metrics discovery

Name Description Expression Severity Dependencies and additional info
CockroachDB: Storage [{#STORE}]: Available storage capacity is low

Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Warning Depends on:
  • CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low

Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).

max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Average
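Both trigger prototypes use max() over 5 minutes, so every sample in the window must be below the threshold, and the "low" trigger depends on the "critically low" one. The Python sketch below evaluates the pair together. The formula for the calculated available_percent item is an assumption (capacity available divided by capacity total, times 100), and the function names are hypothetical.

```python
from typing import Optional, Sequence


def available_percent(available_bytes: float, total_bytes: float) -> float:
    """Assumed formula of the calculated item: capacity available / capacity total * 100."""
    return available_bytes / total_bytes * 100


def storage_problem(available_pct_5m: Sequence[float], warn: float = 20, crit: float = 10) -> Optional[str]:
    """Evaluate both trigger prototypes over the last 5 minutes of available_percent samples.

    max(...,5m) < threshold holds only if every sample is below the threshold.
    The "low" trigger depends on the "critically low" one, so only the more
    severe problem is reported.
    """
    if not available_pct_5m:
        return None
    peak = max(available_pct_5m)
    if peak < crit:
        return "critically low"  # {$COCKROACHDB.STORE.USED.MIN.CRIT}, severity Average
    if peak < warn:
        return "low"  # {$COCKROACHDB.STORE.USED.MIN.WARN}, severity Warning
    return None
```

The dependency keeps both problems from being open at once: while the critical trigger is in a problem state, Zabbix suppresses the warning that depends on it.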

Feedback

Please report any issues with the template at

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
