CockroachDB by HTTP
Overview
This template monitors CockroachDB nodes with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics with the HTTP agent from the Prometheus endpoint and the health endpoints.
Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template does not require a session token.
Note that some metrics may not be collected, depending on your CockroachDB version and configuration.
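To make the collection flow concrete, here is a minimal Python sketch of what one poll amounts to: fetch the endpoint body, then parse the Prometheus text format into name/value pairs. It is an illustration only, not the template's actual implementation. The helper names (`http_get`, `parse_prometheus_text`) and the sample metric lines are assumptions, and the parser assumes samples carry no trailing timestamp.

```python
import urllib.error
import urllib.request


def http_get(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Return (status code, body). Error statuses such as 500 or 503 are
    returned instead of raised, so health checks can inspect the code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")


def parse_prometheus_text(body: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {series: value}.

    Assumes each sample line is "<series> <value>" without a timestamp.
    """
    samples: dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        if not series:
            continue  # not a "<series> <value>" pair
        samples[series] = float(value)
    return samples


# Illustrative sample only; real output contains many more series.
SAMPLE = """\
# HELP sys_uptime Process uptime
# TYPE sys_uptime gauge
sys_uptime 3600
capacity_available{store="1"} 5e+10
"""
```

Calling `parse_prometheus_text(SAMPLE)` yields one entry per sample line. In the template, dependent items extract their values from the single master item in a comparable way, which is why one HTTP request is enough for most metrics.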
Requirements
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- CockroachDB 21.2.8
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Set the hostname or IP address of the CockroachDB node host in the {$COCKROACHDB.API.HOST} macro. You can also change the port in the {$COCKROACHDB.API.PORT} macro and the scheme in the {$COCKROACHDB.API.SCHEME} macro if necessary.
Also, see the Macros section for a list of macros used to set trigger values.
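As a rough illustration of how the three connection macros combine, the following Python sketch builds the endpoint URLs named in the Overview. The function name `endpoint_urls` and the example host are hypothetical; only the paths and the scheme values come from this document.

```python
def endpoint_urls(scheme: str, host: str, port: int) -> dict[str, str]:
    """Build the URLs polled by the template from the three API macros.

    scheme, host, and port stand in for {$COCKROACHDB.API.SCHEME},
    {$COCKROACHDB.API.HOST}, and {$COCKROACHDB.API.PORT}.
    """
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be http or https")
    base = f"{scheme}://{host}:{port}"
    return {
        "metrics": base + "/_status/vars",
        "health": base + "/health",
        "readiness": base + "/health?ready=1",
    }


# Example with a made-up host name and the default port.
urls = endpoint_urls("http", "db1.example.com", 8080)
```

Opening the resulting `metrics` URL in a browser or with curl is a quick way to confirm that the macros point at a reachable node before linking the template.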
Macros used
Name | Description | Default |
---|---|---|
{$COCKROACHDB.API.HOST} | The hostname or IP address of the CockroachDB host. | <SET COCKROACHDB HOST> |
{$COCKROACHDB.API.PORT} | The port of CockroachDB API and Prometheus endpoint. | 8080 |
{$COCKROACHDB.API.SCHEME} | Request scheme which may be http or https. | http |
{$COCKROACHDB.STORE.USED.MIN.WARN} | The warning threshold of the available disk space in percent. | 20 |
{$COCKROACHDB.STORE.USED.MIN.CRIT} | The critical threshold of the available disk space in percent. | 10 |
{$COCKROACHDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. | 80 |
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Number of days until the node certificate expires. | 30 |
{$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Number of days until the CA certificate expires. | 90 |
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} | Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression. | 300 |
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Maximum number of SQL statements errors for trigger expression. | 2 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metrics | Get raw metrics from the Prometheus endpoint. | HTTP agent | cockroachdb.get_metrics |
Get health | Get node /health endpoint | HTTP agent | cockroachdb.get_health |
Get readiness | Get node /health?ready=1 endpoint | HTTP agent | cockroachdb.get_readiness |
Service ping | Check if HTTP/HTTPS service accepts TCP connections. | Simple check | net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"] |
Clock offset | Mean clock offset of the node against the rest of the cluster. | Dependent item | cockroachdb.clock.offset |
Version | Build information. | Dependent item | cockroachdb.version |
CPU: System time | System CPU time. | Dependent item | cockroachdb.cpu.system_time |
CPU: User time | User CPU time. | Dependent item | cockroachdb.cpu.user_time |
CPU: Utilization | The CPU utilization expressed in %. | Dependent item | cockroachdb.cpu.util |
Disk: IOPS in progress, rate | Number of disk IO operations currently in progress on this host. | Dependent item | cockroachdb.disk.iops.in_progress.rate |
Disk: Reads, rate | Bytes read from all disks per second since this process started. | Dependent item | cockroachdb.disk.read.rate |
Disk: Read IOPS, rate | Number of disk read operations per second across all disks since this process started. | Dependent item | cockroachdb.disk.iops.read.rate |
Disk: Writes, rate | Bytes written to all disks per second since this process started. | Dependent item | cockroachdb.disk.write.rate |
Disk: Write IOPS, rate | Disk write operations per second across all disks since this process started. | Dependent item | cockroachdb.disk.iops.write.rate |
File descriptors: Limit | Open file descriptors soft limit of the process. | Dependent item | cockroachdb.descriptors.limit |
File descriptors: Open | The number of open file descriptors. | Dependent item | cockroachdb.descriptors.open |
GC: Pause time | The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused. | Dependent item | cockroachdb.gc.pause_time |
GC: Runs, rate | The number of times that Go's garbage collector was invoked per second across all nodes. | Dependent item | cockroachdb.gc.runs.rate |
Go: Goroutines count | Current number of Goroutines. This count should rise and fall based on load. | Dependent item | cockroachdb.go.goroutines.count |
KV transactions: Aborted, rate | Number of aborted KV transactions per second. | Dependent item | cockroachdb.kv.transactions.aborted.rate |
KV transactions: Committed, rate | Number of KV transactions (including 1PC) committed per second. | Dependent item | cockroachdb.kv.transactions.committed.rate |
Live nodes count | The number of live nodes in the cluster (will be 0 if this node is not itself live). | Dependent item | cockroachdb.live_count |
Liveness heartbeats, rate | Number of successful node liveness heartbeats per second from this node. | Dependent item | cockroachdb.heartbeaths.success.rate |
Memory: Allocated by Cgo | Current bytes of memory allocated by the C layer. | Dependent item | cockroachdb.memory.cgo.allocated |
Memory: Allocated by Go | Current bytes of memory allocated by the Go layer. | Dependent item | cockroachdb.memory.go.allocated |
Memory: Managed by Cgo | Total bytes of memory managed by the C layer. | Dependent item | cockroachdb.memory.cgo.managed |
Memory: Managed by Go | Total bytes of memory managed by the Go layer. | Dependent item | cockroachdb.memory.go.managed |
Memory: Total usage | Resident set size (RSS) of memory in use by the node. | Dependent item | cockroachdb.memory.total |
Network: Bytes received, rate | Bytes received per second on all network interfaces since this process started. | Dependent item | cockroachdb.network.bytes.received.rate |
Network: Bytes sent, rate | Bytes sent per second on all network interfaces since this process started. | Dependent item | cockroachdb.network.bytes.sent.rate |
Time series: Sample errors, rate | The number of errors encountered while attempting to write metrics to disk, per second. | Dependent item | cockroachdb.ts.samples.errors.rate |
Time series: Samples written, rate | The number of successfully written metric samples per second. | Dependent item | cockroachdb.ts.samples.written.rate |
Slow requests: DistSender RPCs | Number of RPCs stuck or retrying for a long time. | Dependent item | cockroachdb.slow_requests.rpc |
SQL: Bytes received, rate | Total amount of incoming SQL client network traffic in bytes per second. | Dependent item | cockroachdb.sql.bytes.received.rate |
SQL: Bytes sent, rate | Total amount of outgoing SQL client network traffic in bytes per second. | Dependent item | cockroachdb.sql.bytes.sent.rate |
Memory: Allocated by SQL | Current SQL statement memory usage for root. | Dependent item | cockroachdb.memory.sql |
SQL: Schema changes, rate | Total number of SQL DDL statements successfully executed per second. | Dependent item | cockroachdb.sql.schema_changes.rate |
SQL sessions: Open | Total number of open SQL sessions. | Dependent item | cockroachdb.sql.sessions |
SQL statements: Active | Total number of SQL statements currently active. | Dependent item | cockroachdb.sql.statements.active |
SQL statements: DELETE, rate | A moving average of the number of DELETE statements successfully executed per second. | Dependent item | cockroachdb.sql.statements.delete.rate |
SQL statements: Executed, rate | Number of SQL queries executed per second. | Dependent item | cockroachdb.sql.statements.executed.rate |
SQL statements: Denials, rate | The number of statements denied per second by a feature flag. | Dependent item | cockroachdb.sql.statements.denials.rate |
SQL statements: Active flows distributed, rate | The number of distributed SQL flows currently active per second. | Dependent item | cockroachdb.sql.statements.flows.active.rate |
SQL statements: INSERT, rate | A moving average of the number of INSERT statements successfully executed per second. | Dependent item | cockroachdb.sql.statements.insert.rate |
SQL statements: SELECT, rate | A moving average of the number of SELECT statements successfully executed per second. | Dependent item | cockroachdb.sql.statements.select.rate |
SQL statements: UPDATE, rate | A moving average of the number of UPDATE statements successfully executed per second. | Dependent item | cockroachdb.sql.statements.update.rate |
SQL statements: Contention, rate | Total number of SQL statements that experienced contention per second. | Dependent item | cockroachdb.sql.statements.contention.rate |
SQL statements: Errors, rate | Total number of statements which returned a planning or runtime error per second. | Dependent item | cockroachdb.sql.statements.errors.rate |
SQL transactions: Open | Total number of currently open SQL transactions. | Dependent item | cockroachdb.sql.transactions.open |
SQL transactions: Aborted, rate | Total number of SQL transaction abort errors per second. | Dependent item | cockroachdb.sql.transactions.aborted.rate |
SQL transactions: Committed, rate | Total number of SQL transaction COMMIT statements successfully executed per second. | Dependent item | cockroachdb.sql.transactions.committed.rate |
SQL transactions: Initiated, rate | Total number of SQL transaction BEGIN statements successfully executed per second. | Dependent item | cockroachdb.sql.transactions.initiated.rate |
SQL transactions: Rolled back, rate | Total number of SQL transaction ROLLBACK statements successfully executed per second. | Dependent item | cockroachdb.sql.transactions.rollbacks.rate |
Uptime | Process uptime. | Dependent item | cockroachdb.uptime |
Node certificate expiration date | Node certificate expires at that date. | Dependent item | cockroachdb.cert.expire_date.node |
CA certificate expiration date | CA certificate expires at that date. | Dependent item | cockroachdb.cert.expire_date.ca |
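Many of the items above are named "rate" and described as per-second values. The preprocessing steps behind them are not shown in this document, so the following Python sketch only illustrates the general idea under an assumption: that a rate is derived from two consecutive readings of a cumulative counter. The function name `change_per_second` and the sample numbers are hypothetical.

```python
from typing import Optional


def change_per_second(prev_value: float, prev_ts: float,
                      cur_value: float, cur_ts: float) -> Optional[float]:
    """Per-second rate between two readings of a cumulative counter.

    Returns None when no rate can be computed: timestamps that do not
    advance, or a counter that went backwards (for example after a restart).
    """
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        return None
    if cur_value < prev_value:
        return None  # counter reset; wait for the next sample
    return (cur_value - prev_value) / elapsed


# A counter that grew from 100 to 160 over 60 seconds.
rate = change_per_second(100, 0, 160, 60)
```

Under this assumption, the first sample after a node restart produces no value, which is worth keeping in mind when reading graphs around restarts.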
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
CockroachDB: Node is unhealthy | Node's /health endpoint has returned HTTP 500 Internal Server Error which indicates unhealthy mode. | last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 | Average | Depends on: |
CockroachDB: Node is not ready | Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons: | last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m | Average | Depends on: |
CockroachDB: Service is down | | last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{$COCKROACHDB.API.HOST}","{$COCKROACHDB.API.PORT}"]) = 0 | Average | |
CockroachDB: Clock offset is too high | Cockroach-measured clock offset is nearing limit (by default, servers kill themselves at 400ms from the mean). | min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 | Warning | |
CockroachDB: Version has changed | | last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 | Info | |
CockroachDB: Current number of open files is too high | Getting close to open file descriptor limit. | min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} | Warning | |
CockroachDB: Node is not executing SQL | Node is not executing SQL despite having connections. | last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 | Warning | |
CockroachDB: SQL statements errors rate is too high | | min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Warning | |
CockroachDB: Node has been restarted | Uptime is less than 10 minutes. | last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m | Info | |
CockroachDB: Failed to fetch node data | Zabbix has not received data for items for the last 5 minutes. | nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 | Warning | Depends on: |
CockroachDB: Node certificate expires soon | Node certificate expires soon. | (last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Warning | |
CockroachDB: CA certificate expires soon | CA certificate expires soon. | (last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Warning | |
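The arithmetic in several trigger expressions above can be easier to follow when written out. The Python sketch below mirrors three of them (certificate expiry, clock offset, and open file descriptors) using plain lists in place of item history. The function names and sample values are hypothetical. The clock offset comparison assumes the item is stored in seconds, which is what the `* 0.001` conversion of the millisecond macro suggests.

```python
def cert_expires_soon(expire_ts: float, now_ts: float, warn_days: float) -> bool:
    """(last(expire_date) - now()) / 86400 < warn_days, with Unix timestamps."""
    return (expire_ts - now_ts) / 86400 < warn_days


def clock_offset_too_high(offsets_5m: list[float], max_warn_ms: float) -> bool:
    """min(offset, 5m) > max_warn_ms * 0.001; offsets assumed to be in seconds."""
    return min(offsets_5m) > max_warn_ms * 0.001


def open_files_too_high(open_10m: list[float], limit: float,
                        max_warn_pct: float) -> bool:
    """min(open, 10m) / last(limit) * 100 > max_warn_pct."""
    return min(open_10m) / limit * 100 > max_warn_pct


now = 1_700_000_000  # arbitrary example timestamp
soon = cert_expires_soon(now + 10 * 86400, now, warn_days=30)
offset_alarm = clock_offset_too_high([0.35, 0.40, 0.31], max_warn_ms=300)
fds_alarm = open_files_too_high([850, 900], limit=1000, max_warn_pct=80)
```

Using `min` over the window means every sample in the period must exceed the threshold before the trigger fires, so a single spike does not raise a problem.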
LLD rule Storage metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Discover per store metrics. | Dependent item | cockroachdb.store.discovery |
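The discovery logic itself is not shown in this document. As a sketch of how per-store discovery could work, the Python below collects distinct values of a `store` label from Prometheus text and emits them as low-level discovery rows keyed by `{#STORE}`. The label name `store`, the sample series, and the function name `discover_stores` are assumptions made for illustration.

```python
import json
import re

# Matches a label such as store="1" inside a Prometheus series.
STORE_LABEL = re.compile(r'\bstore="([^"]+)"')


def discover_stores(metrics_text: str) -> list[dict[str, str]]:
    """Return one LLD row per distinct store label value, sorted."""
    stores: set[str] = set()
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        match = STORE_LABEL.search(line)
        if match:
            stores.add(match.group(1))
    return [{"{#STORE}": store} for store in sorted(stores)]


# Illustrative input with two stores.
SAMPLE = """\
# TYPE capacity gauge
capacity{store="1"} 1e+11
capacity{store="2"} 2e+11
capacity_available{store="1"} 5e+10
sys_uptime 3600
"""

lld_json = json.dumps(discover_stores(SAMPLE))
```

Each discovered row then drives one set of the item and trigger prototypes listed below, with `{#STORE}` substituted into names and keys.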
Item prototypes for Storage metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#STORE}]: Bytes: Live | Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data. | Dependent item | cockroachdb.storage.bytes.[{#STORE},live] |
Storage [{#STORE}]: Bytes: System | Number of physical bytes stored in system key-value pairs. | Dependent item | cockroachdb.storage.bytes.[{#STORE},system] |
Storage [{#STORE}]: Capacity available | Available storage capacity. | Dependent item | cockroachdb.storage.capacity.[{#STORE},available] |
Storage [{#STORE}]: Capacity total | Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity. | Dependent item | cockroachdb.storage.capacity.[{#STORE},total] |
Storage [{#STORE}]: Capacity used | Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files. | Dependent item | cockroachdb.storage.capacity.[{#STORE},used] |
Storage [{#STORE}]: Capacity available in % | Available storage capacity in %. | Calculated | cockroachdb.storage.capacity.[{#STORE},available_percent] |
Storage [{#STORE}]: Replication: Lease holders | Number of lease holders. | Dependent item | cockroachdb.replication.[{#STORE},lease_holders] |
Storage [{#STORE}]: Bytes: Logical | Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data. | Dependent item | cockroachdb.storage.bytes.[{#STORE},logical] |
Storage [{#STORE}]: Rebalancing: Average queries, rate | Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions. | Dependent item | cockroachdb.rebalancing.queries.average.[{#STORE},rate] |
Storage [{#STORE}]: Rebalancing: Average writes, rate | Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions. | Dependent item | cockroachdb.rebalancing.writes.average.[{#STORE},rate] |
Storage [{#STORE}]: Queue processing failures: Consistency, rate | Number of replicas which failed processing in the consistency checker queue per second. | Dependent item | cockroachdb.queue.processing_failures.consistency.[{#STORE},rate] |
Storage [{#STORE}]: Queue processing failures: GC, rate | Number of replicas which failed processing in the GC queue per second. | Dependent item | cockroachdb.queue.processing_failures.gc.[{#STORE},rate] |
Storage [{#STORE}]: Queue processing failures: Raft log, rate | Number of replicas which failed processing in the Raft log queue per second. | Dependent item | cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate] |
Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate | Number of replicas which failed processing in the Raft repair queue per second. | Dependent item | cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate] |
Storage [{#STORE}]: Queue processing failures: Replica GC, rate | Number of replicas which failed processing in the replica GC queue per second. | Dependent item | cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate] |
Storage [{#STORE}]: Queue processing failures: Replicate, rate | Number of replicas which failed processing in the replicate queue per second. | Dependent item | cockroachdb.queue.processing_failures.replicate.[{#STORE},rate] |
Storage [{#STORE}]: Queue processing failures: Split, rate | Number of replicas which failed processing in the split queue per second. | Dependent item | cockroachdb.queue.processing_failures.split.[{#STORE},rate] |
Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate | Number of replicas which failed processing in the time series maintenance queue per second. | Dependent item | cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate] |
Storage [{#STORE}]: Ranges count | Number of ranges. | Dependent item | cockroachdb.ranges.[{#STORE},count] |
Storage [{#STORE}]: Ranges unavailable | Number of ranges with fewer live replicas than needed for quorum. | Dependent item | cockroachdb.ranges.[{#STORE},unavailable] |
Storage [{#STORE}]: Ranges underreplicated | Number of ranges with fewer live replicas than the replication target. | Dependent item | cockroachdb.ranges.[{#STORE},underreplicated] |
Storage [{#STORE}]: RocksDB read amplification | The average number of real read operations executed per logical read operation. | Dependent item | cockroachdb.rocksdb.[{#STORE},read_amp] |
Storage [{#STORE}]: RocksDB cache hits, rate | Count of block cache hits per second. | Dependent item | cockroachdb.rocksdb.cache.hits.[{#STORE},rate] |
Storage [{#STORE}]: RocksDB cache misses, rate | Count of block cache misses per second. | Dependent item | cockroachdb.rocksdb.cache.misses.[{#STORE},rate] |
Storage [{#STORE}]: RocksDB cache hit ratio | Block cache hit ratio in %. | Calculated | cockroachdb.rocksdb.cache.[{#STORE},hit_ratio] |
Storage [{#STORE}]: Replication: Replicas | Number of replicas. | Dependent item | cockroachdb.replication.replicas.[{#STORE},count] |
Storage [{#STORE}]: Replication: Replicas quiesced | Number of quiesced replicas. | Dependent item | cockroachdb.replication.replicas.[{#STORE},quiesced] |
Storage [{#STORE}]: Slow requests: Latch acquisitions | Number of requests that have been stuck for a long time acquiring latches. | Dependent item | cockroachdb.slow_requests.[{#STORE},latch_acquisitions] |
Storage [{#STORE}]: Slow requests: Lease acquisitions | Number of requests that have been stuck for a long time acquiring a lease. | Dependent item | cockroachdb.slow_requests.[{#STORE},lease_acquisitions] |
Storage [{#STORE}]: Slow requests: Raft proposals | Number of requests that have been stuck for a long time in raft. | Dependent item | cockroachdb.slow_requests.[{#STORE},raft_proposals] |
Storage [{#STORE}]: RocksDB SSTables | The number of SSTables in use. | Dependent item | cockroachdb.rocksdb.[{#STORE},sstables] |
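Two of the prototypes above are of the Calculated type, but their formulas are not shown in this document. A minimal Python sketch follows, assuming the available percentage is available capacity divided by total capacity, and the cache hit ratio is hits divided by hits plus misses. Both formulas and both function names are assumptions, not the template's confirmed definitions.

```python
from typing import Optional


def available_percent(available: float, total: float) -> Optional[float]:
    """Available capacity as a percentage of total capacity (assumed formula)."""
    if total <= 0:
        return None  # avoid division by zero when capacity is unknown
    return available / total * 100


def cache_hit_ratio(hits_rate: float, misses_rate: float) -> Optional[float]:
    """Block cache hit ratio in percent (assumed formula)."""
    lookups = hits_rate + misses_rate
    if lookups <= 0:
        return None  # no cache lookups in the interval
    return hits_rate / lookups * 100


pct = available_percent(available=25, total=100)
ratio = cache_hit_ratio(hits_rate=90, misses_rate=10)
```

Returning `None` for a zero denominator is a choice of this sketch; an idle store with no cache lookups has no meaningful ratio to report.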
Trigger prototypes for Storage metrics discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
CockroachDB: Storage [{#STORE}]: Available storage capacity is low | Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available). | max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} | Warning | Depends on: |
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low | Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available). | max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} | Average | |
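The two storage trigger prototypes share one expression shape and differ only in the threshold macro. The following Python sketch shows how a five-minute window of available-percent samples maps to a severity with the default thresholds of 20 and 10. The function name `storage_severity` is hypothetical, and reporting only the higher severity when both conditions hold is a simplification made by this sketch.

```python
from typing import Optional


def storage_severity(available_pct_5m: list[float],
                     warn: float = 20.0, crit: float = 10.0) -> Optional[str]:
    """Map max(available_percent, 5m) to a severity name, or None if healthy."""
    if not available_pct_5m:
        return None  # no data in the window
    peak = max(available_pct_5m)
    if peak < crit:
        return "Average"  # critically low free space
    if peak < warn:
        return "Warning"  # low free space
    return None


critical = storage_severity([8, 9, 7])
warning = storage_severity([15, 18])
healthy = storage_severity([15, 25, 12])
```

Because the expression takes the maximum over the window, free space must stay below the threshold for the whole five minutes. In the last example a single sample of 25% keeps the store out of the problem state.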
Feedback
Please report any issues with the template at
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums