Etcd by HTTP
Overview
This template is designed for monitoring etcd by Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The Etcd by HTTP template collects metrics with the help of the HTTP agent from the `/metrics` endpoint.
Refer to the .
For users of etcd version <= 3.4:
In etcd v3.5, some metrics have been deprecated. See more details on . Please upgrade your etcd instance or use an older Etcd by HTTP template version.
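The bulk collection described above amounts to one master request (`etcd.get_metrics`) whose payload is read by many dependent items. The minimal Python sketch below illustrates that idea, assuming the Prometheus text exposition format that etcd serves on `/metrics`. The helper names `parse_metrics` and `pick` and the sample payload are hypothetical, and the parser is simplified (it ignores timestamps and does not unescape label values). In the template itself, this work is done by preprocessing on the dependent items, not by a script.

```python
"""Sketch: fetch once, extract many values (like dependent items)."""
import re

# name{labels} value [timestamp]; the optional timestamp is ignored.
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{([^}]*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        m = _LINE.match(line)
        if not m:
            continue
        name, _, raw_labels, value = m.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def pick(samples, name, **labels):
    """Return the first value whose name and labels match, else None."""
    for n, l, v in samples:
        if n == name and all(l.get(k) == val for k, val in labels.items()):
            return v
    return None


# Hypothetical, hand-written payload for illustration only.
SAMPLE_PAYLOAD = """\
# HELP etcd_server_is_leader Whether or not this member is a leader.
# TYPE etcd_server_is_leader gauge
etcd_server_is_leader 1
etcd_server_has_leader 1
etcd_server_leader_changes_seen_total 2
grpc_server_handled_total{grpc_code="OK",grpc_method="Range"} 42
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Put"} 3
"""

if __name__ == "__main__":
    s = parse_metrics(SAMPLE_PAYLOAD)
    print(pick(s, "etcd_server_is_leader"))
```

Parsing the payload once and then reading many values from the parsed result is what keeps the number of HTTP requests to etcd low.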
Requirements
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- Etcd 3.5.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
- Make sure that etcd allows the collection of metrics. You can test it by running: `curl -L http://localhost:2379/metrics`.
- Check if etcd is accessible from Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify it, run: `curl -L http://<etcd_node_address>:2379/metrics`.
- Add the template to the etcd node. Set the hostname or IP address of the etcd host in the `{$ETCD.HOST}` macro. By default, the template uses the client port. You can configure the metrics endpoint location by adding the `--listen-metrics-urls` flag.
For more details, see the .
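The two `curl` checks above can also be scripted. The sketch below assumes the template builds its metrics URL as `{$ETCD.SCHEME}://{$ETCD.HOST}:{$ETCD.PORT}/metrics`; that exact shape, and the helper names `metrics_url` and `metrics_reachable`, are assumptions for illustration. Like `curl -L`, `urllib` follows redirects by default.

```python
import urllib.request


def metrics_url(scheme, host, port, path="/metrics"):
    """Combine the (assumed) scheme/host/port macro values into one URL."""
    return f"{scheme}://{host}:{port}{path}"


def metrics_reachable(url, timeout=5.0):
    """Rough equivalent of `curl -L <url>`: True if the endpoint answers 200
    and the body looks like etcd metrics (contains an etcd_ metric name)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and "etcd_" in body
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are all OSError.
        return False


if __name__ == "__main__":
    url = metrics_url("http", "localhost", 2379)
    print(url, metrics_reachable(url))
```

Run it from the Zabbix server or proxy host, not from the etcd node, to reproduce the second check.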
Additional points to consider:
- If you have specified a non-standard port for etcd, don't forget to change the `{$ETCD.SCHEME}` and `{$ETCD.PORT}` macros.
- You can set the `{$ETCD.USERNAME}` and `{$ETCD.PASSWORD}` macros in the template for use at the host level if necessary.
- To test availability, run: `zabbix_get -s etcd-host -k etcd.health`.
- See the Macros used section, as those macros set the trigger thresholds.
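The `etcd.health` item feeds the "Node healthcheck failed" trigger, which fires when the value is `0`. etcd's `/health` endpoint answers with a small JSON document such as `{"health":"true","reason":""}`. The sketch below shows one plausible way to turn that body into `1`/`0`; the exact preprocessing the template applies is not listed in this README, so treat the mapping (and the hypothetical `health_to_int` name) as an assumption.

```python
import json


def health_to_int(body):
    """Map an etcd /health JSON body to 1 (healthy) or 0 (anything else)."""
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return 0  # not JSON at all: treat as unhealthy
    value = doc.get("health") if isinstance(doc, dict) else None
    # etcd reports the flag as the string "true"; accept a JSON bool too.
    return 1 if value is True or value == "true" else 0


if __name__ == "__main__":
    print(health_to_int('{"health":"true","reason":""}'))
```

Treating anything unexpected as `0` errs on the side of raising the trigger rather than hiding a broken endpoint.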
Macros used
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the | <SET ETCD HOST> |
{$ETCD.PORT} | The port of the | 2379 |
{$ETCD.SCHEME} | The request scheme which may be | http |
{$ETCD.USER} | | |
{$ETCD.PASSWORD} | | |
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. | 5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. | 2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. | 2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. | 5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. | 90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on . | .* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on . | CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. | 1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. | Aborted\|Unavailable |
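Zabbix resolves a user macro on the host first and only then falls back to the value defined in the linked template, which is why the defaults above can be overridden per host. The sketch below mimics that lookup for an item key. The `resolve` helper is hypothetical and ignores macro contexts such as `{$MACRO:"ctx"}`.

```python
import re

# Template-level defaults taken from the table above.
TEMPLATE_DEFAULTS = {
    "{$ETCD.HOST}": "<SET ETCD HOST>",
    "{$ETCD.PORT}": "2379",
    "{$ETCD.SCHEME}": "http",
}

_MACRO = re.compile(r"\{\$[A-Z0-9_.]+\}")


def resolve(text, host_macros=None, template_macros=TEMPLATE_DEFAULTS):
    """Replace {$MACRO} tokens; host-level values win over template defaults.
    Unknown macros are left untouched."""
    host_macros = host_macros or {}

    def sub(m):
        token = m.group(0)
        return host_macros.get(token, template_macros.get(token, token))

    return _MACRO.sub(sub, text)


if __name__ == "__main__":
    key = 'net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"]'
    print(resolve(key, {"{$ETCD.HOST}": "10.0.0.5"}))
```

Leaving `{$ETCD.HOST}` at its placeholder default produces a key that points at `<SET ETCD HOST>`, which is why the setup steps ask you to set it.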
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Service's TCP port state | | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] |
Get node metrics | | HTTP agent | etcd.get_metrics |
Node health | | HTTP agent | etcd.health |
Server is a leader | Whether or not this member is a leader: 1 if it is; 0 otherwise. | Dependent item | etcd.is.leader |
Server has a leader | Whether or not a leader exists: 1 if it exists; 0 if it does not. | Dependent item | etcd.has.leader |
Leader changes | The number of leader changes the member has seen since its start. | Dependent item | etcd.leader.changes |
Proposals committed per second | The number of consensus proposals committed. | Dependent item | etcd.proposals.committed.rate |
Proposals applied per second | The number of consensus proposals applied. | Dependent item | etcd.proposals.applied.rate |
Proposals failed per second | The number of failed proposals seen. | Dependent item | etcd.proposals.failed.rate |
Proposals pending | The current number of pending proposals to commit. | Dependent item | etcd.proposals.pending |
Reads per second | The number of read actions by | Dependent item | etcd.reads.rate |
Writes per second | The number of writes (e.g., | Dependent item | etcd.writes.rate |
Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. | Dependent item | etcd.network.grpc.received.rate |
Client gRPC sent bytes per second | The number of bytes sent to gRPC clients per second. | Dependent item | etcd.network.grpc.sent.rate |
HTTP requests received | The number of requests received into the system (successfully parsed and | Dependent item | etcd.http.requests.rate |
HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( | Dependent item | etcd.http.requests.5xx.rate |
HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( | Dependent item | etcd.http.requests.4xx.rate |
RPCs received per second | The number of RPC stream messages received on the server. | Dependent item | etcd.grpc.received.rate |
RPCs sent per second | The number of gRPC stream messages sent by the server. | Dependent item | etcd.grpc.sent.rate |
RPCs started per second | The number of RPCs started on the server. | Dependent item | etcd.grpc.started.rate |
Get version | | HTTP agent | etcd.get_version |
Server version | The version of the | Dependent item | etcd.server.version |
Cluster version | The version of the | Dependent item | etcd.cluster.version |
DB size | The total size of the underlying database. | Dependent item | etcd.db.size |
Keys compacted per second | The number of DB keys compacted per second. | Dependent item | etcd.keys.compacted.rate |
Keys expired per second | The number of expired keys per second. | Dependent item | etcd.keys.expired.rate |
Keys total | The total number of keys. | Dependent item | etcd.keys.total |
Uptime | | Dependent item | etcd.uptime |
Virtual memory | The size of virtual memory expressed in bytes. | Dependent item | etcd.virtual.bytes |
Resident memory | The size of resident memory expressed in bytes. | Dependent item | etcd.res.bytes |
CPU | The total user and system CPU time spent in seconds. | Dependent item | etcd.cpu.util |
Open file descriptors | The number of open file descriptors. | Dependent item | etcd.open.fds |
Maximum open file descriptors | The maximum number of open file descriptors. | Dependent item | etcd.max.fds |
Deletes per second | The number of deletes seen by this member per second. | Dependent item | etcd.delete.rate |
PUT per second | The number of puts seen by this member per second. | Dependent item | etcd.put.rate |
Range per second | The number of ranges seen by this member per second. | Dependent item | etcd.range.rate |
Transaction per second | The number of transactions seen by this member per second. | Dependent item | etcd.txn.rate |
Pending events | The total number of pending events to be sent. | Dependent item | etcd.events.sent.rate |
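The "per second" items above turn etcd's cumulative totals into rates. The sketch below assumes this works like Zabbix's "Change per second" preprocessing step, which, as far as we know, stores nothing when the new value is smaller than the previous one (for example, after an etcd restart resets its counters). The `ChangePerSecond` class is a hypothetical illustration, not the template's code.

```python
class ChangePerSecond:
    """Turn (timestamp, cumulative_value) samples into a per-second rate."""

    def __init__(self):
        self._prev = None  # (timestamp, value) of the previous sample

    def feed(self, ts, value):
        prev, self._prev = self._prev, (ts, value)
        if prev is None:
            return None  # the first sample only primes the state
        prev_ts, prev_val = prev
        if value < prev_val or ts <= prev_ts:
            return None  # counter reset or non-increasing clock: skip
        return (value - prev_val) / (ts - prev_ts)


if __name__ == "__main__":
    rate = ChangePerSecond()
    for ts, total in [(0, 100), (60, 160), (120, 10), (180, 70)]:
        print(ts, rate.feed(ts, total))
```

Skipping the sample after a reset avoids a large negative spike; the next sample after the reset yields a normal rate again.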
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 | Average | Manual close: Yes |
Etcd: Node healthcheck failed | See more details on . | last(/Etcd by HTTP/etcd.health)=0 | Average | Depends on: |
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. | nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 | Warning | Manual close: Yes; Depends on: |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. | last(/Etcd by HTTP/etcd.has.leader)=0 | Average | |
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of | (max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} | Warning | |
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. | min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} | Warning | |
Etcd: Too many proposals are queued to commit | Rising pending proposals suggest that there is a high client load or that the member cannot commit proposals. | min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} | Warning | |
Etcd: Too many HTTP requests failures | Too many requests failed on | min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} | Warning | |
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 | Info | Manual close: Yes |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. | last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 | Info | Manual close: Yes |
Etcd: Host has been restarted | Uptime is less than 10 minutes. | last(/Etcd by HTTP/etcd.uptime)<10m | Info | Manual close: Yes |
Etcd: Current number of open files is too high | Heavy usage of file descriptors (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. | min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} | Warning | |
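Two of the trigger expressions above combine functions over a time window. The sketch below re-expresses them in Python to make the arithmetic explicit, using the default macro values (5 leader changes, 90 percent of file descriptors). It assumes a window such as `15m` covers the samples in (now - 15 minutes, now]; the function names are hypothetical.

```python
def _window(samples, now, seconds):
    """Values whose timestamp falls in (now - seconds, now]."""
    return [v for ts, v in samples if now - seconds < ts <= now]


def too_many_leader_changes(samples, now, max_warn=5):
    # (max(etcd.leader.changes,15m) - min(etcd.leader.changes,15m))
    #     > {$ETCD.LEADER.CHANGES.MAX.WARN}
    vals = _window(samples, now, 15 * 60)
    return bool(vals) and max(vals) - min(vals) > max_warn


def open_fds_too_high(open_fds, max_fds_last, now, max_warn_pct=90):
    # min(etcd.open.fds,5m) / last(etcd.max.fds) * 100
    #     > {$ETCD.OPEN.FDS.MAX.WARN}
    vals = _window(open_fds, now, 5 * 60)
    if not vals or max_fds_last <= 0:
        return False
    return min(vals) / max_fds_last * 100 > max_warn_pct


if __name__ == "__main__":
    print(too_many_leader_changes([(100, 1), (400, 3), (700, 9)], now=900))
    print(open_fds_too_high([(0, 950), (120, 960), (240, 940)], 1024, now=240))
```

Because the leader-change counter only grows, `max - min` over the window is the number of changes seen in the last 15 minutes. Using `min` over 5 minutes in the other triggers means the condition must hold for every sample in that window, which filters out short spikes.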
LLD rule gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | | Dependent item | etcd.grpc_code.discovery |
Item prototypes for gRPC codes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. | Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] |
Trigger prototypes for gRPC codes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} | Warning | |
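The discovery rule turns each distinct `grpc_code` label into an item prototype, filtered by `{$ETCD.GRPC_CODE.MATCHES}` and `{$ETCD.GRPC_CODE.NOT_MATCHES}`, while `{$ETCD.GRPC_CODE.TRIGGER.MATCHES}` selects the codes that also get a trigger. The sketch below models that with the default macro values. It assumes the codes come from the `grpc_server_handled_total` metric, that the filters behave like an unanchored regular-expression search, and that the trigger filter is applied per discovered code; all three are assumptions, and `discover_grpc_codes` is a hypothetical name.

```python
import re


def discover_grpc_codes(samples, matches=".*", not_matches="CHANGE_IF_NEEDED",
                        trigger_matches="Aborted|Unavailable"):
    """samples: iterable of (metric_name, labels_dict, value) tuples.
    Returns {code: gets_trigger} for each code that passes the filters."""
    codes = {labels["grpc_code"] for name, labels, _ in samples
             if name == "grpc_server_handled_total" and "grpc_code" in labels}
    result = {}
    for code in sorted(codes):
        if re.search(matches, code) and not re.search(not_matches, code):
            result[code] = bool(re.search(trigger_matches, code))
    return result


if __name__ == "__main__":
    demo = [
        ("grpc_server_handled_total", {"grpc_code": "OK", "grpc_method": "Range"}, 10.0),
        ("grpc_server_handled_total", {"grpc_code": "Unavailable", "grpc_method": "Put"}, 2.0),
    ]
    print(discover_grpc_codes(demo))
```

With the defaults, every code is discovered (`.*` matches anything and no real code contains `CHANGE_IF_NEEDED`), but only `Aborted` and `Unavailable` are flagged for the trigger prototype.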
LLD rule Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | | Dependent item | etcd.peer.discovery |
Item prototypes for Peers discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID | Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID | Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Send failures | The number of send failures to a peer with the ID | Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] |
Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from a peer with the ID | Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] |
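Peer discovery produces one `{#ETCD.PEER}` value per cluster member this node talks to. The sketch below assumes the IDs are taken from the `To` label of `etcd_network_peer_sent_bytes_total` and the `From` label of `etcd_network_peer_received_bytes_total`, and returns them in the JSON-array shape that Zabbix low-level discovery accepts. The function name and the member IDs in the example are hypothetical.

```python
import json


def discover_peers(samples):
    """samples: iterable of (metric_name, labels_dict, value) tuples.
    Returns LLD rows like [{"{#ETCD.PEER}": "<member id>"}, ...]."""
    peers = set()
    for name, labels, _ in samples:
        if name == "etcd_network_peer_sent_bytes_total" and "To" in labels:
            peers.add(labels["To"])
        elif name == "etcd_network_peer_received_bytes_total" and "From" in labels:
            peers.add(labels["From"])
    return [{"{#ETCD.PEER}": peer} for peer in sorted(peers)]


if __name__ == "__main__":
    demo = [
        ("etcd_network_peer_sent_bytes_total", {"To": "8e9e05c52164694d"}, 100.0),
        ("etcd_network_peer_received_bytes_total", {"From": "91bc3c398fb3c146"}, 10.0),
    ]
    print(json.dumps(discover_peers(demo)))
```

Using a set means a peer that appears in both the sent and received metrics is discovered only once.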
Feedback
Please report any issues with the template at
You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums.