Envoy Proxy by HTTP
Overview
This template monitors Envoy Proxy with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Envoy Proxy by HTTP
- collects metrics by HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
Requirements
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- Envoy Proxy 1.20.2
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
Set the macros {$ENVOY.URL} and {$ENVOY.METRICS.PATH} to match your Envoy Proxy instance. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
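
Before linking the template, it can help to confirm that the metrics endpoint answers and returns Prometheus-format text. The following is a minimal sketch, not part of the template: the function names are hypothetical, and it assumes each sample line has the form `series value` with no trailing timestamp.

```python
from urllib.request import urlopen


def metrics_url(base_url: str, metrics_path: str) -> str:
    """Join the values of {$ENVOY.URL} and {$ENVOY.METRICS.PATH} into one URL."""
    return base_url.rstrip("/") + "/" + metrics_path.lstrip("/")


def parse_samples(text: str) -> dict:
    """Parse Prometheus text lines into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Assumption: no timestamp follows the value on a sample line.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def fetch_samples(base_url: str, metrics_path: str, timeout: float = 5.0) -> dict:
    """Fetch the endpoint once and return the parsed samples."""
    with urlopen(metrics_url(base_url, metrics_path), timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

With the default macro values, `fetch_samples("http://localhost:9901", "/stats/prometheus")` would request http://localhost:9901/stats/prometheus. An empty result suggests the URL or path macro needs adjusting.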
Macros used
Name | Description | Default |
---|---|---|
{$ENVOY.URL} | Instance URL. | http://localhost:9901 |
{$ENVOY.METRICS.PATH} | The path Zabbix will scrape metrics in Prometheus format from. | /stats/prometheus |
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration used for trigger expression. | 7 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Get node metrics | Get server metrics. | HTTP agent | envoy.get_metrics |
Server state | State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS). | Dependent item | envoy.server.state |
Server live | 1 if the server is not currently draining, 0 otherwise. | Dependent item | envoy.server.live |
Uptime | Current server uptime in seconds. | Dependent item | envoy.server.uptime |
Certificate expiration, day before | Number of days until the next certificate being managed will expire. | Dependent item | envoy.server.days_until_first_cert_expiring |
Server concurrency | Number of worker threads. | Dependent item | envoy.server.concurrency |
Memory allocated | Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart. | Dependent item | envoy.server.memory_allocated |
Memory heap size | Current reserved heap size in bytes. New Envoy process heap size on hot restart. | Dependent item | envoy.server.memory_heap_size |
Memory physical size | Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart. | Dependent item | envoy.server.memory_physical_size |
Filesystem, flushed by timer rate | Total number of times internal flush buffers are written to a file due to flush timeout per second. | Dependent item | envoy.filesystem.flushed_by_timer.rate |
Filesystem, write completed rate | Total number of times a file was written per second. | Dependent item | envoy.filesystem.write_completed.rate |
Filesystem, write failed rate | Total number of times an error occurred during a file write operation per second. | Dependent item | envoy.filesystem.write_failed.rate |
Filesystem, reopen failed rate | Total number of times a file failed to be opened per second. | Dependent item | envoy.filesystem.reopen_failed.rate |
Connections, total | Total connections of both new and old Envoy processes. | Dependent item | envoy.server.total_connections |
Connections, parent | Total connections of the old Envoy process on hot restart. | Dependent item | envoy.server.parent_connections |
Clusters, warming | Number of currently warming (not active) clusters. | Dependent item | envoy.cluster_manager.warming_clusters |
Clusters, active | Number of currently active (warmed) clusters. | Dependent item | envoy.cluster_manager.active_clusters |
Clusters, added rate | Total clusters added (either via static config or CDS) per second. | Dependent item | envoy.cluster_manager.cluster_added.rate |
Clusters, modified rate | Total clusters modified (via CDS) per second. | Dependent item | envoy.cluster_manager.cluster_modified.rate |
Clusters, removed rate | Total clusters removed (via CDS) per second. | Dependent item | envoy.cluster_manager.cluster_removed.rate |
Clusters, updates rate | Total cluster updates per second. | Dependent item | envoy.cluster_manager.cluster_updated.rate |
Listeners, active | Number of currently active listeners. | Dependent item | envoy.listener_manager.total_listeners_active |
Listeners, draining | Number of currently draining listeners. | Dependent item | envoy.listener_manager.total_listeners_draining |
Listener, warming | Number of currently warming listeners. | Dependent item | envoy.listener_manager.total_listeners_warming |
Listener manager, initialized | A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers. | Dependent item | envoy.listener_manager.workers_started |
Listeners, create failure | Total failed listener object additions to workers per second. | Dependent item | envoy.listener_manager.listener_create_failure.rate |
Listeners, create success | Total listener objects successfully added to workers per second. | Dependent item | envoy.listener_manager.listener_create_success.rate |
Listeners, added | Total listeners added (either via static config or LDS) per second. | Dependent item | envoy.listener_manager.listener_added.rate |
Listeners, stopped | Total listeners stopped per second. | Dependent item | envoy.listener_manager.listener_stopped.rate |
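
Several items above report per-second rates, while a Prometheus endpoint normally exposes cumulative counters. The sketch below shows one way such a rate can be derived from two consecutive polls. It is an illustration only: the function name is hypothetical, and the template's actual preprocessing steps are not listed in this document.

```python
def change_per_second(prev_value, prev_ts, curr_value, curr_ts):
    """Return the counter increase per second between two polls.

    Returns None when the rate cannot be computed: no time has elapsed,
    or the counter went backwards (for example, after an Envoy restart).
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0 or curr_value < prev_value:
        return None
    return (curr_value - prev_value) / elapsed
```

For example, a counter that moves from 100 to 160 over a 60-second polling interval yields a rate of 1.0 per second.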
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: Server state is not live | | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 | Average | |
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. | last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m | Info | Manual close: Yes |
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 | Warning | Manual close: Yes |
Envoy Proxy: SSL certificate expires soon | Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire. | last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} | Warning | |
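
The trigger expressions above can be read as plain comparisons on the latest item values. The following sketch restates them in Python for illustration only. The function and parameter names are hypothetical, 10m is taken as 600 seconds, and `cert_min_days` stands in for {$ENVOY.CERT.MIN} with its default of 7.

```python
def evaluate_triggers(state, uptime_s, days_until_cert_expiry,
                      seconds_since_last_value, cert_min_days=7):
    """Return the names of the triggers whose conditions currently hold."""
    fired = []
    if state > 0:                       # last(envoy.server.state) > 0
        fired.append("Server state is not live")
    if uptime_s < 600:                  # last(envoy.server.uptime) < 10m
        fired.append("Service has been restarted")
    if seconds_since_last_value > 600:  # nodata(envoy.server.uptime,10m) = 1
        fired.append("Failed to fetch metrics data")
    if days_until_cert_expiry < cert_min_days:
        fired.append("SSL certificate expires soon")
    return fired
```

A healthy instance with an hour of uptime and 30 days of certificate validity fires nothing. A freshly restarted, draining instance with 3 days left on its certificate fires three of the four triggers.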
LLD rule Cluster metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | | Dependent item | envoy.lld.cluster |
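
A discovery rule of this kind has to turn the scraped metrics into a list of objects carrying the {#CLUSTER_NAME} macro. The sketch below illustrates the idea and is not the template's actual preprocessing. It assumes the cluster name appears as an `envoy_cluster_name` label in the Prometheus output, and the function name is hypothetical.

```python
import json
import re

# Assumption: cluster metrics carry a label such as envoy_cluster_name="service_a".
CLUSTER_LABEL = re.compile(r'envoy_cluster_name="([^"]*)"')


def discover_clusters(metrics_text: str) -> str:
    """Return a JSON array with one {#CLUSTER_NAME} object per distinct cluster."""
    names = []
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = CLUSTER_LABEL.search(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return json.dumps([{"{#CLUSTER_NAME}": name} for name in names])
```

Each object in the resulting array would give rise to one set of item prototypes, with {#CLUSTER_NAME} substituted into the item names and keys.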
Item prototypes for Cluster metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster ["{#CLUSTER_NAME}"]: Membership, total | Current cluster membership total. | Dependent item | envoy.cluster.membership_total["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Membership, healthy | Current cluster healthy total (inclusive of both health checking and outlier detection). | Dependent item | envoy.cluster.membership_healthy["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy | Current cluster unhealthy total. | Calculated | envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Membership, degraded | Current cluster degraded total. | Dependent item | envoy.cluster.membership_degraded["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Connections, total | Current cluster total connections. | Dependent item | envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Connections, active | Current cluster total active connections. | Dependent item | envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests total, rate | Current cluster request total per second. | Dependent item | envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate | Current cluster requests that timed out waiting for a response per second. | Dependent item | envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate | Total upstream requests completed per second. | Dependent item | envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate | Aggregate HTTP 2xx response codes per second. | Dependent item | envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate | Aggregate HTTP 3xx response codes per second. | Dependent item | envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate | Aggregate HTTP 4xx response codes per second. | Dependent item | envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate | Aggregate HTTP 5xx response codes per second. | Dependent item | envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests pending | Total active requests pending a connection pool connection. | Dependent item | envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests active | Total active requests. | Dependent item | envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate | Total sent connection bytes per second. | Dependent item | envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate | Total received connection bytes per second. | Dependent item | envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"] |
Trigger prototypes for Cluster metrics discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: There are unhealthy clusters | | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 | Average | |
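
The "Membership, unhealthy" item is of type Calculated, but its formula is not shown in this document. A plausible reading, sketched below as an assumption, is that it is the difference between the total and healthy membership counts. The trigger prototype then fires when that difference is above zero. Both function names are hypothetical.

```python
def membership_unhealthy(total: int, healthy: int) -> int:
    """Assumed formula: members that are not healthy (never negative)."""
    return max(total - healthy, 0)


def has_unhealthy_members(total: int, healthy: int) -> bool:
    """Mirror of the trigger condition last(...membership_unhealthy...) > 0."""
    return membership_unhealthy(total, healthy) > 0
```

For a cluster with 3 members of which 2 are healthy, the calculated value is 1 and the trigger condition holds.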
LLD rule Listeners metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Listeners metrics discovery | | Dependent item | envoy.lld.listeners |
Item prototypes for Listeners metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Listener ["{#LISTENER_ADDRESS}"]: Connections, active | Total active connections. | Dependent item | envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"] |
Listener ["{#LISTENER_ADDRESS}"]: Connections, rate | Total connections per second. | Dependent item | envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"] |
Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing | Sockets currently undergoing listener filter processing. | Dependent item | envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"] |
LLD rule HTTP metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP metrics discovery | | Dependent item | envoy.lld.http |
Item prototypes for HTTP metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP ["{#CONN_MANAGER}"]: Requests, rate | Total requests per second. | Dependent item | envoy.http.downstream_rq_total.rate["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Requests, active | Total active requests. | Dependent item | envoy.http.downstream_rq_active["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate | Total requests closed due to a timeout on the request path per second. | Dependent item | envoy.http.downstream_rq_timeout["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Connections, rate | Total connections per second. | Dependent item | envoy.http.downstream_cx_total["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Connections, active | Total active connections. | Dependent item | envoy.http.downstream_cx_active["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Bytes in, rate | Total bytes received per second. | Dependent item | envoy.http.downstream_cx_rx_bytes_total.rate["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Bytes out, rate | Total bytes sent per second. | Dependent item | envoy.http.downstream_cx_tx_bytes_tota.rate["{#CONN_MANAGER}"] |
Feedback
Please report any issues with the template at
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums