Zabbix

Envoy Proxy

Envoy is an open source edge and service proxy, designed for cloud-native applications.

This template is for Zabbix version: 7.2
Also available for: 7.0 6.4 6.2 6.0

Source:

Envoy Proxy by HTTP

Overview

This template monitors Envoy Proxy with Zabbix and works without any external scripts. Most of the metrics are collected in a single request, thanks to Zabbix bulk data collection.

The template Envoy Proxy by HTTP collects metrics with an HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

  • Envoy Proxy 1.20.2

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).

Remember to set the macros {$ENVOY.URL} and {$ENVOY.METRICS.PATH} to match your instance. Also, see the Macros section for a list of macros used to set trigger values.

NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
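
Before linking the template, it can help to confirm that the URL built from the two macros returns Prometheus-format text. The following is a minimal sketch in Python and is not part of the template; the helper names build_metrics_url, looks_like_prometheus, and fetch_metrics are hypothetical, and the default arguments simply mirror the macro defaults.

```python
from urllib.request import urlopen


def build_metrics_url(envoy_url="http://localhost:9901",
                      metrics_path="/stats/prometheus"):
    """Join {$ENVOY.URL} and {$ENVOY.METRICS.PATH} into one URL."""
    return envoy_url.rstrip("/") + "/" + metrics_path.lstrip("/")


def looks_like_prometheus(text):
    """Return True if at least one line looks like 'metric_name value'."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.rsplit(" ", 1)
        if len(parts) == 2:
            try:
                float(parts[1])
                return True
            except ValueError:
                continue
    return False


def fetch_metrics(url, timeout=5):
    """Fetch the raw metrics text (needs a reachable Envoy admin port)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling looks_like_prometheus(fetch_metrics(build_metrics_url())) against a running instance gives a quick check that the path is correct before Zabbix starts polling it.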

Macros used

Name Description Default
{$ENVOY.URL}

Instance URL.

http://localhost:9901
{$ENVOY.METRICS.PATH}

The path from which Zabbix scrapes metrics in Prometheus format.

/stats/prometheus
{$ENVOY.CERT.MIN}

Minimum number of days before certificate expiration; used in the trigger expression.

7

Items

Name Description Type Key and additional info
Get node metrics

Get server metrics.

HTTP agent envoy.get_metrics

Preprocessing

  • Check for not supported value: any error

    Custom on fail: Discard value

Server state

State of the server.

Live - (default) Server is live and serving traffic.

Draining - Server is draining listeners in response to external health checks failing.

Pre initializing - Server has not yet completed cluster manager initialization.

Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS).

Dependent item envoy.server.state

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_state)

  • Discard unchanged with heartbeat: 3h

Server live

1 if the server is not currently draining, 0 otherwise.

Dependent item envoy.server.live

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_live)

  • Discard unchanged with heartbeat: 3h

Uptime

Current server uptime in seconds.

Dependent item envoy.server.uptime

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_uptime)

    Custom on fail: Discard value

Certificate expiration, day before

Number of days until the next certificate being managed will expire.

Dependent item envoy.server.days_until_first_cert_expiring

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_days_until_first_cert_expiring)

Server concurrency

Number of worker threads.

Dependent item envoy.server.concurrency

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_concurrency)

Memory allocated

Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart.

Dependent item envoy.server.memory_allocated

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_memory_allocated)

Memory heap size

Current reserved heap size in bytes. New Envoy process heap size on hot restart.

Dependent item envoy.server.memory_heap_size

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_memory_heap_size)

Memory physical size

Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart.

Dependent item envoy.server.memory_physical_size

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_memory_physical_size)

Filesystem, flushed by timer rate

Total number of times internal flush buffers are written to a file due to flush timeout per second.

Dependent item envoy.filesystem.flushed_by_timer.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_filesystem_flushed_by_timer)

  • Change per second
Filesystem, write completed rate

Total number of times a file was written per second.

Dependent item envoy.filesystem.write_completed.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_filesystem_write_completed)

  • Change per second
Filesystem, write failed rate

Total number of times an error occurred during a file write operation per second.

Dependent item envoy.filesystem.write_failed.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_filesystem_write_failed)

  • Change per second
Filesystem, reopen failed rate

Total number of times a file failed to open per second.

Dependent item envoy.filesystem.reopen_failed.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_filesystem_reopen_failed)

  • Change per second
Connections, total

Total connections of both new and old Envoy processes.

Dependent item envoy.server.total_connections

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_total_connections)

Connections, parent

Total connections of the old Envoy process on hot restart.

Dependent item envoy.server.parent_connections

Preprocessing

  • Prometheus pattern: VALUE(envoy_server_parent_connections)

Clusters, warming

Number of currently warming (not active) clusters.

Dependent item envoy.cluster_manager.warming_clusters

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_warming_clusters)

Clusters, active

Number of currently active (warmed) clusters.

Dependent item envoy.cluster_manager.active_clusters

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_active_clusters)

Clusters, added rate

Total clusters added (either via static config or CDS) per second.

Dependent item envoy.cluster_manager.cluster_added.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_cluster_added)

  • Change per second
Clusters, modified rate

Total clusters modified (via CDS) per second.

Dependent item envoy.cluster_manager.cluster_modified.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_cluster_modified)

  • Change per second
Clusters, removed rate

Total clusters removed (via CDS) per second.

Dependent item envoy.cluster_manager.cluster_removed.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_cluster_removed)

  • Change per second
Clusters, updates rate

Total cluster updates per second.

Dependent item envoy.cluster_manager.cluster_updated.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_cluster_manager_cluster_updated)

  • Change per second
Listeners, active

Number of currently active listeners.

Dependent item envoy.listener_manager.total_listeners_active

Preprocessing

  • Prometheus pattern: SUM(envoy_listener_manager_total_listeners_active)

Listeners, draining

Number of currently draining listeners.

Dependent item envoy.listener_manager.total_listeners_draining

Preprocessing

  • Prometheus pattern: SUM(envoy_listener_manager_total_listeners_draining)

Listener, warming

Number of currently warming listeners.

Dependent item envoy.listener_manager.total_listeners_warming

Preprocessing

  • Prometheus pattern: SUM(envoy_listener_manager_total_listeners_warming)

Listener manager, initialized

A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers.

Dependent item envoy.listener_manager.workers_started

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_workers_started)

  • Discard unchanged with heartbeat: 3h

Listeners, create failure

Total failed listener object additions to workers per second.

Dependent item envoy.listener_manager.listener_create_failure.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_listener_create_failure)

  • Change per second
Listeners, create success

Total listener objects successfully added to workers per second.

Dependent item envoy.listener_manager.listener_create_success.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_listener_create_success)

  • Change per second
Listeners, added

Total listeners added (either via static config or LDS) per second.

Dependent item envoy.listener_manager.listener_added.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_listener_added)

  • Change per second
Listeners, stopped

Total listeners stopped per second.

Dependent item envoy.listener_manager.listener_stopped.rate

Preprocessing

  • Prometheus pattern: VALUE(envoy_listener_manager_listener_stopped)

  • Change per second
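
The preprocessing steps in the table above can be approximated outside Zabbix to see what each item would hold. This is a rough Python sketch under stated assumptions, not the Zabbix implementation: value_of and sum_of are hypothetical stand-ins for the VALUE(...) and SUM(...) patterns, labels are ignored, and change_per_second uses the plain difference-over-time formula.

```python
import re

# Matches 'name{labels} value' or 'name value'; labels are kept as raw text.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def samples(text, metric):
    """Return all numeric sample values for one metric name."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group(1) == metric:
            values.append(float(match.group(3)))
    return values


def value_of(text, metric):
    """Rough analogue of VALUE(metric): exactly one sample is expected."""
    found = samples(text, metric)
    if len(found) != 1:
        raise ValueError(f"expected one sample for {metric}, got {len(found)}")
    return found[0]


def sum_of(text, metric):
    """Rough analogue of SUM(metric): add up every matching sample."""
    return sum(samples(text, metric))


def change_per_second(prev_value, prev_time, value, time):
    """Rate between two counter readings taken at prev_time and time."""
    return (value - prev_value) / (time - prev_time)
```

For example, two readings of envoy_filesystem_write_completed taken 60 seconds apart would be passed to change_per_second to obtain the value of the corresponding ".rate" item.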

Triggers

Name Description Expression Severity Dependencies and additional info
Envoy Proxy: Server state is not live

last(/Envoy Proxy by HTTP/envoy.server.state) > 0 Average
Envoy Proxy: Service has been restarted

Uptime is less than 10 minutes.

last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m Info Manual close: Yes
Envoy Proxy: Failed to fetch metrics data

Zabbix has not received data for items for the last 10 minutes.

nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 Warning Manual close: Yes
Envoy Proxy: SSL certificate expires soon

Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire.

last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} Warning
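
To make the four expressions above easier to reason about, here is a small Python sketch that applies the same comparisons to a set of last values. It is illustrative only: evaluate_triggers is a hypothetical helper, the nodata check is simplified to "seconds since the last value exceeds 10 minutes", and cert_min_days stands in for {$ENVOY.CERT.MIN}.

```python
def evaluate_triggers(state, uptime_seconds, seconds_since_last_value,
                      days_until_cert_expiry, cert_min_days=7):
    """Return the names of triggers that would fire for the given values."""
    fired = []
    if state > 0:                            # envoy.server.state > 0
        fired.append("Server state is not live")
    if uptime_seconds < 10 * 60:             # envoy.server.uptime < 10m
        fired.append("Service has been restarted")
    if seconds_since_last_value > 10 * 60:   # nodata(..., 10m) = 1
        fired.append("Failed to fetch metrics data")
    if days_until_cert_expiry < cert_min_days:
        fired.append("SSL certificate expires soon")
    return fired
```

A healthy instance with an hour of uptime, recent data, and a certificate valid for 30 more days yields an empty list.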

LLD rule Cluster metrics discovery

Name Description Type Key and additional info
Cluster metrics discovery

Dependent item envoy.lld.cluster

Preprocessing

  • Prometheus to JSON: envoy_cluster_membership_total

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h
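
The JavaScript step is not reproduced here, so the following Python sketch only illustrates the general idea of turning "Prometheus to JSON" output into low-level discovery rows. It assumes each JSON entry carries a "labels" object and that the cluster name lives in a label called envoy_cluster_name; both are assumptions, and discover_clusters is a hypothetical name. Check the template for the real script.

```python
import json


def discover_clusters(prometheus_json, label="envoy_cluster_name"):
    """Build LLD rows, one per distinct cluster name, in first-seen order."""
    seen = []
    for metric in json.loads(prometheus_json):
        name = metric.get("labels", {}).get(label)
        if name is not None and name not in seen:
            seen.append(name)
    # Each row exposes the {#CLUSTER_NAME} macro used by the prototypes.
    return json.dumps([{"{#CLUSTER_NAME}": name} for name in seen])
```

Each returned row would give the item prototypes one value of {#CLUSTER_NAME} to expand.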

Item prototypes for Cluster metrics discovery

Name Description Type Key and additional info
Cluster ["{#CLUSTER_NAME}"]: Membership, total

Current cluster membership total.

Dependent item envoy.cluster.membership_total["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Cluster ["{#CLUSTER_NAME}"]: Membership, healthy

Current cluster healthy total (inclusive of both health checking and outlier detection).

Dependent item envoy.cluster.membership_healthy["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy

Current cluster unhealthy.

Calculated envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]
Cluster ["{#CLUSTER_NAME}"]: Membership, degraded

Current cluster degraded total.

Dependent item envoy.cluster.membership_degraded["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Cluster ["{#CLUSTER_NAME}"]: Connections, total

Current cluster total connections.

Dependent item envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Cluster ["{#CLUSTER_NAME}"]: Connections, active

Current cluster total active connections.

Dependent item envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Cluster ["{#CLUSTER_NAME}"]: Requests total, rate

Current cluster request total per second.

Dependent item envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate

Current cluster requests that timed out waiting for a response per second.

Dependent item envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate

Total upstream requests completed per second.

Dependent item envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate

Aggregate HTTP response codes per second.

Dependent item envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate

Aggregate HTTP response codes per second.

Dependent item envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate

Aggregate HTTP response codes per second.

Dependent item envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate

Aggregate HTTP response codes per second.

Dependent item envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Cluster ["{#CLUSTER_NAME}"]: Requests pending

Total active requests pending a connection pool connection.

Dependent item envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Cluster ["{#CLUSTER_NAME}"]: Requests active

Total active requests.

Dependent item envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate

Total sent connection bytes per second.

Dependent item envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate

Total received connection bytes per second.

Dependent item envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second

Trigger prototypes for Cluster metrics discovery

Name Description Expression Severity Dependencies and additional info
Envoy Proxy: There are unhealthy clusters

last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 Average
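
The formula of the calculated "Membership, unhealthy" item is not shown on this page. The sketch below assumes it is the total membership minus the healthy membership, which is a guess to be checked against the template, and then applies the "> 0" comparison from the trigger prototype. Both function names are hypothetical.

```python
def membership_unhealthy(membership_total, membership_healthy):
    """Assumed formula for the calculated item: total minus healthy."""
    return membership_total - membership_healthy


def unhealthy_clusters(latest):
    """Names of clusters whose trigger prototype (> 0) would fire.

    `latest` maps cluster name -> (membership_total, membership_healthy).
    """
    return sorted(name for name, (total, healthy) in latest.items()
                  if membership_unhealthy(total, healthy) > 0)
```

With one cluster reporting 3 members of which 2 are healthy, only that cluster would be returned.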

LLD rule Listeners metrics discovery

Name Description Type Key and additional info
Listeners metrics discovery

Dependent item envoy.lld.listeners

Preprocessing

  • Prometheus to JSON: envoy_listener_downstream_cx_active

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h
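
The "Discard unchanged with heartbeat: 3h" step keeps discovery data from being reprocessed on every poll. As a rough Python sketch of that behavior (an approximation, not Zabbix code; the class name is hypothetical): a value is kept when it differs from the last kept value, or when the heartbeat interval has passed since a value was last kept.

```python
class DiscardUnchangedWithHeartbeat:
    """Keep a value if it changed, or if the heartbeat interval elapsed."""

    def __init__(self, heartbeat_seconds=3 * 60 * 60):  # 3h
        self.heartbeat_seconds = heartbeat_seconds
        self.last_value = None
        self.last_kept_at = None

    def accept(self, value, now):
        """Return True if the value should be stored, False if discarded."""
        first = self.last_kept_at is None
        changed = first or value != self.last_value
        expired = (not first
                   and now - self.last_kept_at >= self.heartbeat_seconds)
        if changed or expired:
            self.last_value = value
            self.last_kept_at = now
            return True
        return False
```

Here now is a timestamp in seconds; an unchanged listener list polled every minute would be stored once and then again only after three hours.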

Item prototypes for Listeners metrics discovery

Name Description Type Key and additional info
Listener ["{#LISTENER_ADDRESS}"]: Connections, active

Total active connections.

Dependent item envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

Listener ["{#LISTENER_ADDRESS}"]: Connections, rate

Total connections per second.

Dependent item envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing

Sockets currently undergoing listener filter processing.

Dependent item envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

LLD rule HTTP metrics discovery

Name Description Type Key and additional info
HTTP metrics discovery

Dependent item envoy.lld.http

Preprocessing

  • Prometheus to JSON: envoy_http_downstream_rq_total

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h

Item prototypes for HTTP metrics discovery

Name Description Type Key and additional info
HTTP ["{#CONN_MANAGER}"]: Requests, rate

Total requests per second.

Dependent item envoy.http.downstream_rq_total.rate["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
HTTP ["{#CONN_MANAGER}"]: Requests, active

Total active requests.

Dependent item envoy.http.downstream_rq_active["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate

Total requests closed due to a timeout on the request path per second.

Dependent item envoy.http.downstream_rq_timeout["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
HTTP ["{#CONN_MANAGER}"]: Connections, rate

Total connections per second.

Dependent item envoy.http.downstream_cx_total["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
HTTP ["{#CONN_MANAGER}"]: Connections, active

Total active connections.

Dependent item envoy.http.downstream_cx_active["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

HTTP ["{#CONN_MANAGER}"]: Bytes in, rate

Total bytes received per second.

Dependent item envoy.http.downstream_cx_rx_bytes_total.rate["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second
HTTP ["{#CONN_MANAGER}"]: Bytes out, rate

Total bytes sent per second.

Dependent item envoy.http.downstream_cx_tx_bytes_tota.rate["{#CONN_MANAGER}"]

Preprocessing

  • Prometheus pattern: The text is too long. Please see the template.

  • Change per second

Feedback

Please report any issues with the template at

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
