Tawon Prometheus Metrics
Tawon exports Prometheus metrics through the `/metrics` endpoint. The metrics are categorized into general and task-specific metrics to provide comprehensive insight into directive operations and task performance.
Tawon Metric Categories
- General Metrics for Directives: These metrics provide an overview of the directive processing status across all tasks.
- Task-Specific Metrics: Each task within Tawon has its own set of specific metrics, allowing users to monitor task-level details for more granular insights.
BPF Metrics
BPF metrics enhance the observability of eBPF operations. They support exposing CPU usage and execution counts for all of Tawon’s BPF programs. This functionality depends on the kernel configuration:
- Kernel Requirement: Ensure the `bpf_stats_enabled` sysctl is enabled.
- Kernel Versions: Non-RHEL based distributions: kernel version 5.1 or higher.
To enable, set the `bpf_stats_enabled` sysctl to `1`:

```bash
sysctl -w kernel.bpf_stats_enabled=1
```
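To keep the setting across reboots, it can also be persisted through a sysctl configuration file. A minimal sketch; the file name is arbitrary:

```bash
# Persist the setting across reboots (file name is illustrative)
echo 'kernel.bpf_stats_enabled = 1' | sudo tee /etc/sysctl.d/90-bpf-stats.conf
sudo sysctl --system

# Confirm the current value
sysctl kernel.bpf_stats_enabled
```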
Scraping Metrics
Overview
A common and standardized approach for setting up metric scraping from Pods and Services in Kubernetes is by leveraging `PodMonitor` and `ServiceMonitor` Custom Resource Definitions (CRDs). These resources allow administrators to specify which endpoints and metrics Prometheus should scrape, using label selectors to target specific Pods and Services.
When correctly configured, the Prometheus Operator automatically discovers the `PodMonitor` and `ServiceMonitor` resources, scrapes the defined endpoints, and exposes the collected metrics for monitoring and alerting.
Prerequisites
Before configuring a `PodMonitor` or `ServiceMonitor`, ensure the following:

- Prometheus Operator is Installed: You can deploy the Prometheus Operator using the kube-prometheus stack, Helm charts, or custom manifests. Verify the Prometheus Operator installation:

  ```bash
  kubectl get pods -n monitoring
  kubectl get crds | grep prometheus
  ```

  The required CRDs, including `PodMonitor` and `ServiceMonitor`, must be present.
- Application Exposes Metrics: Your application should expose metrics in Prometheus format, typically via an HTTP endpoint such as `/metrics`.
- Service or Pod Labels: Ensure that Pods and Services have proper labels, which will be used for filtering in the `PodMonitor` or `ServiceMonitor` (see the verification example after this list).
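A quick way to check these prerequisites for Tawon is to inspect the Pod labels and fetch the metrics endpoint directly. This is a minimal sketch; the namespace and label mirror the `PodMonitor` example in the next section, and the Pod name and metrics port are placeholders you need to fill in:

```bash
# Show Tawon Pods and their labels (namespace assumes the example below)
kubectl get pods -n tawon-operator --show-labels

# Forward the metrics port of one Pod and fetch /metrics
# (<tawon-pod> and <metrics-port> are placeholders)
kubectl port-forward -n tawon-operator pod/<tawon-pod> 9090:<metrics-port> &
curl -s http://localhost:9090/metrics | head
```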
Configuring a PodMonitor
A `PodMonitor` is used to scrape metrics from specific Pods. The example below demonstrates how to set up a `PodMonitor` for Tawon Pods:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: tawon-podmonitor
  namespace: tawon-operator
spec:
  namespaceSelector:
    matchNames:
      - tawon-operator        # Target the "tawon-operator" namespace
  podMetricsEndpoints:
    - interval: 10s           # Scrape metrics every 10 seconds
      path: /metrics          # Path where metrics are exposed
      port: http-metrics      # Port where metrics endpoint is served
  selector:
    matchLabels:
      app.kubernetes.io/name: tawon-directive
```
Explanation
- `namespaceSelector`: Specifies which namespaces the Prometheus Operator should scan for matching Pods.
- `podMetricsEndpoints`: Defines the scraping configuration:
  - `interval`: How frequently metrics are scraped.
  - `path`: The endpoint where metrics are exposed (e.g., `/metrics`).
  - `port`: Name of the container port where the metrics are served.
- `selector`: Filters Pods based on labels. In this case, Pods with the label `app.kubernetes.io/name: tawon-directive` are selected.
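The `matchLabels` selector above only matches Pods labeled `app.kubernetes.io/name: tawon-directive`. If Tawon agent Pods labeled `app.kubernetes.io/name: tawon-agent` should be scraped by the same `PodMonitor`, a `matchExpressions` selector can cover both values. This is a sketch; confirm the label values actually used by your deployment:

```yaml
  # Alternative selector matching both tawon-directive and tawon-agent Pods
  # (the label values are assumptions; verify with `kubectl get pods --show-labels`)
  selector:
    matchExpressions:
      - key: app.kubernetes.io/name
        operator: In
        values:
          - tawon-directive
          - tawon-agent
```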
Configuring Prometheus to Use PodMonitor and ServiceMonitor
The Prometheus Operator automatically discovers `PodMonitor` and `ServiceMonitor` resources. To ensure Prometheus is configured correctly:
- Verify the Prometheus Custom Resource configuration:

  ```yaml
  apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    name: prometheus-example
    namespace: monitoring
  spec:
    serviceMonitorSelector: {}            # Selects all ServiceMonitors
    podMonitorSelector: {}                # Selects all PodMonitors
    serviceMonitorNamespaceSelector: {}
    podMonitorNamespaceSelector: {}
    replicas: 1
  ```

  - `serviceMonitorSelector` and `podMonitorSelector`: Ensure these are set to `{}` to scrape all `ServiceMonitor` and `PodMonitor` resources. If filtering is needed, specify label-based selectors (see the example after this procedure).
  - `serviceMonitorNamespaceSelector` and `podMonitorNamespaceSelector`: Control namespace selection for the monitors.
- Apply the Prometheus resource and monitors:

  ```bash
  kubectl apply -f prometheus.yaml
  kubectl apply -f podmonitor.yaml
  ```

- Verify Prometheus is scraping the metrics: Access the Prometheus UI (e.g., via `kubectl port-forward`):

  ```bash
  kubectl port-forward svc/prometheus-example -n monitoring 9090:9090
  ```

  Visit `http://localhost:9090` and search for the target metrics.
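If Prometheus should only pick up specific monitors instead of all of them, narrow the selectors with labels. A minimal sketch, assuming the monitors have been given a label such as `team: tawon` (the label key and value are illustrative, not Tawon defaults):

```yaml
# In the Prometheus custom resource: only select monitors carrying team: tawon
spec:
  podMonitorSelector:
    matchLabels:
      team: tawon
  serviceMonitorSelector:
    matchLabels:
      team: tawon
```

The same label must then be added under `metadata.labels` of the `PodMonitor` or `ServiceMonitor` resources you want scraped.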
Summary
By setting up `PodMonitor` and `ServiceMonitor` resources, the Prometheus Operator simplifies the discovery and scraping of application metrics. This declarative approach ensures that Prometheus dynamically monitors Pods and Services based on labels and specified endpoints, providing a scalable solution for Kubernetes monitoring.
Tawon Metrics
General metrics for directives and tasks:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `directives_ran_total` | Directive execution counter | `tawonID`, `status` |
| Summary | `directive_duration_seconds` | Duration of an individual directive | `tawonID` |
| Counter | `tasks_ran_total` | Task execution counter | `task`, `status`, `tawonID` |
| Summary | `task_duration_seconds` | Duration of an individual task | `tawonID`, `task` |
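As an illustration of how these metrics can be consumed, a `PrometheusRule` (another Prometheus Operator CRD) can alert on failing task runs. This is a sketch only: the rule name, namespace, severity, and the `status="error"` label value are assumptions; check the status values your Tawon version actually exports:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: tawon-task-alerts          # Illustrative name
  namespace: tawon-operator        # Assumed namespace
spec:
  groups:
    - name: tawon-tasks
      rules:
        - alert: TawonTaskFailures
          # status="error" is an assumed label value; adjust to the real one
          expr: sum by (task, tawonID) (rate(tasks_ran_total{status="error"}[5m])) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Tawon task {{ $labels.task }} is reporting failed runs"
```

Whether Prometheus loads this rule depends on the `ruleSelector` configured in the Prometheus custom resource.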
Payload task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `payload_task_total` | Payloads captured by the payload task | `status` |
| Gauge | `payload_processes_watched` | Processes tracked in payload tasks | - |
| Counter | `payload_msgs_lost` | Payload messages lost on the perf event ring buffer | - |
| Counter | `payload_out_of_order_msgs` | Out-of-order payload messages | - |
| Counter | `payload_msgs` | Payload messages processed | - |
| Gauge | `payload_tids_watched` | Process thread IDs watched by payload | - |
Resetconn task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `resetconn_msgs_lost` | Resetconn messages lost on the perf event ring buffer | - |
| Counter | `resetconn_msgs` | Resetconn messages processed | - |
| Gauge | `resetconn_processes_watched` | Processes captured by resetconn | - |
| Gauge | `resetconn_tids_watched` | Process thread IDs watched by resetconn | - |
Exec task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `exec_msgs_lost` | Exec messages lost on the perf event ring buffer | - |
| Counter | `exec_proc_end_before_start` | Process-ended messages received before the process-started message | - |
| Counter | `exec_msgs` | Exec messages processed | - |
Tlsplaintext task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `tlsplaintext_msgs_lost` | Tlsplaintext messages lost on the perf event ring buffer | - |
| Counter | `tlsplaintext_out_of_order_msgs` | Out-of-order tlsplaintext messages | - |
| Counter | `tlsplaintext_msgs` | Tlsplaintext messages processed | - |
| Gauge | `tlsplaintext_processes_watched` | Processes captured by tlsplaintext | - |
| Gauge | `tlsplaintext_tids_watched` | Process thread IDs watched by tlsplaintext | - |
Flows task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `flows_task_total` | Flows captured | `status` |
| Counter | `flow_msgs` | Flow messages processed | - |
| Counter | `flow_msgs_lost` | Flow messages lost on the perf event ring buffer | - |
| Counter | `flow_out_of_order_msgs` | Out-of-order flow messages | - |
| Counter | `flow_missing_ended` | Flows missing an ending time | - |
| Counter | `flow_missing_started` | Flows missing a starting time | - |
| Counter | `flow_extra_ended` | Flows with a spurious ending time | - |
| Counter | `flow_missing_tid` | Flows missing a thread ID | - |
| Counter | `flow_missing_socket` | Flows with missing socket info | - |
| Gauge | `flow_flows_tracked` | Flows tracked | - |
| Gauge | `flow_threads_tracked` | Threads with flows tracked | - |
Tlsheaders task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `tlsheaders_parse_total` | Tlsheaders payload parsings attempted | `status` |
PFCP task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `pfcp_parse_total` | PFCP payload parsings attempted | `status` |
ECPRI task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `ecpri_parse_total` | ECPRI payload parsings attempted | `status` |
BPF metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Gauge | `bpf_probes_attached` | BPF probes attached | `module`, `program` |
| Counter | `bpf_probes_attach_failure` | BPF probe attachment failures | `module`, `program` |
| Counter | `bpf_probes_detach_failure` | BPF probe detachment failures | `module`, `program` |
| Histogram | `bpf_cpu_usage_histogram` | Percentage of CPU usage for BPF programs | `module`, `program` |
| Gauge | `bpf_cpu_usage` | Current percentage of CPU usage for BPF programs | `module`, `program` |
| Counter | `bpf_run_counter` | Execution counter for BPF programs | `module`, `program` |
H2 (HTTP/2) task metrics:

| Type | Metric | Description | Labels |
|---|---|---|---|
| Counter | `h2_chunks` | H2 chunks processed | - |
| Histogram | `h2_chunk_size` | Size of incoming chunks in bytes. Buckets: 10, 50, 100, 250, 750, 1500 | - |
| Counter | `h2_chunks_missing_flowid` | H2 chunks that are missing a flow ID | - |
| Counter | `h2_out_of_order` | Out-of-order chunks detected | - |
| Counter | `h2_invalid_frame_type` | Invalid frame type | - |
| Counter | `h2_invalid_empty_data_frame` | Invalid empty data frame, missing end stream flag | - |
| Counter | `h2_invalid_empty_payload` | Invalid empty payload frame | `type` |
| Counter | `h2_payload_overflow_buffer` | Frame payload is larger than the input buffer - could be an invalid frame | `type` |
| Counter | `h2_chunk_buffer_overflow` | Accumulated chunks are larger than the input buffer - could be an invalid frame | `type` |
| Counter | `h2_invalid_frame_flags` | Flags on the frame are invalid for its type | `type` |
| Counter | `h2_incoming_frames` | Assembled frames coming into the system | `type`, `dismissed` |
| Counter | `h2_zero_stream_frames` | Unexpected stream 0 frame | `type` |
| Counter | `h2_server_stream_frames` | Server-initiated stream frames | `type` |
| Counter | `h2_failed_dir_detections` | Failed to detect flow direction within buffer allocation | - |
| Counter | `h2_connections` | Bi-directional connections detected | `direction` |
| Histogram | `h2_conn_dir_buffer` | Size of the buffer on a connection before direction is detected. Buckets: 1, 5, 10, 50, 250, 500 | `direction` |
| Counter | `h2_invalid_go_away_frames` | Invalid GO_AWAY frames - too short | - |
| Counter | `h2_invalid_rst_stream_frames` | Invalid RST_STREAM frames - too short | - |
| Counter | `h2_header_decoder_resets` | Times header decoders were reset | - |
| Counter | `h2_expired_streams` | Partial streams that have expired | - |
| Counter | `h2_terminated_streams` | Partial streams that were terminated | `reason` |
| Counter | `h2_missing_headers` | Header or partial header could not be parsed | `direction`, `reason` |
| Counter | `h2_empty_headers` | Header is empty | `direction` |
| Counter | `h2_wrong_pseudo_headers` | At least one wrong pseudo header for the direction detected | `direction` |
| Counter | `h2_multi_pseudo_headers` | More than one of the same pseudo header detected | `direction` |
| Counter | `h2_duplicate_component` | Duplicate component (request or response header or body) detected | `component` |
| Counter | `h2_round_trips_emitted` | Number of full or partial round trips emitted | `partial`, `reason` |
| Counter | `h2_content_type_parse_failure` | HTTP content-type parse failures | - |
| Counter | `h2_multipart_failure` | Failed to parse multipart body | - |
| Counter | `h2_multipart_bodies` | Number of bodies detected as multipart | - |
| Counter | `h2_content_types` | HTTP content-types, including multiparts | `type` |
| Gauge | `h2_framer_flows` | Flows tracked by the framer | - |
| Gauge | `h2_flower_flows` | Flows tracked by the flower | - |
| Gauge | `h2_streams` | Streams tracked | - |
Tawon Operator
The Tawon Operator is based on the operator-sdk framework and exposes metrics for monitoring and alerting via the `/metrics` endpoint.
The following sections walk you through accessing the metrics exposed by the Tawon Operator.
Step 1: Create a ClusterRoleBinding
To allow access to the `/metrics` endpoint, create a `ClusterRoleBinding` that binds the appropriate `ClusterRole` to the Tawon Operator’s ServiceAccount.

```bash
kubectl create clusterrolebinding tawon-operator-metrics-binding \
  --clusterrole=tawon-operator-metrics-reader \
  --serviceaccount=openshift-operators:tawon-operator-controller-manager
```
Step 2: Generate a Token
Generate a token for the `tawon-operator-controller-manager` ServiceAccount. The token will be used to authenticate the request to the `/metrics` endpoint.

```bash
export TOKEN=$(kubectl create token tawon-operator-controller-manager -n openshift-operators)
```
Step 3: Expose the Metrics Port Locally
Expose the metrics service port on localhost using `oc port-forward`. This command creates a local proxy to the metrics service.

```bash
oc port-forward svc/tawon-operator-controller-manager-metrics-service -n openshift-operators 8443
```
Step 4: Retrieve Metrics
Use the generated token to authenticate and fetch the metrics via a `curl` request.

```bash
curl -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics -k
```

The `-k` flag bypasses certificate verification for HTTPS.
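To confirm the endpoint is serving operator metrics, the output can be filtered for the standard controller-runtime series that operator-sdk based operators export (a sketch; the exact metric names depend on the operator version):

```bash
# Fetch the metrics and show only the reconcile counters
# (the controller_runtime_ prefix assumes standard controller-runtime metrics)
curl -sk -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics \
  | grep '^controller_runtime_reconcile_total'
```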
Additional Notes
ClusterRole Deployed Within the Operator Bundle
The `tawon-operator-metrics-reader` ClusterRole is deployed as part of the operator bundle. Below is an example of its definition:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2024-10-22T22:25:14Z"
  labels:
    app.kubernetes.io/component: kube-rbac-proxy
    app.kubernetes.io/created-by: tawon-operator
    app.kubernetes.io/instance: metrics-reader
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: clusterrole
    app.kubernetes.io/part-of: tawon-operator
    olm.owner: tawon-operator.v2.39.20
  name: tawon-operator-metrics-reader
  resourceVersion: "59601497"
  uid: 1322e10b-f728-48a3-949c-470d633b0606
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```
This `ClusterRole` grants `get` access to the `/metrics` endpoint, enabling Prometheus or other monitoring tools to scrape metrics exposed by the Tawon Operator.
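For continuous collection instead of ad hoc `curl` requests, a `ServiceMonitor` can point Prometheus at the operator metrics Service using the in-cluster ServiceAccount token. This is a minimal sketch, not part of the operator bundle: the port name `https` and the selector label are assumptions that must be checked against the actual `tawon-operator-controller-manager-metrics-service` definition, and the scraping ServiceAccount still needs the `tawon-operator-metrics-reader` ClusterRole bound to it, as in Step 1:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tawon-operator-metrics
  namespace: openshift-operators
spec:
  endpoints:
    - port: https                 # Assumed port name; verify on the metrics Service
      scheme: https
      # Authenticate with the ServiceAccount token mounted into the Prometheus pod
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true  # Skip certificate verification, like `curl -k` above
  selector:
    matchLabels:
      # Assumed label; verify with `kubectl get svc -n openshift-operators --show-labels`
      app.kubernetes.io/name: tawon-operator
  namespaceSelector:
    matchNames:
      - openshift-operators
```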
Tawon Operator Metrics
For more information, see: https://book.kubebuilder.io/reference/metrics-reference