Tawon Prometheus Metrics

Tawon exports Prometheus metrics through the /metrics endpoint. The metrics are categorized into general and task-specific metrics to provide comprehensive insights into directive operations and task performance.

Tawon Metric Categories

  1. General Metrics for Directives: These metrics provide an overview of the directive processing status across all tasks.

  2. Task-Specific Metrics: Each task within Tawon has its own set of metrics, allowing users to monitor task-level details for more granular insights.

BPF Metrics

BPF metrics enhance the observability of eBPF operations. They expose CPU usage and execution counts for all of Tawon’s BPF programs. This functionality depends on the kernel configuration:

  • Kernel requirement: the bpf_stats_enabled sysctl must be enabled.

  • Kernel version: non-RHEL distributions require kernel 5.1 or later.

To enable, set the bpf_stats_enabled sysctl to 1:

sysctl -w kernel.bpf_stats_enabled=1
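
This setting does not persist across reboots. One way to make it permanent (a sketch, assuming a distribution that reads drop-in files from /etc/sysctl.d) is:

echo 'kernel.bpf_stats_enabled = 1' | sudo tee /etc/sysctl.d/90-bpf-stats.conf
sudo sysctl --system   # reload all sysctl configuration files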

Scraping Metrics

Overview

A common and standardized approach for setting up metric scraping from Pods and Services in Kubernetes is by leveraging PodMonitor and ServiceMonitor Custom Resource Definitions (CRDs). These resources allow administrators to specify which endpoints and metrics Prometheus should scrape, using label selectors to target specific Pods and Services.

When correctly configured, the Prometheus Operator automatically discovers the PodMonitor and ServiceMonitor resources, scrapes the defined endpoints, and exposes the collected metrics for monitoring and alerting.


Prerequisites

Before configuring PodMonitor or ServiceMonitor, ensure the following:

  1. Prometheus Operator is Installed: You can deploy the Prometheus Operator using the kube-prometheus stack, Helm charts, or custom manifests. Verify the Prometheus Operator installation:

    kubectl get pods -n monitoring
    kubectl get crds | grep prometheus
    The required CRDs, including `PodMonitor` and `ServiceMonitor`, must be present.
  2. Application Exposes Metrics: Your application should expose metrics in Prometheus format, typically via an HTTP endpoint such as /metrics.

  3. Service or Pod Labels: Ensure that Pods and Services have proper labels, which will be used for filtering in the PodMonitor or ServiceMonitor. A quick check for these prerequisites follows this list.
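
A quick way to verify the last two prerequisites (a sketch; the Pod name and port are placeholders for your own values):

kubectl get pods -n tawon-operator --show-labels     # confirm the labels you plan to select on
kubectl port-forward pod/<tawon-pod> -n tawon-operator 9090:9090 &
curl -s http://localhost:9090/metrics | head         # should print Prometheus text-format metrics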


Configuring a PodMonitor

A PodMonitor is used to scrape metrics from specific Pods. The example below demonstrates how to set up a PodMonitor for Tawon Pods:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: tawon-podmonitor
  namespace: tawon-operator
spec:
  namespaceSelector:
    matchNames:
      - tawon-operator        # Target the "tawon-operator" namespace
  podMetricsEndpoints:
    - interval: 10s           # Scrape metrics every 10 seconds
      path: /metrics          # Path where metrics are exposed
      port: http-metrics      # Port where metrics endpoint is served
  selector:
    matchLabels:
      app.kubernetes.io/name: tawon-directive

Explanation

  • namespaceSelector: Specifies which namespaces the Prometheus Operator should scan for matching Pods.

  • podMetricsEndpoints: Defines the scraping configuration:

    • interval: How frequently metrics are scraped.

    • path: The endpoint where metrics are exposed (e.g., /metrics).

    • port: Name of the container port where the metrics are served.

  • selector: Filters Pods based on labels. In this case, Pods labeled app.kubernetes.io/name: tawon-directive are selected.
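
A ServiceMonitor follows the same pattern but selects Services instead of Pods. The sketch below is illustrative only: it assumes a Service named tawon-directive-metrics exists in the tawon-operator namespace with a named port http-metrics and the same label; adjust to match your deployment.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tawon-servicemonitor
  namespace: tawon-operator
spec:
  namespaceSelector:
    matchNames:
      - tawon-operator
  endpoints:
    - interval: 10s
      path: /metrics
      port: http-metrics      # must match a named port on the Service
  selector:
    matchLabels:
      app.kubernetes.io/name: tawon-directive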


Configuring Prometheus to Use PodMonitor and ServiceMonitor

The Prometheus Operator automatically discovers PodMonitor and ServiceMonitor resources. To ensure Prometheus is configured correctly:

  1. Verify the Prometheus Custom Resource configuration:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-example
  namespace: monitoring
spec:
  serviceMonitorSelector: {}  # Selects all ServiceMonitors
  podMonitorSelector: {}      # Selects all PodMonitors
  serviceMonitorNamespaceSelector: {}
  podMonitorNamespaceSelector: {}
  replicas: 1
  • serviceMonitorSelector and podMonitorSelector: Ensure these are set to {} to scrape all ServiceMonitor and PodMonitor resources. If filtering is needed, specify label-based selectors (see the example below).

  • serviceMonitorNamespaceSelector and podMonitorNamespaceSelector: Control which namespaces are searched for monitors.
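
For example, to pick up only monitors that carry a specific label (the label key and value here are arbitrary examples, not a Tawon convention):

  podMonitorSelector:
    matchLabels:
      release: prometheus     # only PodMonitors carrying this label are selected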

  2. Apply the Prometheus resource and monitors:

kubectl apply -f prometheus.yaml
kubectl apply -f podmonitor.yaml
  3. Verify Prometheus is scraping the metrics: Access the Prometheus UI (e.g., via kubectl port-forward):

    kubectl port-forward svc/prometheus-example -n monitoring 9090:9090
    Visit `http://localhost:9090` and search for the target metrics.
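
Alternatively, inspect the scrape targets through the Prometheus HTTP API; targets discovered via the PodMonitor should report health "up":

curl -s http://localhost:9090/api/v1/targets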

Summary

By defining PodMonitor and ServiceMonitor resources, you let the Prometheus Operator handle the discovery and scraping of application metrics. This declarative approach ensures that Prometheus dynamically monitors Pods and Services based on labels and specified endpoints, providing a scalable solution for Kubernetes monitoring.

Tawon Metrics

Table 1. Directives and tasks

| Type    | Metric                      | Description                          | Labels                |
|---------|-----------------------------|--------------------------------------|-----------------------|
| Counter | directives_ran_total        | Directives execution counter         | tawonID, status       |
| Summary | directive_duration_seconds  | Duration of an individual directive  | tawonID               |
| Counter | tasks_ran_total             | Tasks execution counter              | task, status, tawonID |
| Summary | task_duration_seconds       | Duration of an individual task       | tawonID, task         |
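
These metric names can be used directly in PromQL. For example, to chart the per-instance rate of failed directive runs (the status label value "error" is an assumption; check which values your deployment actually emits):

curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (tawonID) (rate(directives_ran_total{status="error"}[5m]))'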

Table 2. payload

| Type    | Metric                    | Description                                          | Labels |
|---------|---------------------------|------------------------------------------------------|--------|
| Counter | payload_task_total        | Payloads captured by payload task                    | status |
| Gauge   | payload_processes_watched | Processes tracked in payload tasks                   | -      |
| Counter | payload_msgs_lost         | Payload messages lost on the perf event ring buffer  | -      |
| Counter | payload_out_of_order_msgs | Out-of-order payload messages                        | -      |
| Counter | payload_msgs              | Payload messages processed                           | -      |
| Gauge   | payload_tids_watched      | Process thread IDs watched by payload                | -      |

Table 3. resetconn

| Type    | Metric                      | Description                                            | Labels |
|---------|-----------------------------|--------------------------------------------------------|--------|
| Counter | resetconn_msgs_lost         | resetconn messages lost on the perf event ring buffer  | -      |
| Counter | resetconn_msgs              | resetconn messages processed                           | -      |
| Gauge   | resetconn_processes_watched | Processes captured by resetconn                        | -      |
| Gauge   | resetconn_tids_watched      | Process thread IDs watched by resetconn                | -      |

Table 4. exec

| Type    | Metric                     | Description                                                        | Labels |
|---------|----------------------------|--------------------------------------------------------------------|--------|
| Counter | exec_msgs_lost             | Exec messages lost on the perf event ring buffer                   | -      |
| Counter | exec_proc_end_before_start | Process ended messages received before the process started message | -      |
| Counter | exec_msgs                  | Exec messages processed                                            | -      |

Table 5. tlsplaintext

| Type    | Metric                         | Description                                               | Labels |
|---------|--------------------------------|-----------------------------------------------------------|--------|
| Counter | tlsplaintext_msgs_lost         | tlsplaintext messages lost on the perf event ring buffer  | -      |
| Counter | tlsplaintext_out_of_order_msgs | Out-of-order tlsplaintext messages                        | -      |
| Counter | tlsplaintext_msgs              | tlsplaintext messages processed                           | -      |
| Gauge   | tlsplaintext_processes_watched | Processes captured by tlsplaintext                        | -      |
| Gauge   | tlsplaintext_tids_watched      | Process thread IDs watched by tlsplaintext                | -      |

Table 6. flows

| Type    | Metric                 | Description                                       | Labels |
|---------|------------------------|---------------------------------------------------|--------|
| Counter | flows_task_total       | Flows captured                                    | status |
| Counter | flow_msgs              | Flow messages processed                           | -      |
| Counter | flow_msgs_lost         | Flow messages lost on the perf event ring buffer  | -      |
| Counter | flow_out_of_order_msgs | Out-of-order flow messages                        | -      |
| Counter | flow_missing_ended     | Flows missing an ending time                      | -      |
| Counter | flow_missing_started   | Flows missing a starting time                     | -      |
| Counter | flow_extra_ended       | Flows with a spurious ending time                 | -      |
| Counter | flow_missing_tid       | Flows missing a thread ID                         | -      |
| Counter | flow_missing_socket    | Flows with missing socket info                    | -      |
| Gauge   | flow_flows_tracked     | Flows tracked                                     | -      |
| Gauge   | flow_threads_tracked   | Threads with flows tracked                        | -      |

Table 7. tlsheader

| Type    | Metric                 | Description                           | Labels |
|---------|------------------------|---------------------------------------|--------|
| Counter | tlsheaders_parse_total | tlsheaders payload parsings attempted | status |

Table 8. pfcp

| Type    | Metric           | Description                     | Labels |
|---------|------------------|---------------------------------|--------|
| Counter | pfcp_parse_total | PFCP payload parsings attempted | status |

Table 9. ecpri

| Type    | Metric            | Description                      | Labels |
|---------|-------------------|----------------------------------|--------|
| Counter | ecpri_parse_total | ECPRI payload parsings attempted | status |

Table 10. bpf

| Type      | Metric                    | Description                                       | Labels          |
|-----------|---------------------------|---------------------------------------------------|-----------------|
| Gauge     | bpf_probes_attached       | BPF probes attached                               | module, program |
| Counter   | bpf_probes_attach_failure | BPF probe attachment failures                     | module, program |
| Counter   | bpf_probes_detach_failure | BPF probe detachment failures                     | module, program |
| Histogram | bpf_cpu_usage_histogram   | Percentage of CPU usage for BPF programs          | module, program |
| Gauge     | bpf_cpu_usage             | Current percentage of CPU usage for BPF programs  | module, program |
| Counter   | bpf_run_counter           | Execution counter for BPF programs                | module, program |
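
To cross-check bpf_run_counter and the CPU-usage metrics against the kernel's own accounting, bpftool (if installed) reports per-program statistics; the run_time_ns and run_cnt fields only appear while kernel.bpf_stats_enabled is set to 1:

bpftool prog show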

Table 11. h2

| Type      | Metric                        | Description                                                                                  | Labels            |
|-----------|-------------------------------|----------------------------------------------------------------------------------------------|-------------------|
| Counter   | h2_chunks                     | H2 chunks processed                                                                          | -                 |
| Histogram | h2_chunk_size                 | Size of incoming chunks in bytes. Buckets: 10, 50, 100, 250, 750, 1500                       | -                 |
| Counter   | h2_chunks_missing_flowid      | H2 chunks that are missing a flow ID                                                         | -                 |
| Counter   | h2_out_of_order               | Out-of-order chunks detected                                                                 | -                 |
| Counter   | h2_invalid_frame_type         | Invalid frame type                                                                           | -                 |
| Counter   | h2_invalid_empty_data_frame   | Invalid empty data frame, missing end-stream flag                                            | -                 |
| Counter   | h2_invalid_empty_payload      | Invalid empty payload frame                                                                  | type              |
| Counter   | h2_payload_overflow_buffer    | Frame payload is larger than the input buffer; could be an invalid frame                     | type              |
| Counter   | h2_chunk_buffer_overflow      | Accumulated chunks are larger than the input buffer; could be an invalid frame               | type              |
| Counter   | h2_invalid_frame_flags        | Flags on frame are invalid for its type                                                      | type              |
| Counter   | h2_incoming_frames            | Assembled frames coming into the system                                                      | type, dismissed   |
| Counter   | h2_zero_stream_frames         | Unexpected stream 0 frame                                                                    | type              |
| Counter   | h2_server_stream_frames       | Server-initiated stream frame                                                                | type              |
| Counter   | h2_failed_dir_detections      | Failed to detect flow direction within buffer allocation                                     | -                 |
| Counter   | h2_connections                | Bi-directional connections detected                                                          | direction         |
| Histogram | h2_conn_dir_buffer            | Size of buffer on connection before direction is detected. Buckets: 1, 5, 10, 50, 250, 500   | direction         |
| Counter   | h2_invalid_go_away_frames     | Invalid GO_AWAY frames (too short)                                                           | -                 |
| Counter   | h2_invalid_rst_stream_frames  | Invalid RST_STREAM frames (too short)                                                        | -                 |
| Counter   | h2_header_decoder_resets      | Times header decoders were reset                                                             | -                 |
| Counter   | h2_expired_streams            | Partial streams that have expired                                                            | -                 |
| Counter   | h2_terminated_streams         | Partial streams that were terminated                                                         | reason            |
| Counter   | h2_missing_headers            | Header or partial header could not be parsed                                                 | direction, reason |
| Counter   | h2_empty_headers              | Header is empty                                                                              | direction         |
| Counter   | h2_wrong_pseudo_headers       | At least one wrong pseudo-header for direction detected                                      | direction         |
| Counter   | h2_multi_pseudo_headers       | More than one of the same pseudo-header detected                                             | direction         |
| Counter   | h2_duplicate_component        | Duplicate component (request or response header or body) detected                            | component         |
| Counter   | h2_round_trips_emitted        | Number of full or partial round trips emitted                                                | partial, reason   |
| Counter   | h2_content_type_parse_failure | HTTP content-type parse failure                                                              | -                 |
| Counter   | h2_multipart_failure          | Failed to parse multipart body                                                               | -                 |
| Counter   | h2_multipart_bodies           | Number of bodies detected as multipart                                                       | -                 |
| Counter   | h2_content_types              | HTTP content-types, including multiparts                                                     | type              |
| Gauge     | h2_framer_flows               | Flows tracked by framer                                                                      | -                 |
| Gauge     | h2_flower_flows               | Flows tracked by flower                                                                      | -                 |
| Gauge     | h2_streams                    | Streams tracked                                                                              | -                 |

Tawon Operator

The Tawon Operator is built on the operator-sdk framework and exposes metrics for monitoring and alerting via the /metrics endpoint. The following steps walk you through accessing the metrics exposed by the Tawon Operator.


Step 1: Create a ClusterRoleBinding

To allow access to the /metrics endpoint, create a ClusterRoleBinding that binds the appropriate ClusterRole to the Tawon Operator’s ServiceAccount.

kubectl create clusterrolebinding tawon-operator-metrics-binding \
  --clusterrole=tawon-operator-metrics-reader \
  --serviceaccount=openshift-operators:tawon-operator-controller-manager

Step 2: Generate a Token

Generate a token for the tawon-operator-controller-manager ServiceAccount. The token will be used to authenticate the request to the /metrics endpoint.

export TOKEN=$(kubectl create token tawon-operator-controller-manager -n openshift-operators)

Step 3: Expose the Metrics Port Locally

Expose the metrics service port on localhost using oc port-forward. This command creates a local proxy to the metrics service.

oc port-forward svc/tawon-operator-controller-manager-metrics-service -n openshift-operators 8443

Step 4: Retrieve Metrics

Use the generated token to authenticate and fetch the metrics via a curl request.

curl -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics -k

The -k flag disables TLS certificate verification, which is needed here because the endpoint is accessed via localhost and the serving certificate is typically not issued for that name.


Additional Notes

ClusterRole Deployed Within the Operator Bundle

The tawon-operator-metrics-reader ClusterRole is deployed as part of the operator bundle. Below is an example of its definition:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2024-10-22T22:25:14Z"
  labels:
    app.kubernetes.io/component: kube-rbac-proxy
    app.kubernetes.io/created-by: tawon-operator
    app.kubernetes.io/instance: metrics-reader
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: clusterrole
    app.kubernetes.io/part-of: tawon-operator
    olm.owner: tawon-operator.v2.39.20
  name: tawon-operator-metrics-reader
  resourceVersion: "59601497"
  uid: 1322e10b-f728-48a3-949c-470d633b0606
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get

This ClusterRole grants get access to the /metrics endpoint, enabling Prometheus or other monitoring tools to scrape metrics exposed by the Tawon Operator.
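
For reference, the imperative command in Step 1 is equivalent to the following manifest:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tawon-operator-metrics-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tawon-operator-metrics-reader
subjects:
- kind: ServiceAccount
  name: tawon-operator-controller-manager
  namespace: openshift-operators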


Summary

By following these steps, you can securely retrieve metrics from the Tawon Operator. The configuration ensures that only authorized ServiceAccounts have access to the /metrics endpoint, following Kubernetes RBAC best practices.

Tawon Operator Metrics