Tawon Prometheus Metrics

Tawon exports Prometheus metrics through the /metrics endpoint. The metrics are categorized into general and task-specific metrics to provide comprehensive insights into directive operations and task performance.

Tawon Metric Categories

  1. General Metrics for Directives: These metrics provide an overview of the directive processing status across all tasks.

  2. Task-Specific Metrics: Each task within Tawon has its own set of metrics, allowing users to monitor task-level details for more granular insights.

BPF Metrics

BPF metrics enhance the observability of eBPF operations. They expose CPU usage and execution counts for all of Tawon’s BPF programs. This functionality depends on the kernel configuration:

  • Kernel requirement: the bpf_stats_enabled sysctl must be enabled.

  • Kernel version: non-RHEL distributions require kernel 5.1 or later.

To enable, set the bpf_stats_enabled sysctl to 1:

sysctl -w kernel.bpf_stats_enabled=1
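
This setting does not persist across reboots. One way to make it permanent (a sketch, assuming a distribution that reads drop-in files from /etc/sysctl.d) is:

echo 'kernel.bpf_stats_enabled = 1' | sudo tee /etc/sysctl.d/90-bpf-stats.conf
sudo sysctl --system   # reload all sysctl configuration files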

Scraping Metrics

Overview

A common and standardized approach for setting up metric scraping from Pods and Services in Kubernetes is by leveraging PodMonitor and ServiceMonitor Custom Resource Definitions (CRDs). These resources allow administrators to specify which endpoints and metrics Prometheus should scrape, using label selectors to target specific Pods and Services.

When correctly configured, the Prometheus Operator automatically discovers the PodMonitor and ServiceMonitor resources, scrapes the defined endpoints, and exposes the collected metrics for monitoring and alerting.


Prerequisites

Before configuring PodMonitor or ServiceMonitor, ensure the following:

  1. Prometheus Operator is Installed: You can deploy the Prometheus Operator using the kube-prometheus stack, Helm charts, or custom manifests. Verify the Prometheus Operator installation:

    kubectl get pods -n monitoring
    kubectl get crds | grep prometheus
    The required CRDs, including `PodMonitor` and `ServiceMonitor`, must be present.
  2. Application Exposes Metrics: Your application should expose metrics in Prometheus format, typically via an HTTP endpoint such as /metrics.

  3. Service or Pod Labels: Ensure that Pods and Services have proper labels, which will be used for filtering in the PodMonitor or ServiceMonitor. A quick check for these prerequisites follows this list.
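
A quick way to verify the last two prerequisites (a sketch; the Pod name and port are placeholders for your own values):

kubectl get pods -n tawon-operator --show-labels     # confirm the labels you plan to select on
kubectl port-forward pod/<tawon-pod> -n tawon-operator 9090:9090 &
curl -s http://localhost:9090/metrics | head         # should print Prometheus text-format metrics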


Configuring a PodMonitor

A PodMonitor is used to scrape metrics from specific Pods. The example below demonstrates how to set up a PodMonitor for Tawon Pods:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: tawon-podmonitor
  namespace: tawon-operator
spec:
  namespaceSelector:
    matchNames:
      - tawon-operator        # Target the "tawon-operator" namespace
  podMetricsEndpoints:
    - interval: 10s           # Scrape metrics every 10 seconds
      path: /metrics          # Path where metrics are exposed
      port: http-metrics      # Port where metrics endpoint is served
  selector:
    matchLabels:
      app.kubernetes.io/name: tawon-directive

Explanation

  • namespaceSelector: Specifies which namespaces the Prometheus Operator should scan for matching Pods.

  • podMetricsEndpoints: Defines the scraping configuration:

    • interval: How frequently metrics are scraped.

    • path: The endpoint where metrics are exposed (e.g., /metrics).

    • port: Name of the container port where the metrics are served.

  • selector: Filters Pods based on labels. In this case, Pods labeled app.kubernetes.io/name: tawon-directive are selected.
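
A ServiceMonitor follows the same pattern but selects Services instead of Pods. The sketch below is illustrative only: it assumes a Service named tawon-directive-metrics exists in the tawon-operator namespace with a named port http-metrics and the same label; adjust to match your deployment.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tawon-servicemonitor
  namespace: tawon-operator
spec:
  namespaceSelector:
    matchNames:
      - tawon-operator
  endpoints:
    - interval: 10s
      path: /metrics
      port: http-metrics      # must match a named port on the Service
  selector:
    matchLabels:
      app.kubernetes.io/name: tawon-directive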


Configuring Prometheus to Use PodMonitor and ServiceMonitor

The Prometheus Operator automatically discovers PodMonitor and ServiceMonitor resources. To ensure Prometheus is configured correctly:

  1. Verify the Prometheus Custom Resource configuration:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-example
  namespace: monitoring
spec:
  serviceMonitorSelector: {}  # Selects all ServiceMonitors
  podMonitorSelector: {}      # Selects all PodMonitors
  serviceMonitorNamespaceSelector: {}
  podMonitorNamespaceSelector: {}
  replicas: 1
  • serviceMonitorSelector and podMonitorSelector: Ensure these are set to {} to scrape all ServiceMonitor and PodMonitor resources. If filtering is needed, specify label-based selectors (see the example below).

  • serviceMonitorNamespaceSelector and podMonitorNamespaceSelector: Control which namespaces are searched for monitors.
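
For example, to pick up only monitors that carry a specific label (the label key and value here are arbitrary examples, not a Tawon convention):

  podMonitorSelector:
    matchLabels:
      release: prometheus     # only PodMonitors carrying this label are selected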

  2. Apply the Prometheus resource and monitors:

kubectl apply -f prometheus.yaml
kubectl apply -f podmonitor.yaml
  3. Verify Prometheus is scraping the metrics: Access the Prometheus UI (e.g., via kubectl port-forward):

    kubectl port-forward svc/prometheus-example -n monitoring 9090:9090
    Visit `http://localhost:9090` and search for the target metrics.
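
Alternatively, inspect the scrape targets through the Prometheus HTTP API; targets discovered via the PodMonitor should report health "up":

curl -s http://localhost:9090/api/v1/targets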

Summary

By defining PodMonitor and ServiceMonitor resources, you let the Prometheus Operator handle the discovery and scraping of application metrics. This declarative approach ensures that Prometheus dynamically monitors Pods and Services based on labels and specified endpoints, providing a scalable solution for Kubernetes monitoring.

Tawon Metrics

Table 1. Directives and tasks

| Type    | Metric                      | Description                          | Labels                |
|---------|-----------------------------|--------------------------------------|-----------------------|
| Counter | directives_ran_total        | Directives execution counter         | tawonID, status       |
| Summary | directive_duration_seconds  | Duration of an individual directive  | tawonID               |
| Counter | tasks_ran_total             | Tasks execution counter              | task, status, tawonID |
| Summary | task_duration_seconds       | Duration of an individual task       | tawonID, task         |
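
These metric names can be used directly in PromQL. For example, to chart the per-instance rate of failed directive runs (the status label value "error" is an assumption; check which values your deployment actually emits):

curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (tawonID) (rate(directives_ran_total{status="error"}[5m]))'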

Table 2. payload

| Type    | Metric                    | Description                                          | Labels |
|---------|---------------------------|------------------------------------------------------|--------|
| Counter | payload_task_total        | Payloads captured by payload task                    | status |
| Gauge   | payload_processes_watched | Processes tracked in payload tasks                   | -      |
| Counter | payload_msgs_lost         | Payload messages lost on the perf event ring buffer  | -      |
| Counter | payload_out_of_order_msgs | Out-of-order payload messages                        | -      |
| Counter | payload_msgs              | Payload messages processed                           | -      |
| Gauge   | payload_tids_watched      | Process thread IDs watched by payload                | -      |

Table 3. resetconn

| Type    | Metric                      | Description                                            | Labels |
|---------|-----------------------------|--------------------------------------------------------|--------|
| Counter | resetconn_msgs_lost         | resetconn messages lost on the perf event ring buffer  | -      |
| Counter | resetconn_msgs              | resetconn messages processed                           | -      |
| Gauge   | resetconn_processes_watched | Processes captured by resetconn                        | -      |
| Gauge   | resetconn_tids_watched      | Process thread IDs watched by resetconn                | -      |

Table 4. exec

| Type    | Metric                     | Description                                                        | Labels |
|---------|----------------------------|--------------------------------------------------------------------|--------|
| Counter | exec_msgs_lost             | Exec messages lost on the perf event ring buffer                   | -      |
| Counter | exec_proc_end_before_start | Process ended messages received before the process started message | -      |
| Counter | exec_msgs                  | Exec messages processed                                            | -      |

Table 5. tlsplaintext

| Type    | Metric                         | Description                                               | Labels |
|---------|--------------------------------|-----------------------------------------------------------|--------|
| Counter | tlsplaintext_msgs_lost         | tlsplaintext messages lost on the perf event ring buffer  | -      |
| Counter | tlsplaintext_out_of_order_msgs | Out-of-order tlsplaintext messages                        | -      |
| Counter | tlsplaintext_msgs              | tlsplaintext messages processed                           | -      |
| Gauge   | tlsplaintext_processes_watched | Processes captured by tlsplaintext                        | -      |
| Gauge   | tlsplaintext_tids_watched      | Process thread IDs watched by tlsplaintext                | -      |

Table 6. flows

| Type    | Metric                 | Description                                       | Labels |
|---------|------------------------|---------------------------------------------------|--------|
| Counter | flows_task_total       | Flows captured                                    | status |
| Counter | flow_msgs              | Flow messages processed                           | -      |
| Counter | flow_msgs_lost         | Flow messages lost on the perf event ring buffer  | -      |
| Counter | flow_out_of_order_msgs | Out-of-order flow messages                        | -      |
| Counter | flow_missing_ended     | Flows missing an ending time                      | -      |
| Counter | flow_missing_started   | Flows missing a starting time                     | -      |
| Counter | flow_extra_ended       | Flows with a spurious ending time                 | -      |
| Counter | flow_missing_tid       | Flows missing a thread ID                         | -      |
| Counter | flow_missing_socket    | Flows with missing socket info                    | -      |
| Gauge   | flow_flows_tracked     | Flows tracked                                     | -      |
| Gauge   | flow_threads_tracked   | Threads with flows tracked                        | -      |

Table 7. tlsheader

| Type    | Metric                 | Description                           | Labels |
|---------|------------------------|---------------------------------------|--------|
| Counter | tlsheaders_parse_total | tlsheaders payload parsings attempted | status |

Table 8. pfcp

| Type    | Metric           | Description                     | Labels |
|---------|------------------|---------------------------------|--------|
| Counter | pfcp_parse_total | PFCP payload parsings attempted | status |

Table 9. ecpri

| Type    | Metric            | Description                      | Labels |
|---------|-------------------|----------------------------------|--------|
| Counter | ecpri_parse_total | ECPRI payload parsings attempted | status |

Table 10. bpf

| Type      | Metric                    | Description                                       | Labels          |
|-----------|---------------------------|---------------------------------------------------|-----------------|
| Gauge     | bpf_probes_attached       | BPF probes attached                               | module, program |
| Counter   | bpf_probes_attach_failure | BPF probe attachment failures                     | module, program |
| Counter   | bpf_probes_detach_failure | BPF probe detachment failures                     | module, program |
| Histogram | bpf_cpu_usage_histogram   | Percentage of CPU usage for BPF programs          | module, program |
| Gauge     | bpf_cpu_usage             | Current percentage of CPU usage for BPF programs  | module, program |
| Counter   | bpf_run_counter           | Execution counter for BPF programs                | module, program |
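
To cross-check bpf_run_counter and the CPU-usage metrics against the kernel's own accounting, bpftool (if installed) reports per-program statistics; the run_time_ns and run_cnt fields only appear while kernel.bpf_stats_enabled is set to 1:

bpftool prog show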

Table 11. h2

| Type      | Metric                        | Description                                                                                  | Labels            |
|-----------|-------------------------------|----------------------------------------------------------------------------------------------|-------------------|
| Counter   | h2_chunks                     | H2 chunks processed                                                                          | -                 |
| Histogram | h2_chunk_size                 | Size of incoming chunks in bytes. Buckets: 10, 50, 100, 250, 750, 1500                       | -                 |
| Counter   | h2_chunks_missing_flowid      | H2 chunks that are missing a flow ID                                                         | -                 |
| Counter   | h2_out_of_order               | Out-of-order chunks detected                                                                 | -                 |
| Counter   | h2_invalid_frame_type         | Invalid frame type                                                                           | -                 |
| Counter   | h2_invalid_empty_data_frame   | Invalid empty data frame, missing end-stream flag                                            | -                 |
| Counter   | h2_invalid_empty_payload      | Invalid empty payload frame                                                                  | type              |
| Counter   | h2_payload_overflow_buffer    | Frame payload is larger than the input buffer; could be an invalid frame                     | type              |
| Counter   | h2_chunk_buffer_overflow      | Accumulated chunks are larger than the input buffer; could be an invalid frame               | type              |
| Counter   | h2_invalid_frame_flags        | Flags on frame are invalid for its type                                                      | type              |
| Counter   | h2_incoming_frames            | Assembled frames coming into the system                                                      | type, dismissed   |
| Counter   | h2_zero_stream_frames         | Unexpected stream 0 frame                                                                    | type              |
| Counter   | h2_server_stream_frames       | Server-initiated stream frame                                                                | type              |
| Counter   | h2_failed_dir_detections      | Failed to detect flow direction within buffer allocation                                     | -                 |
| Counter   | h2_connections                | Bi-directional connections detected                                                          | direction         |
| Histogram | h2_conn_dir_buffer            | Size of buffer on connection before direction is detected. Buckets: 1, 5, 10, 50, 250, 500   | direction         |
| Counter   | h2_invalid_go_away_frames     | Invalid GO_AWAY frames (too short)                                                           | -                 |
| Counter   | h2_invalid_rst_stream_frames  | Invalid RST_STREAM frames (too short)                                                        | -                 |
| Counter   | h2_header_decoder_resets      | Times header decoders were reset                                                             | -                 |
| Counter   | h2_expired_streams            | Partial streams that have expired                                                            | -                 |
| Counter   | h2_terminated_streams         | Partial streams that were terminated                                                         | reason            |
| Counter   | h2_missing_headers            | Header or partial header could not be parsed                                                 | direction, reason |
| Counter   | h2_empty_headers              | Header is empty                                                                              | direction         |
| Counter   | h2_wrong_pseudo_headers       | At least one wrong pseudo-header for direction detected                                      | direction         |
| Counter   | h2_multi_pseudo_headers       | More than one of the same pseudo-header detected                                             | direction         |
| Counter   | h2_duplicate_component        | Duplicate component (request or response header or body) detected                            | component         |
| Counter   | h2_round_trips_emitted        | Number of full or partial round trips emitted                                                | partial, reason   |
| Counter   | h2_content_type_parse_failure | HTTP content-type parse failure                                                              | -                 |
| Counter   | h2_multipart_failure          | Failed to parse multipart body                                                               | -                 |
| Counter   | h2_multipart_bodies           | Number of bodies detected as multipart                                                       | -                 |
| Counter   | h2_content_types              | HTTP content-types, including multiparts                                                     | type              |
| Gauge     | h2_framer_flows               | Flows tracked by framer                                                                      | -                 |
| Gauge     | h2_flower_flows               | Flows tracked by flower                                                                      | -                 |
| Gauge     | h2_streams                    | Streams tracked                                                                              | -                 |

Tawon Operator

The Tawon Operator is built on the operator-sdk framework and exposes metrics for monitoring and alerting via the /metrics endpoint. The following steps walk you through accessing the metrics exposed by the Tawon Operator.


Step 1: Create a ClusterRoleBinding

To allow access to the /metrics endpoint, create a ClusterRoleBinding that binds the appropriate ClusterRole to the Tawon Operator’s ServiceAccount.

kubectl create clusterrolebinding tawon-operator-metrics-binding \
  --clusterrole=tawon-operator-metrics-reader \
  --serviceaccount=openshift-operators:tawon-operator-controller-manager

Step 2: Generate a Token

Generate a token for the tawon-operator-controller-manager ServiceAccount. The token will be used to authenticate the request to the /metrics endpoint.

export TOKEN=$(kubectl create token tawon-operator-controller-manager -n openshift-operators)

Step 3: Expose the Metrics Port Locally

Expose the metrics service port on localhost using oc port-forward. This command creates a local proxy to the metrics service.

oc port-forward svc/tawon-operator-controller-manager-metrics-service -n openshift-operators 8443

Step 4: Retrieve Metrics

Use the generated token to authenticate and fetch the metrics via a curl request.

curl -H "Authorization: Bearer $TOKEN" https://localhost:8443/metrics -k

The -k flag disables TLS certificate verification, which is needed here because the endpoint is accessed via localhost and the serving certificate is typically not issued for that name.


Additional Notes

ClusterRole Deployed Within the Operator Bundle

The tawon-operator-metrics-reader ClusterRole is deployed as part of the operator bundle. Below is an example of its definition:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2024-10-22T22:25:14Z"
  labels:
    app.kubernetes.io/component: kube-rbac-proxy
    app.kubernetes.io/created-by: tawon-operator
    app.kubernetes.io/instance: metrics-reader
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: clusterrole
    app.kubernetes.io/part-of: tawon-operator
    olm.owner: tawon-operator.v2.39.20
  name: tawon-operator-metrics-reader
  resourceVersion: "59601497"
  uid: 1322e10b-f728-48a3-949c-470d633b0606
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get

This ClusterRole grants get access to the /metrics endpoint, enabling Prometheus or other monitoring tools to scrape metrics exposed by the Tawon Operator.
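
For reference, the imperative command in Step 1 is equivalent to the following manifest:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tawon-operator-metrics-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tawon-operator-metrics-reader
subjects:
- kind: ServiceAccount
  name: tawon-operator-controller-manager
  namespace: openshift-operators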


Summary

By following these steps, you can securely retrieve metrics from the Tawon Operator. The configuration ensures that only authorized ServiceAccounts have access to the /metrics endpoint, following Kubernetes RBAC best practices.

Tawon Operator Metrics