OpenTelemetry eBPF Instrumentation: Zero-Code Observability Under 2% Overhead (Production Guide 2025)
48.5% of organizations are already using OpenTelemetry. Another 25.3% want to implement it but are stuck—blocked by the biggest adoption barrier: instrumenting existing applications requires code changes, rebuilds, and coordination across every team. In November 2025, OpenTelemetry released an answer: eBPF Instrumentation (OBI), which instruments every application in your cluster—Go, Java, Python, Node.js, Ruby—without touching a single line of code. Here's how to deploy it in production, what it can and can't do, and when you still need SDK instrumentation.
🎙️ Listen to the podcast episode: OpenTelemetry eBPF Instrumentation: Zero-Code Observability Under 2% Overhead - Jordan and Alex investigate how eBPF delivers complete observability without code changes and the TLS encryption catch nobody talks about.
Quick Answer (TL;DR)
Problem: OpenTelemetry adoption stalls because instrumenting applications requires code changes across multiple languages and teams.
Solution: OpenTelemetry eBPF Instrumentation (OBI) operates at the Linux kernel level, auto-instrumenting all applications without code changes, restarts, or SDK dependencies.
Key Statistics:
- Less than 2% CPU overhead typical, compared to 10-50% for traditional APM agents
- 10 protocols supported automatically: HTTP/HTTPS, HTTP/2, gRPC, SQL, Redis, Kafka, MongoDB, GraphQL, Elasticsearch/OpenSearch, and AWS S3
- 100,000+ monthly Docker pulls for Grafana Beyla before OpenTelemetry donation
- Linux kernel 4.4+ required, kernel 5.x recommended for full feature support
- First alpha release: November 2025
Best For: Broad baseline observability across all services, legacy applications you can't modify, polyglot environments with 5+ languages.
When NOT to Use OBI Alone: You need business-specific metrics like user IDs or transaction types, custom trace attributes are required, you're running non-Linux environments, or you need real-time security blocking.
Key Statistics (2025 Data)
| Metric | Value | Source |
|---|---|---|
| OpenTelemetry adoption | 48.5% using, 25.3% planning | EMA Survey 2025 |
| ROI from OpenTelemetry | 46.4% report >20% ROI | EMA Survey 2025 |
| MTTR improvement | 95% report 10%+ improvement | EMA Survey 2025 |
| eBPF CPU overhead | Under 2% typical, under 5% maximum | Pixie benchmarks |
| Beyla Docker pulls | 100,000+ monthly | Grafana Labs |
| Community contributors | 10:1 ratio vs Grafana employees | Grafana Labs |
| eBPF compatibility issues | 83% of programs affected | ACM Research 2025 |
| Unused observability data | 70% | Grafana Survey 2025 |
| Developer productivity gain | 60%+ | EMA Survey |
| Test performance after donation | 10x faster | OpenTelemetry Blog |
The Observability Instrumentation Gap
The numbers tell the story: nearly half of organizations use OpenTelemetry, and a quarter more want to implement it. But wanting and doing are different things. The primary blocker, according to Grafana's 2025 Observability Survey, is "adoption time and effort."
Here's what that actually means. Picture a typical microservices architecture: 50 services across 5 languages—Go, Java, Python, Node.js, and .NET. Each language needs its own SDK integration, context propagation setup, and exporter configuration. That's 10+ teams coordinating deployment schedules, code reviews, and testing cycles. Timeline? Three to six months, minimum. And that's if nothing goes wrong.
The result is predictable: partial observability. Teams instrument their most critical services first, maybe 40% coverage, and the rest becomes permanent technical debt. You have traces for your API gateway but not the internal services it calls. You see latency for your order service but can't trace why the payment service is slow.
Why Traditional Auto-Instrumentation Falls Short
Language-specific agents were supposed to solve this. Java agents, .NET profilers, Python monkey-patching—these approaches promised automatic instrumentation without code changes.
They delivered, partially. You still need deployment changes. You still need to restart applications. You still need configuration for each agent. And the overhead? Traditional APM agents add 10-50% CPU overhead depending on configuration, according to industry benchmarks. That's fine for some workloads, unacceptable for others.
Worse, compiled languages like Go had no solution at all. You couldn't monkey-patch a binary. You couldn't inject bytecode. Go developers had to manually instrument everything—until eBPF.
The Kernel-Level Breakthrough
eBPF changed the equation entirely. Instead of instrumenting at the library level—hooking into net/http or Express or Spring—eBPF instruments at the protocol level inside the Linux kernel itself.
When your Go application makes an HTTP request, that request eventually becomes syscalls: socket creation, data transmission, response reading. eBPF attaches probes to those kernel functions. It doesn't care whether your application is written in Go, Java, Python, or Rust. It doesn't care which HTTP library you're using. It sees the protocol.
One deployment instruments everything.
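To make the idea concrete, here is a tiny demonstration of kernel-side visibility using bpftrace. This is not OBI's implementation, just the same principle: it sums bytes written per process via the write(2) syscall tracepoint, for any process on the host, with zero application changes. It assumes bpftrace is installed and you have root.

```bash
# Illustration only: observe every process's write(2) syscalls from the kernel,
# without modifying or restarting any application. OBI does far more
# (protocol parsing, trace context), but the attachment point is the same idea.
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_write { @bytes_written[comm] = sum(args->count); }
interval:s:10 { exit(); }
'
```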
💡 Key Takeaway
OpenTelemetry eBPF Instrumentation captures metrics and traces at the kernel level, not the library level. One deployment instruments all applications regardless of programming language—Go, Java, Python, Node.js, Ruby, and .NET all work automatically without code changes or restarts.
How OBI Works
Understanding how OBI differs from traditional instrumentation explains why it can deliver under 2% overhead while instrumenting every protocol your applications use.
Protocol-Level vs Library-Level Instrumentation
Traditional auto-instrumentation hooks into specific libraries. The Java agent knows how to instrument Spring's RestTemplate, OkHttp, Apache HttpClient. The Node.js agent knows Express, Fastify, Axios. For each language and each library, there's specific instrumentation code.
This creates gaps. What happens when you use a library that isn't instrumented? What happens when a library updates and breaks the instrumentation? You get blind spots.
OBI takes a different approach. Instead of instrumenting libraries, it instruments protocols. HTTP is HTTP regardless of which library generates it. When data moves between user space and kernel space—socket operations, network I/O—OBI's eBPF probes capture it.
The OpenTelemetry blog explains it clearly: "Since OBI instruments at the protocol level, you can essentially instrument all applications (all programming languages, all libraries) with zero effort."
This protocol-level approach has a second advantage: consistency. Every service gets the same instrumentation quality. Your legacy Java monolith and your new Go microservices produce comparable telemetry.
Why Sub-2% Overhead Is Possible
The performance difference between eBPF and traditional agents comes down to where the code runs.
Traditional APM agents run inside your application process. They hook into library calls, which means they execute in user space every time your code makes an instrumented call. Each call involves: checking if tracing is enabled, creating span objects, propagating context, buffering data for export. That adds up.
eBPF runs in kernel space. When your application makes a syscall, the eBPF program executes as part of the kernel's handling of that syscall. There's no user-space context switch for instrumentation. Before attachment, the eBPF program is safety-checked by the kernel's verifier and then JIT-compiled to native code, so the data capture runs at near-native speed.
The benchmarks reflect this architectural difference. Pixie's measurements show eBPF probes typically consume less than 2% CPU overhead, with a maximum around 5%. Traditional APM agents, depending on the sampling rate and enabled features, range from 10% to 50%.
As New Relic's documentation notes: "eBPF programs run in the kernel space, minimizing the performance impact on applications."
💡 Key Takeaway
OBI adds less than 2% CPU overhead in typical deployments because eBPF executes in kernel space during normal syscall handling. Traditional APM agents add 10-50% overhead because they run in user space and intercept every library call. That architectural difference works out to roughly 5-25x less overhead.
What OBI Captures Automatically
Without any configuration, OBI captures:
RED Metrics (per endpoint):
- Request rate
- Error rate (by status code)
- Duration (latency histograms)
Distributed Traces:
- Automatic trace context propagation across services
- Service-to-service call graphs
- Span creation for each protocol call
Supported Protocols (10 total):
- HTTP/HTTPS and HTTP/2
- gRPC
- SQL (PostgreSQL, MySQL)
- Redis
- MongoDB
- Kafka
- GraphQL
- Elasticsearch/OpenSearch
- AWS S3
Protocol detection is automatic. OBI analyzes the traffic patterns and identifies the protocol without configuration. Your application doesn't need to declare what protocols it uses.
💡 Key Takeaway
OBI automatically captures RED metrics (request rate, error rate, duration) and distributed traces for 10 protocols including HTTP, gRPC, SQL, Redis, Kafka, and MongoDB. No configuration required—protocol detection happens automatically by analyzing traffic patterns.
Production Deployment Guide
Deploying OBI requires understanding the prerequisites, choosing the right deployment method, and knowing what configuration options matter.
Prerequisites Checklist
Kernel Requirements
OBI needs Linux kernel 4.4 at minimum. Kernel 5.x is recommended for full feature support. Check your version:
```bash
uname -r
```
Different kernel versions have different eBPF capabilities. Older kernels may not support all the probes OBI needs. If you're running managed Kubernetes (EKS, GKE, AKS), you typically have recent kernels, but verify for your specific node images.
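Beyond the version number, two optional checks tell you more about what a node's kernel actually supports. These assume `bpftool` is installed (package names vary by distribution) and that the kernel was built with BTF, which most 5.x distribution kernels are:

```bash
# List the eBPF program types, map types, and helpers this kernel supports
sudo bpftool feature probe kernel | less

# BTF type information (used by modern eBPF tooling) is exposed here if present
ls /sys/kernel/btf/vmlinux
```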
Permission Requirements
eBPF needs kernel access to attach probes. That requires either CAP_BPF capability or root privileges. In Kubernetes, this means privileged security context.
Some organizations prohibit privileged containers by policy. If that's your situation, you'll need to work with your security team on exceptions for the OBI workloads, potentially on dedicated node pools.
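If `privileged: true` is a hard no, the capability route mentioned above is worth exploring. A sketch of what that looks like in a container's security context follows; the exact capability set OBI needs depends on its version, the kernel (CAP_BPF requires 5.8+), and which features you enable, so validate this against the OBI documentation rather than treating it as definitive:

```yaml
# Sketch: Linux capabilities instead of full privileged mode (validate against OBI docs)
securityContext:
  runAsUser: 0
  capabilities:
    add:
      - BPF              # load eBPF programs and maps (kernel 5.8+)
      - PERFMON          # perf events used by some probes
      - SYS_PTRACE       # inspect other processes' executables
      - NET_RAW          # socket-level network probes
      - DAC_READ_SEARCH  # read binaries via /proc/<pid>/exe
```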
Deployment Options
Three ways to deploy OBI:
- DaemonSet (recommended for cluster-wide instrumentation): One OBI pod per node, instruments all pods on that node
- Sidecar (per-pod): OpenTelemetry Operator injects OBI container into specific pods
- Standalone (single host): Direct installation for non-Kubernetes environments
Kubernetes DaemonSet Deployment
For cluster-wide instrumentation, DaemonSet is the simplest approach. One deployment, full coverage.
Step 1: Deploy OBI DaemonSet
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: obi
  namespace: observability
spec:
  selector:
    matchLabels:
      app: obi
  template:
    metadata:
      labels:
        app: obi
    spec:
      hostPID: true
      hostNetwork: true
      containers:
        - name: obi
          image: ghcr.io/open-telemetry/opentelemetry-ebpf-instrumentation:latest
          securityContext:
            privileged: true
            runAsUser: 0
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-collector.observability:4317"
            - name: OTEL_SERVICE_NAME
              value: "obi"
          volumeMounts:
            - name: sys
              mountPath: /sys
              readOnly: true
      volumes:
        - name: sys
          hostPath:
            path: /sys
```
Note the key settings: `hostPID: true` allows OBI to see processes on the node, `privileged: true` enables kernel probe attachment, and the `/sys` mount provides access to kernel interfaces.
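Apply the manifest and confirm a pod is running on every node (the filename and namespace creation step are assumptions about your setup):

```bash
kubectl get namespace observability || kubectl create namespace observability
kubectl apply -f obi-daemonset.yaml
kubectl -n observability rollout status daemonset/obi
```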
Step 2: Configure OpenTelemetry Collector
Your Collector needs to receive OBI's telemetry:
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s

exporters:
  # Your backend: Jaeger, Grafana Tempo, Datadog, etc.
  otlp:
    endpoint: "tempo.observability:4317"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```
Step 3: Verify Instrumentation
Check OBI logs for discovered services:
```bash
kubectl logs -n observability -l app=obi --tail=100
```
Look for messages about discovered processes and attached probes. Then verify in your observability backend that traces are flowing and RED metrics are populating.
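If nothing shows up in the backend, a quick way to isolate where the break is: temporarily add the Collector's debug exporter (named `logging` in older Collector releases) to the traces pipeline so spans are printed to the Collector's stdout. If spans appear there, OBI-to-Collector delivery is working and the problem is downstream.

```yaml
exporters:
  debug:
    verbosity: detailed     # temporary; remove after verification

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, debug]
```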
OpenTelemetry Operator Deployment (Sidecar)
For Go applications specifically, or when you want per-pod control, use the OpenTelemetry Operator.
Step 1: Install the Operator
```bash
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
```
Step 2: Create Instrumentation Resource
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: obi-instrumentation
  namespace: your-namespace
spec:
  exporter:
    endpoint: http://otel-collector.observability:4317
  propagators:
    - tracecontext
    - baggage
  go:
    image: ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:latest
```
Step 3: Annotate Pods
```yaml
metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-go: "true"
```
The Operator injects an eBPF sidecar that instruments the Go application. The sidecar requires privileged context and runs as root.
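One detail that's easy to miss: the annotation must go on the pod template inside your Deployment, not on the Deployment's own metadata, or the Operator's webhook never sees it. A minimal sketch follows (names and image are placeholders; Go injection may also need the target executable path, e.g. via the otel-go-auto-target-exe annotation, so check the Operator docs for your version):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout                      # placeholder
  namespace: your-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
      annotations:
        # The annotation lives here, on the pod template
        instrumentation.opentelemetry.io/inject-go: "true"
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.2.3   # placeholder
```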
💡 Key Takeaway
Deploy OBI as a Kubernetes DaemonSet for cluster-wide instrumentation with a single deployment—one OBI pod per node instruments all workloads. For Go applications or per-pod control, use the OpenTelemetry Operator to inject sidecars. Both methods require privileged security context.
Essential Configuration
Environment Variables
The most important configuration options:
- `OTEL_EXPORTER_OTLP_ENDPOINT`: Where to send telemetry (required)
- `OTEL_SERVICE_NAME`: Override auto-detected service names
- `BEYLA_OPEN_PORT`: Filter to specific ports (e.g., "8080,443")
- `BEYLA_EXECUTABLE_NAME`: Filter to specific processes
Performance Tuning
For high-throughput environments:
- `BEYLA_BPF_BATCH_SIZE`: Events to batch before sending (default 100)
- `BEYLA_TRACE_PRINTER`: Enable for debugging (development only; adds overhead)
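A sketch of how these land in the DaemonSet's container spec (values are examples; check the current OBI/Beyla configuration reference for names and defaults, since they may differ between releases):

```yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.observability:4317"
  - name: BEYLA_OPEN_PORT            # only instrument processes listening on these ports
    value: "8080,443"
  - name: BEYLA_EXECUTABLE_NAME      # or filter by executable name
    value: "checkout"
  - name: BEYLA_BPF_BATCH_SIZE       # larger batches reduce per-event overhead at high throughput
    value: "200"
```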
Decision Framework: OBI vs SDK Instrumentation
Not every service needs the same instrumentation approach. Here's when to use what:
Use OBI Alone When:
- You need broad baseline observability quickly
- You're instrumenting legacy or third-party applications you can't modify
- You have a polyglot environment with 5+ languages
- Your team lacks SDK expertise across all your languages
- The sub-2% overhead matters for performance-critical services
Add SDK Instrumentation When:
- You need custom attributes (user ID, tenant ID, transaction type)
- Business metrics are required (orders processed, revenue per transaction)
- You need deep framework instrumentation (Spring bean timings, Django middleware details)
- Complex sampling rules differ by service
The Hybrid Approach (Recommended)
Most production environments benefit from both:
- Deploy OBI cluster-wide for baseline observability (zero effort, full coverage)
- Add SDK instrumentation for top 10-20% of critical services (custom telemetry where it matters)
- Result: 80% coverage instantly, 100% with targeted SDK work
This approach acknowledges that not every service needs custom attributes. Your database connection pool doesn't need user IDs in spans. But your checkout service probably does.
💡 Key Takeaway
Use OBI for baseline observability across all services (zero code changes required), then add SDK instrumentation only for critical services that need custom attributes or business metrics. This hybrid approach achieves 80% coverage instantly and completes the remaining 20% with targeted SDK work.
Comparison: OBI vs SDK vs Traditional APM
| Capability | OBI | SDK Instrumentation | Traditional APM |
|---|---|---|---|
| Code changes required | None | Yes | Varies by agent |
| CPU overhead | under 2% typical | 5-15% | 10-50% |
| Custom attributes | No | Yes | Yes |
| Languages supported | All (kernel-level) | Per-language SDK | Per-language agent |
| Setup time | Minutes | Days to weeks | Hours to days |
| Trace context propagation | Automatic | Manual setup | Automatic |
| Business metrics | No | Yes | Yes |
| Deployment method | DaemonSet/sidecar | In-app dependency | Agent installation |
Limitations and Gotchas
OBI isn't magic. Understanding its limitations prevents surprises in production.
No Application-Layer Context
This is the most significant limitation. OBI operates at the kernel level, which means it can't see inside your application's memory.
What's missing:
- User IDs, session IDs, tenant identifiers
- Business transaction types (order, refund, subscription)
- Custom attributes and events you'd add with SDK
- Application-specific error details beyond HTTP status codes
Why: eBPF probes attach to kernel functions. They see syscalls and network data. They don't have access to application-level variables or business logic.
Workaround: Add SDK instrumentation for services where this context matters. OBI gives you the baseline; SDK adds the business semantics.
Kernel Compatibility Issues
Research from ACM analyzing 25 kernel images across 8 years found that 83% of real-world eBPF programs are impacted by dependency mismatches. Different kernel versions have different eBPF features, and different Linux distributions configure kernels differently.
Specific issues:
- Kernel below 4.4: OBI won't run
- Kernel 4.x: Limited eBPF features, some probes may fail
- Kernel 5.x: Full support recommended
Mitigation: Test OBI on your exact production kernel version. If you use managed Kubernetes, verify the node image kernel version. Don't assume that staging (kernel 5.15) and production (kernel 4.19) will behave identically.
Privileged Access Required
eBPF attaches probes to kernel functions. That requires elevated privileges—either CAP_BPF capability or root. In Kubernetes, this translates to privileged: true in the security context.
Security implications:
- Privileged containers can access host resources
- Some organizations prohibit them by policy
- Compliance frameworks may require justification
Mitigation strategies:
- Dedicate node pools for observability workloads
- Document the business case for your security team
- Use Pod Security admission exemptions or OPA Gatekeeper policies to allow specific workloads (PodSecurityPolicy was removed in Kubernetes 1.25)
- Consider the alternative: running without observability is also a security risk
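For clusters using the built-in Pod Security admission controller, one common pattern is to relax enforcement only for the observability namespace (adapt this if you use OPA Gatekeeper or another admission controller instead):

```bash
kubectl label namespace observability \
  pod-security.kubernetes.io/enforce=privileged \
  pod-security.kubernetes.io/enforce-version=latest \
  --overwrite
```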
Linux Only
eBPF is a Linux kernel technology. It doesn't exist on Windows or macOS.
No support for:
- Windows containers
- macOS (even in development)
- FreeBSD
If you have Windows workloads, you'll need traditional instrumentation for those. OBI handles the Linux side.
Debugging Complexity
When something goes wrong with OBI, debugging is harder than with traditional agents.
Challenges:
- eBPF verifier errors are cryptic
- Limited debugging tools for kernel-space code
- Probe attachment failures may be silent
Mitigation: Use OBI as a packaged tool (don't write custom eBPF), check logs for probe attachment messages, and test thoroughly in staging before production rollout.
💡 Key Takeaway
OBI has real limitations to understand before production deployment: no custom attributes (kernel-level instrumentation can't see application memory), kernel compatibility issues affecting 83% of eBPF programs, privileged container access required, and Linux-only support. Test on your exact production kernel version and plan SDK instrumentation for services needing business context.
Common Mistakes to Avoid
- Expecting custom attributes: OBI captures protocol data, not application data. If you need user IDs in traces, use SDK.
- Skipping kernel version verification: OBI will fail on old kernels, sometimes silently. Verify before deploying.
- Forgetting privileged security context: Pods won't start without proper permissions. Plan for security team conversations.
- Not testing at scale: Overhead increases with request volume. Test with production-like traffic.
- Replacing all SDK instrumentation: OBI provides baseline observability. Keep SDK for services needing custom telemetry.
Getting Started: 4-Week Rollout Plan
Week 1: Preparation
- Verify kernel version on all nodes (`uname -r` should be ≥ 4.4, prefer 5.x)
- Check security policies for privileged container allowances
- Identify your OpenTelemetry Collector endpoint
- Choose deployment method (DaemonSet for most cases)
- Select initial services to instrument (start with non-critical)
- Document current observability coverage for comparison
Week 2: Staging Deployment
- Deploy OBI to staging cluster
- Verify traces appearing in observability backend
- Check RED metrics per service
- Measure CPU overhead on OBI pods
- Test service-to-service trace propagation
- Identify any kernel compatibility issues
Week 3: Production Rollout
- Deploy to production (single namespace first)
- Monitor for performance impact on application pods
- Validate trace completeness compared to staging
- Expand to additional namespaces incrementally
- Document which services need SDK additions
Week 4: Optimization
- Identify services requiring custom attributes
- Add SDK instrumentation for top 10% critical services
- Configure sampling rules in the Collector for cost management (a sketch follows this list)
- Set up alerts on OBI pod health and resource usage
- Update runbooks for new observability stack
- Train team on querying OBI-generated telemetry
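For the sampling item above, a minimal sketch using the Collector's probabilistic sampler (a contrib processor, so confirm it is included in your Collector distribution, and tune the percentage to your traffic volume and budget):

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 10     # keep roughly 10% of traces
  batch:
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch]
      exporters: [otlp]
```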
Red Flags During Rollout
Watch for these signals that something needs attention:
- OBI pods in CrashLoopBackOff: Usually kernel compatibility. Check logs for specific errors.
- Missing traces for specific languages: Verify the language is supported and probes are attaching.
- High CPU on OBI pods: Reduce batch size, check request volume, verify configuration (a quick check follows this list).
- Incomplete trace context: Ensure Collector is receiving and properly processing spans.
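To put a number on the high-CPU signal flagged above (assumes metrics-server is installed in the cluster):

```bash
# CPU and memory usage of the OBI pods, one per node
kubectl top pod -n observability -l app=obi
```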
💡 Key Takeaway
Roll out OBI incrementally: staging first, then one production namespace, then expand. Verify kernel compatibility, privileged access, and trace completeness at each stage before proceeding. Plan for SDK additions—OBI provides baseline observability, not complete custom telemetry.
Practical Actions This Week
For Individual Engineers
Monday: Check your Kubernetes node kernel versions. Run kubectl get nodes -o wide and note the kernel version column. If any nodes are below 4.4, flag it.
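If you have more than a handful of nodes, this prints just the kernel versions (the field path is part of the standard Node status):

```bash
kubectl get nodes -o custom-columns=NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
```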
Tuesday: List the languages in your service mesh. OBI instruments all of them, but you need to know what SDK work might be needed later.
Wednesday: Identify one non-critical service for initial OBI testing. Something with HTTP traffic, not in the critical path.
For Platform Teams
This Week:
- Review security policies for privileged containers
- Verify OpenTelemetry Collector is deployed and healthy
- Choose between DaemonSet and Operator deployment
- Create staging deployment plan
Next Month:
- Complete staging rollout and validation
- Document performance baselines
- Begin production rollout by namespace
- Establish hybrid strategy for SDK additions
For Leadership
Business Case: OBI reduces observability instrumentation time from months to days. Instead of coordinating SDK integration across 10+ teams for 50 services, deploy once and instrument everything.
Ask: Approve privileged container security exception for observability namespace. The risk of running without observability (blind to production issues) exceeds the risk of privileged containers in a controlled namespace.
Timeline:
- Week 1-2: Staging validation
- Week 3-4: Production rollout
- Ongoing: SDK additions for critical services
Success Metric: Time to full observability coverage. Before OBI: 3-6 months for partial coverage. With OBI: 4 weeks for baseline coverage, SDK additions as needed.
📚 Learning Resources
Official Documentation
- OpenTelemetry OBI Documentation - Setup guides, configuration reference, Kubernetes deployment manifests
- OpenTelemetry Zero-Code Overview - All zero-code instrumentation approaches by language
- OBI GitHub Repository - Source code, issues, release notes
Grafana Resources
- Grafana Beyla Documentation - Comprehensive guide for Grafana's OBI distribution
- Grafana OpenTelemetry Report - Adoption patterns, challenges, and industry statistics
Key Announcements
- OBI First Release Announcement (November 2025) - Capabilities, supported protocols, getting started
- Grafana Beyla Donation (May 2025) - Background on community transition
- Go Auto-Instrumentation Beta (March 2025) - Go-specific eBPF features
Technical Deep Dives
- Trail of Bits: eBPF Pitfalls - Security considerations and limitations
- Last9: Zero-Code Instrumentation - Practical implementation guide
Community
- CNCF Slack: #otel-ebpf-instrumentation channel for async discussions
- Weekly SIG Calls: Thursdays at 13:00 UTC for real-time collaboration
Related Content
Platform Engineering Playbook Resources:
- eBPF in Kubernetes: Kernel-Level Superpowers Without the Risk - Deep dive on how eBPF works and production safety
- Observability Tools Showdown - Comparing observability platforms and approaches
- OpenTelemetry Technical Page - Learning resources and ecosystem overview
Conclusion
OpenTelemetry eBPF Instrumentation changes the economics of observability adoption. Instead of months of SDK integration work across multiple teams and languages, you deploy once and instrument everything. The sub-2% overhead makes it practical for performance-sensitive workloads. The protocol-level approach makes it language-agnostic.
But OBI isn't a complete replacement for SDK instrumentation. It provides baseline observability—RED metrics and distributed traces for the protocols your applications use. For custom attributes, business metrics, and deep framework instrumentation, you still need targeted SDK work.
The practical path forward: deploy OBI for cluster-wide baseline coverage, then add SDK instrumentation for the 10-20% of services that need custom telemetry. You get 80% coverage immediately and 100% with targeted effort.
The first alpha release shipped in November 2025. The ecosystem is moving fast, with Grafana Labs, Splunk, Coralogix, and Odigos all contributing. Now is the time to evaluate OBI for your environment—before your observability debt compounds further.
Have questions about deploying OBI? Open an issue on our GitHub repository or join the discussion.
