OpenTelemetry: Unified Observability for Modern Applications
Observability is the ability to understand what your application is doing by examining its outputs. OpenTelemetry (OTel) is the CNCF-standard framework for instrumenting applications and collecting telemetry data — traces, metrics, and logs — in a vendor-neutral way. One instrumentation, multiple backends.
The Three Pillars
Traces: End-to-end record of a request through your system. A trace spans multiple services and shows timing, dependencies, and where errors occur.
Metrics: Numerical measurements over time: request rate, error rate, latency percentiles, CPU usage.
Logs: Discrete event records with context. OTel correlates logs with traces using trace IDs.
OpenTelemetry defines the standard for collecting this data. You choose where it goes (Jaeger, Zipkin, Prometheus, Grafana, Datadog, Honeycomb, etc.).
Node.js Auto-Instrumentation
OpenTelemetry can automatically instrument popular Node.js frameworks (Express, Fastify, NestJS, Koa) without changes to your application code — only a small bootstrap file is required:
npm install \
@opentelemetry/sdk-node \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-http
Create instrumentation.ts (loaded before application code):
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-http";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
const sdk = new NodeSDK({
serviceName: "my-api",
traceExporter: new OTLPTraceExporter({
url: "http://otel-collector:4318/v1/traces",
}),
metricReader: new PeriodicExportingMetricReader({
exporter: new OTLPMetricExporter({
url: "http://otel-collector:4318/v1/metrics",
}),
exportIntervalMillis: 30000,
}),
instrumentations: [
getNodeAutoInstrumentations({
"@opentelemetry/instrumentation-fs": { enabled: false }, // too noisy
}),
],
});
sdk.start();
Load it before your app:
// package.json
{
"scripts": {
"start": "node --require ./dist/instrumentation.js dist/index.js"
}
}
With this configuration, all HTTP requests, database queries (pg, mysql2, mongoose), Redis calls, and external fetch calls are automatically traced.
Manual Instrumentation
For custom spans around business logic:
import { trace, context, SpanStatusCode } from "@opentelemetry/api";
const tracer = trace.getTracer("my-service");
async function processOrder(orderId: string) {
// Create a span for this operation
return tracer.startActiveSpan("processOrder", async (span) => {
try {
span.setAttribute("order.id", orderId);
span.setAttribute("order.source", "web");
const order = await fetchOrder(orderId);
span.setAttribute("order.total", order.total);
await validateInventory(order);
await chargePayment(order);
span.setStatus({ code: SpanStatusCode.OK });
return order;
} catch (error) {
span.recordException(error as Error);
span.setStatus({
code: SpanStatusCode.ERROR,
message: (error as Error).message,
});
throw error;
} finally {
span.end();
}
});
}
Custom Metrics
import { metrics } from "@opentelemetry/api";
const meter = metrics.getMeter("my-service");
// Counter: things that only go up
const requestCounter = meter.createCounter("http.requests", {
description: "Total HTTP requests",
});
// Histogram: distribution of values (latency, size)
const requestDuration = meter.createHistogram("http.request.duration", {
description: "HTTP request duration in milliseconds",
unit: "ms",
});
// Observable gauge: value sampled at collection time
const activeConnections = meter.createObservableGauge("db.connections.active", {
description: "Active database connections",
});
activeConnections.addCallback((result) => {
  // "pool" is your application's database connection pool object
  result.observe(pool.activeConnections);
});
// Usage in route handler
app.use((req, res, next) => {
const startTime = Date.now();
res.on("finish", () => {
const duration = Date.now() - startTime;
requestCounter.add(1, {
method: req.method,
route: req.route?.path ?? "unknown",
status: res.statusCode.toString(),
});
requestDuration.record(duration, {
method: req.method,
route: req.route?.path ?? "unknown",
});
});
next();
});
Baggage: Cross-Service Context
Baggage carries key-value pairs through distributed traces:
import { propagation, context } from "@opentelemetry/api";
// Set baggage (e.g., in gateway)
const baggage = propagation.createBaggage({
"user.id": { value: userId },
"tenant.id": { value: tenantId },
});
const ctx = propagation.setBaggage(context.active(), baggage);
// Read baggage in downstream service
const incomingBaggage = propagation.getBaggage(context.active());
const userId = incomingBaggage?.getEntry("user.id")?.value;
The OpenTelemetry Collector
The OTel Collector is a standalone service that:
- Receives telemetry from your services
- Processes/filters/transforms it
- Exports to multiple backends simultaneously
Docker Compose setup:
services:
otel-collector:
image: otel/opentelemetry-collector-contrib:latest
volumes:
- ./otel-config.yaml:/etc/otelcol-contrib/config.yaml
ports:
- "4317:4317" # gRPC
- "4318:4318" # HTTP
- "8888:8888" # Collector metrics (Prometheus)
jaeger:
image: jaegertracing/all-in-one:latest
ports:
- "16686:16686" # Jaeger UI
prometheus:
image: prom/prometheus:latest
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
ports:
- "9090:9090"
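The compose file mounts a prometheus.yml that the snippet leaves undefined. A minimal sketch that scrapes the collector — port 8889 is where the collector's Prometheus exporter serves application metrics, and 8888 exposes the collector's own internal metrics:

```yaml
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 15s
    static_configs:
      - targets: ["otel-collector:8889", "otel-collector:8888"]
```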
Collector config (otel-config.yaml):
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 10s
  memory_limiter:
    check_interval: 1s # required; the collector refuses to start without it
    limit_mib: 512
exporters:
  otlp/jaeger: # modern Jaeger ingests OTLP natively; the dedicated jaeger exporter was removed from recent collector releases
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
  debug: # successor to the deprecated logging exporter
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
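Services can be pointed at the collector without touching SDK code by using the standard OTel environment variables; the values here match this compose setup:

```shell
export OTEL_SERVICE_NAME="my-api"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
node --require ./dist/instrumentation.js dist/index.js
```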
Correlating Logs with Traces
Inject trace context into your logger for correlated logs:
import { trace } from "@opentelemetry/api";
import winston from "winston";
const logger = winston.createLogger({
format: winston.format.combine(
winston.format.timestamp(),
winston.format.json(),
winston.format((info) => {
const span = trace.getActiveSpan();
if (span) {
const spanContext = span.spanContext();
info.traceId = spanContext.traceId;
info.spanId = spanContext.spanId;
}
return info;
})()
),
transports: [new winston.transports.Console()],
});
Now every log line in an active trace includes traceId and spanId, enabling correlation in backends like Grafana Loki.
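With the formatter above, a log line emitted inside a traced request looks roughly like this (the IDs shown are illustrative; real ones match the active trace):

```json
{
  "level": "info",
  "message": "order processed",
  "timestamp": "2024-01-01T12:00:00.000Z",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId": "00f067aa0ba902b7"
}
```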
Sampling
For high-throughput services, recording every trace is expensive. Configure sampling:
import { ParentBasedSampler, TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";
const sdk = new NodeSDK({
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1), // Sample 10% of traces
}),
// ...
});
Or use tail-based sampling in the collector to reduce data volume without changing application code; because the collector sees complete traces, it can keep every error trace while sampling the rest.
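A sketch of collector-side sampling with the contrib tail_sampling processor, which makes the keep/drop decision once a trace completes; the policy names and percentage are illustrative:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s # how long to buffer spans before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }
```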
Backends
| Backend | Traces | Metrics | Logs | Hosting |
|---|---|---|---|---|
| Jaeger | ✓ | – | – | Self-hosted |
| Zipkin | ✓ | – | – | Self-hosted |
| Prometheus + Grafana | – | ✓ | via Loki | Self-hosted |
| Grafana Tempo | ✓ | – | – | Self-hosted or Cloud |
| Grafana Stack (Tempo + Mimir + Loki) | ✓ | ✓ | ✓ | Self-hosted or Cloud |
| Honeycomb | ✓ | ✓ | ✓ | SaaS |
| Datadog | ✓ | ✓ | ✓ | SaaS |
For self-hosted: Grafana + Tempo + Prometheus + Loki covers all three pillars with a unified dashboard.
Why OTel vs. Vendor-Specific SDKs
If you use Datadog's SDK, you get Datadog telemetry; switching to another backend means re-instrumenting. With OTel you instrument once and route data to any backend via the collector. This avoids vendor lock-in at the instrumentation layer: you commit to the OTel API and data model (traces, metrics, logs), not to a vendor.
For teams uncertain about their observability backend, OTel is a safe default.