Learning Objectives
- Understand what LLM observability is and why it matters for production AI applications
- Identify Datadog's key LLM monitoring features including tracing, cost tracking, and agentic AI monitoring
- Compare Datadog LLM Observability to purpose-built alternatives like LangSmith and Helicone
What Is LLM Observability?
When you deploy an AI application in production, you need to know: Is it working? How much is it costing? Are responses accurate? How fast is it? LLM Observability answers these questions by monitoring every interaction between your application and AI models.
Datadog LLM Observability extends Datadog's monitoring platform to cover AI workloads. For supported providers and frameworks, it automatically traces each LLM call, capturing latency, token usage, estimated cost, error rates, and response quality, and correlates this data with your existing infrastructure metrics, application traces, and logs.
💡Key Concept
Observability vs. Monitoring: Monitoring tells you when something is wrong (an alert fires). Observability tells you why — by providing the detailed traces, metrics, and logs needed to diagnose problems. For AI applications, observability means seeing exactly which LLM call in a multi-step agent workflow caused a failure, how much each call cost, and how the AI's behavior changed after a prompt update.
Core Features
LLM Call Tracing
Automatic tracing and annotation of LLM calls made through supported providers and frameworks, with no code changes required. Each trace captures:
- Latency — how long the model took to respond
- Token usage — input and output tokens consumed
- Estimated cost — calculated from provider pricing and token counts
- Error rates — failed calls, timeouts, rate limits
- Full request/response content — for debugging and evaluation
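To make the fields above concrete, here is a minimal, stdlib-only sketch of the kind of record a tracer could build around a single call. It is not Datadog's SDK: `LLMSpan`, `traced_llm_call`, `fake_model`, and the per-token prices are all hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical prices in USD per 1,000 tokens; real provider pricing differs.
PRICE_PER_1K_INPUT = 0.50
PRICE_PER_1K_OUTPUT = 1.50


@dataclass
class LLMSpan:
    """One traced LLM call, holding the fields listed above."""
    model: str
    prompt: str
    response: Optional[str]
    input_tokens: int
    output_tokens: int
    latency_s: float
    error: Optional[str]

    @property
    def estimated_cost_usd(self) -> float:
        # Cost is derived from token counts and the assumed price table.
        return (self.input_tokens / 1000 * PRICE_PER_1K_INPUT
                + self.output_tokens / 1000 * PRICE_PER_1K_OUTPUT)


def traced_llm_call(model: str, prompt: str,
                    call: Callable[[str], Tuple[str, int, int]]) -> LLMSpan:
    """Run call(prompt) and record latency, token counts, and any error."""
    start = time.perf_counter()
    try:
        text, tokens_in, tokens_out = call(prompt)
        error = None
    except Exception as exc:  # failed calls, timeouts, rate limits
        text, tokens_in, tokens_out = None, 0, 0
        error = type(exc).__name__
    latency = time.perf_counter() - start
    return LLMSpan(model, prompt, text, tokens_in, tokens_out, latency, error)


def fake_model(prompt: str) -> Tuple[str, int, int]:
    """Stand-in for a provider client: returns text plus token counts."""
    return "Paris", 2000, 1000


span = traced_llm_call("example-model", "Capital of France?", fake_model)
print(f"latency={span.latency_s:.4f}s cost=${span.estimated_cost_usd:.2f} error={span.error}")
```

With the assumed prices, 2,000 input tokens and 1,000 output tokens come to an estimated $2.50. A real tracer would read token counts from the provider's response rather than from a stub.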
Execution Flow Charts
Visual diagrams showing agent decision paths, tool usage, and retrieval steps. See exactly how a multi-step AI agent navigated a complex task — which tools it called, what data it retrieved, and where it decided to branch.
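An execution flow of this kind is essentially a tree of nested steps. The sketch below models one agent run and renders it as indented text. The `Step` class, the step kinds, and the example run are hypothetical and only illustrate the shape of the data, not how Datadog stores or draws it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One node in an agent run: an agent, tool, retrieval, or LLM step."""
    kind: str
    name: str
    children: List["Step"] = field(default_factory=list)


def render(step: Step, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, each parent before its children."""
    lines = [f"{'  ' * depth}{step.kind}: {step.name}"]
    for child in step.children:
        lines.extend(render(child, depth + 1))
    return lines


# A made-up run: the agent plans, calls a tool that performs a retrieval,
# then produces the final answer.
run = Step("agent", "support_agent", [
    Step("llm", "plan"),
    Step("tool", "search_orders", [Step("retrieval", "orders_index")]),
    Step("llm", "final_answer"),
])

print("\n".join(render(run)))
```

Reading the output top to bottom shows the order of the steps, and the indentation shows which step triggered which.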
AI Agents Console (June 2025)
A dedicated dashboard for monitoring AI agents in production:
- Track actions, security posture, and performance of any AI agent
- Monitor user engagement and business value metrics
- Works with both custom-built and third-party agents
- Visibility into agentic workflows spanning multiple models and tools
LLM Experiments (June 2025)
A structured experimentation framework for testing changes before shipping to production:
- Compare prompt changes, model swaps, and configuration updates
- Measure impact on quality, latency, and cost
- Prove results before rolling out to users
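The comparison described above can be sketched in a few lines: run the same test cases under a baseline and a candidate configuration, then compare quality, latency, and cost. Everything here (`Run`, `summarize`, `compare`, and the numbers) is hypothetical and stands in for whatever the experiment framework records.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Run:
    """Result of one test case under one variant (all values are made up)."""
    correct: bool
    latency_s: float
    cost_usd: float


def summarize(runs: List[Run]) -> Dict[str, float]:
    """Aggregate quality, latency, and cost for one variant."""
    return {
        "accuracy": mean(1.0 if r.correct else 0.0 for r in runs),
        "avg_latency_s": mean(r.latency_s for r in runs),
        "total_cost_usd": sum(r.cost_usd for r in runs),
    }


def compare(baseline: List[Run], candidate: List[Run]) -> Dict[str, float]:
    """Return candidate minus baseline for each metric."""
    base, cand = summarize(baseline), summarize(candidate)
    return {key: cand[key] - base[key] for key in base}


baseline = [Run(True, 1.0, 0.25), Run(False, 1.0, 0.25),
            Run(True, 2.0, 0.5), Run(True, 2.0, 0.5)]
candidate = [Run(True, 0.5, 0.125) for _ in range(4)]

print(compare(baseline, candidate))
```

A positive accuracy delta with negative latency and cost deltas is the case where a change is safe to roll out. Mixed results call for a judgment about which metric matters most.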
Bits AI Copilot
Datadog's built-in AI assistant that queries across all your observability data using natural language:
- Identifies root causes "90% faster" than manual investigation, according to Datadog's own figure
- Integrates into Slack incident response channels with automatic summaries
- Can automate alert investigations, code fixes, and security triage
Supported LLM Providers
| SDK / Integration | Supported Providers |
|---|---|
| Python SDK | OpenAI; Anthropic; AWS Bedrock; LangChain; Google Vertex AI |
| Node.js SDK | OpenAI; Anthropic; Azure OpenAI; AWS Bedrock; Google Vertex AI; LangChain; Vercel AI SDK |
| OpenTelemetry | Vendor-neutral via GenAI Semantic Conventions (any provider) |
Additional integrations include GitHub Copilot usage tracking, Microsoft Copilot monitoring, LiteLLM gateway tracing, and cloud cost management for Anthropic and GitHub spend.
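For the OpenTelemetry row in the table above, vendor neutrality comes from every provider reporting the same attribute names. The sketch below builds such an attribute set as a plain dictionary, without importing the OpenTelemetry SDK. The `gen_ai.*` keys follow the GenAI Semantic Conventions, which are still evolving, so check the names against the current specification. The helper functions and the model name are hypothetical.

```python
from typing import Dict, Union

AttrValue = Union[str, int, float]


def genai_span_attributes(model: str, input_tokens: int, output_tokens: int,
                          operation: str = "chat") -> Dict[str, AttrValue]:
    """Build span attributes keyed by GenAI semantic-convention names."""
    return {
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }


def span_name(attrs: Dict[str, AttrValue]) -> str:
    """Name the span as '<operation> <model>'."""
    return f"{attrs['gen_ai.operation.name']} {attrs['gen_ai.request.model']}"


attrs = genai_span_attributes("example-model", input_tokens=120, output_tokens=45)
print(span_name(attrs), attrs)
```

In a real application these attributes would be set on a span created by an OpenTelemetry tracer and exported to the backend of your choice.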
Pricing
Datadog LLM Observability is billed per LLM span. Each call to an LLM provider counts as one span, so a single user request may generate multiple spans.
⚠️Warning
LLM Observability is an add-on to Datadog's platform — there is no standalone free tier. Pricing is not fully transparent on the public pricing page; enterprise customers typically negotiate custom rates. Third-party estimates suggest approximately $8 per 10,000 requests, but verify current rates at datadoghq.com/pricing.
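Because billing is per span rather than per user request, a rough budget has to start from the number of LLM calls each request makes. The sketch below does that arithmetic. It assumes the third-party figure of about $8 per 10,000 applies to each LLM span, which is an interpretation rather than a published rate. The function names and traffic numbers are hypothetical.

```python
def monthly_llm_spans(user_requests: int, llm_calls_per_request: int) -> int:
    """Each LLM call is one span, so spans = requests x calls per request."""
    return user_requests * llm_calls_per_request


def estimated_monthly_cost_usd(spans: int, usd_per_10k_spans: float = 8.0) -> float:
    """Apply an assumed per-10,000-span rate; not an official price."""
    return spans / 10_000 * usd_per_10k_spans


# Made-up workload: 500,000 user requests per month, 3 LLM calls each.
spans = monthly_llm_spans(500_000, 3)
print(spans, estimated_monthly_cost_usd(spans))
```

Under these assumptions, 500,000 requests with three LLM calls each produce 1,500,000 spans, or roughly $1,200 per month. An agent that makes ten calls per request would multiply that figure accordingly.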
For teams that only need LLM monitoring without full-stack observability, purpose-built tools like Helicone (open-source, generous free tier) or LangSmith ($39 per user per month) offer much lower entry points.
Datadog LLM Observability vs. Competitors
| Platform | Best For | Key Advantage |
|---|---|---|
| Datadog LLM Observability | Enterprise teams already on Datadog | Full-stack correlation: LLM + infrastructure + APM + logs in one platform |
| LangSmith | LangChain/LangGraph users | Zero-config for LangChain; excellent debugging; $39/user/month |
| Helicone | Startups and lightweight LLM logging | Open-source; 1-line proxy integration; generous free tier |
| Arize AI | ML teams needing evaluation and drift detection | Strong evaluation metrics; MLOps heritage |
| New Relic | Enterprise teams already on New Relic | Consumption-based pricing; full-stack monitoring |
Datadog's main advantage: among the platforms compared here, it offers the broadest correlation of LLM performance with the rest of the application stack (APM traces, infrastructure metrics, logs, cloud costs, and security signals) in one platform.
Company Details
| Detail | Info |
|---|---|
| Company | Datadog Inc. (NASDAQ: DDOG) |
| Founded | 2010 |
| CEO | Olivier Pomel (co-founder) |
| Headquarters | New York, New York |
| Employees | ~9,700 |
| Revenue (FY2025) | $3.43 billion (+28% year-over-year) |
| 2026 Revenue Guidance | $4.06-$4.10 billion |
| Free Cash Flow (FY2025) | $915 million |
| Market Cap | ~$44-46 billion |
| Total Customers | ~32,700 |
| Fortune 500 Penetration | 48% |
| Million-Dollar Customers | 603 (+31% year-over-year) |
| Website | datadoghq.com |
Strengths
- Full-stack correlation — LLM monitoring sits alongside infrastructure, APM, logs, security, and cloud costs in one platform
- No-code instrumentation — automatic tracing of LLM calls without code changes for major providers
- Agentic AI monitoring — dedicated AI Agents Console and experiment framework for testing changes safely
- Bits AI copilot — natural language querying across all observability data for faster incident response
- Enterprise scale — 32,700 customers, 48% of Fortune 500, $3.43 billion revenue
Limitations and Considerations
- Cost — Datadog is expensive; LLM Observability is an add-on to an already premium platform with no standalone free tier
- Platform lock-in — most valuable when you are already a Datadog customer using APM, logs, and infrastructure monitoring
- Pricing opacity — per-span billing is not clearly published; costs can escalate quickly with high-volume AI applications
- Overkill for simple use cases — if you only need to track LLM costs and latency, Helicone or LangSmith are simpler and cheaper
- LLM-specific features are newer — purpose-built tools like LangSmith have deeper LLM debugging and evaluation capabilities
Key Takeaways
- Datadog LLM Observability monitors AI application performance by tracking every LLM call — latency, tokens, cost, errors — and correlating with full-stack infrastructure metrics
- The AI Agents Console and LLM Experiments features (launched June 2025) enable monitoring agentic AI workflows and testing changes before production
- Most valuable for enterprise teams already using Datadog who want to add AI monitoring without adopting another vendor
- For LLM-only monitoring without full-stack needs, purpose-built tools like Helicone (open-source, with a generous free tier) or LangSmith ($39 per user per month) are more cost-effective alternatives