5 min read·Updated March 8, 2026

Firecrawl

By Firecrawl

Firecrawl is an AI-optimized web scraping API that converts websites into clean, LLM-ready Markdown. It handles JavaScript rendering, pagination, and many anti-bot measures, so developers can feed web content directly into AI pipelines without building a custom scraper.

Learning Objectives

  • Understand what Firecrawl does and why AI-optimized web scraping is distinct from standard scraping
  • Identify Firecrawl's core features: scrape, crawl, structured extraction, map, and search
  • Evaluate when to use Firecrawl vs. Apify, Browse AI, or building a custom scraper

What Is Firecrawl?

Firecrawl is a web scraping and data extraction API launched in 2024 by the team behind Mendable and built specifically for feeding web content into AI applications. Unlike traditional web scrapers that return raw HTML, Firecrawl processes pages and returns clean Markdown: content that retains its structure (headings, lists, tables) with navigation, ads, scripts, and other noise removed.

The core value proposition is removing the preprocessing step that usually sits between scraping a page and handing it to an LLM: Firecrawl's output can be passed directly to a language model or stored in a vector database, typically without additional cleaning.
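One reason retained structure matters is that headings give natural chunk boundaries for a vector database. The sketch below is a minimal illustration that assumes the Markdown uses `#`-style headings; `split_by_headings` is a hypothetical helper written for this lesson, not part of the Firecrawl SDK.

```python
import re

def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into sections, one per heading (hypothetical helper)."""
    sections = []
    current = {"heading": "", "body": []}

    def has_content(section: dict) -> bool:
        return bool(section["heading"]) or any(l.strip() for l in section["body"])

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            # A new heading closes the section collected so far.
            if has_content(current):
                sections.append(current)
            current = {"heading": match.group(2).strip(), "body": []}
        else:
            current["body"].append(line)
    if has_content(current):
        sections.append(current)

    return [
        {"heading": s["heading"], "text": "\n".join(s["body"]).strip()}
        for s in sections
    ]

sample = "# Install\n\nRun the installer.\n\n## Configure\n\n- Set the key\n- Restart\n"
chunks = split_by_headings(sample)
for chunk in chunks:
    print(chunk["heading"], "->", repr(chunk["text"]))
```

This simple version would also split on `#` lines inside code blocks, so a production splitter would need to track fenced regions.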

Tip

Try Firecrawl: firecrawl.dev. Free tier with 500 credits; Hobby plan $16/month; Standard plan $83/month; open-source version available for self-hosting.

Core Features

Single Page Scrape

The most basic operation: scrape a single URL and return clean Markdown:

from firecrawl import FirecrawlApp

# Create a client; replace the placeholder with your own API key.
app = FirecrawlApp(api_key="your-api-key")
result = app.scrape_url("https://example.com/docs/getting-started")

print(result["markdown"])  # Clean Markdown ready for LLM

Handles:

  • JavaScript-rendered pages (React, Vue, Angular SPAs)
  • Cookie consent dialogs and popups
  • Lazy-loaded content
  • Dynamic content that requires interaction

Full Site Crawl

Crawl an entire website and return all pages as clean Markdown:

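# Reuses the `app` client created in the single-page example.
crawl_result = app.crawl_url(
    "https://docs.example.com",
    params={
        "crawlerOptions": {
            "excludes": ["/blog/*", "/changelog/*"],  # skip paths matching these patterns
            "maxDepth": 3  # limit how deep the crawler goes
        }
    }
)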

Useful for:

  • Building a knowledge base from a documentation site
  • Indexing company websites for RAG
  • Competitive intelligence across multiple pages
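To reason about what exclude patterns and a depth limit will cover before spending credits, it can help to apply similar rules locally to a list of candidate URLs. The sketch below is an approximation under stated assumptions: `should_crawl` is a hypothetical helper, patterns are matched as shell-style globs against the URL path, and depth is counted as the number of path segments. Firecrawl's own matching rules may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

def should_crawl(url: str, root: str, excludes: list[str], max_depth: int) -> bool:
    """Approximate a crawl filter locally (hypothetical helper, not the SDK)."""
    parsed, root_parsed = urlparse(url), urlparse(root)
    # Stay on the same host as the crawl root.
    if parsed.netloc != root_parsed.netloc:
        return False
    path = parsed.path or "/"
    # Assumption: exclude patterns are globs matched against the path.
    if any(fnmatchcase(path, pattern) for pattern in excludes):
        return False
    # Assumption: depth is the number of non-empty path segments.
    depth = len([segment for segment in path.split("/") if segment])
    return depth <= max_depth

ROOT = "https://docs.example.com"
EXCLUDES = ["/blog/*", "/changelog/*"]

for candidate in [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/blog/post-1",
    "https://docs.example.com/a/b/c/d",
    "https://other.example.com/api",
]:
    print(candidate, should_crawl(candidate, ROOT, EXCLUDES, max_depth=3))
```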

Structured Data Extraction (LLM-Powered)

Firecrawl can use an LLM to extract structured data from pages — define a schema and Firecrawl returns typed JSON:

from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    description: str
    in_stock: bool

result = app.scrape_url(
    "https://shop.example.com/product/123",
    params={"extractorOptions": {"extractionSchema": ProductInfo.model_json_schema()}}
)
print(result["extracted"])  # {"name": "...", "price": 49.99, ...}
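Because the extracted fields are produced by an LLM, it is usually worth validating them before they enter a database or downstream pipeline. The sketch below uses only the standard library and mirrors the `ProductInfo` fields above as a dataclass; `parse_product` is a hypothetical helper, and the sample dictionary stands in for whatever the API returns.

```python
from dataclasses import dataclass, fields

@dataclass
class ProductInfo:
    name: str
    price: float
    description: str
    in_stock: bool

def parse_product(raw: dict) -> ProductInfo:
    """Check that every field is present with the expected type (hypothetical helper)."""
    values = {}
    for field in fields(ProductInfo):
        if field.name not in raw:
            raise ValueError(f"missing field: {field.name}")
        value = raw[field.name]
        # JSON has no separate float type, so accept a plain int for float fields.
        if field.type is float and type(value) is int:
            value = float(value)
        if not isinstance(value, field.type):
            raise ValueError(f"{field.name}: expected {field.type.__name__}")
        values[field.name] = value
    return ProductInfo(**values)

# Stand-in for an extraction result; not real API output.
sample = {"name": "Mug", "price": 12, "description": "Ceramic", "in_stock": True}
product = parse_product(sample)
print(product)
```

With Pydantic already in use for the schema, `ProductInfo.model_validate(...)` on the original model would give the same kind of check with less code.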

💡 Key Concept

Why JavaScript rendering matters: Many modern websites are single-page applications (SPAs) built with React, Vue, or Angular. When you fetch these pages with a simple HTTP request (like Python's requests library), you get a nearly empty HTML shell — the actual content is loaded by JavaScript after the page loads. Firecrawl runs a real browser (headless Chromium) to execute the JavaScript and capture the fully rendered content, then converts it to clean Markdown.
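The "nearly empty HTML shell" can be demonstrated without any network access. The sketch below extracts visible text from two hand-written HTML strings: one resembling what a plain HTTP request might return for an SPA, and one resembling the page after JavaScript has run. Both strings are illustrative assumptions, not output captured from a real site.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text outside of script, style, and title elements."""
    SKIP = {"script", "style", "title"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# What a simple HTTP client might receive: an empty mount point plus a script tag.
SPA_SHELL = (
    '<html><head><title>Docs</title><script src="/static/app.js"></script></head>'
    '<body><div id="root"></div></body></html>'
)
# What the DOM might look like after a browser executes the script.
RENDERED = (
    '<html><head><title>Docs</title></head><body><div id="root">'
    "<h1>Getting started</h1><p>Install the CLI.</p></div></body></html>"
)

print(repr(visible_text(SPA_SHELL)))
print(repr(visible_text(RENDERED)))
```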

Map — Discover All URLs

Given a domain, Firecrawl returns a sitemap of all discoverable URLs without downloading content:

urls = app.map_url("https://docs.example.com")
# Returns: ["https://docs.example.com/", "https://docs.example.com/api/", ...]

Useful for planning a crawl before executing it.
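One way to use a URL list for planning is to count pages per top-level section and estimate the crawl size once unwanted sections are excluded. The sketch below assumes the map call returns a plain list of URL strings, as in the comment above; `pages_per_section` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def pages_per_section(urls: list[str]) -> Counter:
    """Count URLs by their first path segment (hypothetical helper)."""
    counts: Counter = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        counts[segments[0] if segments else "(root)"] += 1
    return counts

# Stand-in for a map result; not real API output.
urls = [
    "https://docs.example.com/",
    "https://docs.example.com/api/",
    "https://docs.example.com/api/auth",
    "https://docs.example.com/blog/launch",
    "https://docs.example.com/blog/2025-recap",
    "https://docs.example.com/guides/quickstart",
]

counts = pages_per_section(urls)
excluded = {"blog"}
estimated_pages = sum(n for section, n in counts.items() if section not in excluded)
print(dict(counts), "pages to crawl:", estimated_pages)
```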

Search and Scrape

Combine web search with content extraction:

result = app.search("AI agent frameworks 2026",
                    params={"pageOptions": {"fetchPageContent": True}})

Returns search results with full page content — combining Tavily-style search with Firecrawl's content extraction.

LangChain and Framework Integration

Firecrawl has native LangChain integration:

from langchain_community.document_loaders.firecrawl import FireCrawlLoader

loader = FireCrawlLoader(url="https://docs.example.com", mode="crawl", api_key="...")
docs = loader.load()  # List of LangChain Documents ready for vector store

Pricing

Free: $0
  • 500 credits
  • 1
  • Prototyping
  • Evaluation
Hobby: $16/month
  • 3,000 credits
  • 5
  • Side projects
  • Small applications
Standard: $83/month
  • 100,000 credits
  • 20
  • Production applications
Growth: $333/month
  • 500,000 credits
  • 50
  • Large-scale crawling
Self-hosted: Free (open-source)
  • Unlimited
  • Depends on server
  • Privacy
  • Custom infrastructure

Each page scraped consumes one credit, so a typical documentation site crawl might use 200–2,000 credits.
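The one-credit-per-page rule makes plan selection a matter of simple arithmetic. The sketch below takes the credit allowances from the plans listed above and assumes, for illustration only, that each allowance is a monthly budget and that re-crawling a page costs a credit each time; `smallest_plan` is a hypothetical helper.

```python
# Credit allowances copied from the plan list; treated here as monthly budgets (assumption).
PLANS = {"Free": 500, "Hobby": 3_000, "Standard": 100_000, "Growth": 500_000}

def smallest_plan(pages_per_crawl: int, crawls_per_month: int):
    """Return (plan name or None, credits needed) at one credit per page."""
    needed = pages_per_crawl * crawls_per_month
    for name, credits in sorted(PLANS.items(), key=lambda item: item[1]):
        if credits >= needed:
            return name, needed
    return None, needed  # exceeds every hosted plan listed

print(smallest_plan(200, 2))     # small docs site, crawled twice
print(smallest_plan(2_000, 4))   # large docs site, crawled weekly
```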

Strengths

  • LLM-ready output: Markdown output with structure preserved — no postprocessing needed before feeding to models
  • JS rendering: Handles SPAs and dynamic content that simple HTTP scrapers cannot
  • Structured extraction: LLM-powered JSON extraction from pages on a defined schema
  • Framework integration: Native LangChain loader reduces integration to a few lines
  • Open source: Self-hostable for privacy-sensitive use cases
  • Simple API: Clean Python/TypeScript SDK with clear pricing

Limitations & Considerations

  • Anti-bot blocking: Anti-bot measures on some sites may still block Firecrawl; no scraper bypasses all protections
  • Credit consumption: Large sites can consume credits quickly — plan crawl scope carefully
  • Dynamic auth: Pages requiring active user sessions (login-gated content) require additional auth handling
  • Not real-time streaming: Crawls complete asynchronously — polling or webhooks needed for large jobs
  • Newer product: Less mature than Apify for complex enterprise scraping workflows
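For the asynchronous crawls mentioned above, a polling loop with exponential backoff is a common pattern. The sketch below is generic: `get_status` is any callable that returns a status dictionary, and the `"completed"` and `"failed"` status strings are assumptions rather than confirmed Firecrawl values. In real use, `get_status` would wrap whatever status check the SDK or API provides.

```python
import time
from typing import Callable

def wait_for_job(
    get_status: Callable[[], dict],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state, doubling the wait each time."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status()
        # Assumed terminal states; adjust to the values the API actually returns.
        if status.get("status") in {"completed", "failed"}:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still running after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)

# Demonstration with a fake status source and a recording sleep function.
responses = iter([
    {"status": "scraping"},
    {"status": "scraping"},
    {"status": "completed", "total": 3},
])
delays: list[float] = []
result = wait_for_job(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the loop testable without real waiting; webhooks avoid polling entirely when the receiving service can accept inbound requests.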

Best Use Cases

Task | Why Firecrawl
Documentation site indexing for RAG | Crawl entire docs site; convert to Markdown; insert into vector store
Competitor website monitoring | Scrape product pages, pricing, and announcements regularly
News and article ingestion for AI | Convert article URLs to clean content for summarization pipelines
Building knowledge bases from web content | Crawl company websites or wikis for internal AI assistants
Research data collection | Extract structured data from product listings, profiles, or databases
LangChain agent tools | FireCrawlLoader for web content in document Q&A agents

When to choose alternatives:

  • Complex multi-step browser automation → Apify (more mature workflow tooling)
  • No-code monitoring → Browse AI
  • Web search API (not scraping) → Tavily or SerpAPI
  • Enterprise scraping with proxies and anti-bot → Apify or Bright Data
  • Structured data from specific sites → Diffbot (trained models per site type)

Getting Started

  1. Get an API key at firecrawl.dev — free 500 credits, no credit card
  2. Install: pip install firecrawl-py
  3. Scrape your first page:
       from firecrawl import FirecrawlApp
       app = FirecrawlApp(api_key="your-key")
       result = app.scrape_url("https://en.wikipedia.org/wiki/Large_language_model")
       print(result["markdown"][:2000])
  4. Try the LangChain loader for a RAG pipeline:
       from langchain_community.document_loaders.firecrawl import FireCrawlLoader

Tip

RAG pipeline shortcut: For building a RAG knowledge base from any documentation site, Firecrawl's LangChain integration is the fastest path: (1) crawl the site with FireCrawlLoader, (2) split with RecursiveCharacterTextSplitter, (3) embed with OpenAI or any embedding model, (4) store in Chroma or Pinecone. This four-step pipeline can be operational in under an hour with the free tier.
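The shape of steps 2 through 4 can be seen in a dependency-free toy. In the sketch below every component is a labeled stand-in: `split_text` stands in for RecursiveCharacterTextSplitter, the word-count `embed` function for a real embedding model, `InMemoryStore` for Chroma or Pinecone, and the hard-coded `pages` list for documents a crawl would return. It illustrates the data flow only and is not a substitute for the real libraries.

```python
import math
import re
from collections import Counter

def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size character chunks with overlap (stand-in for a real splitter)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class InMemoryStore:
    """Minimal vector store: keeps (chunk, vector) pairs in a list."""
    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        self.rows.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query: str, k: int = 1) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

# Stand-in for crawled Markdown pages.
pages = [
    "# Authentication\nSend the API key in the Authorization header.",
    "# Rate limits\nEach plan allows a fixed number of requests per minute.",
]

store = InMemoryStore()
for page in pages:
    store.add(split_text(page))

top = store.search("where do I put the API key?")
print(top[0])
```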

Key Takeaways

  • Firecrawl converts any website into clean LLM-ready Markdown — handling JavaScript rendering, dynamic content, and pagination automatically
  • Structured data extraction uses an LLM to pull typed JSON fields from pages on a developer-defined schema
  • Native LangChain integration reduces web content ingestion for RAG pipelines to a few lines of code
  • The free tier (500 credits) is sufficient for prototyping; paid plans start at $16/month (Hobby), and the Standard plan aimed at production applications is $83/month
  • Best choice for developers building AI applications that need clean web content as input — documentation RAG, news ingestion, competitive intelligence, and content monitoring
