Case study

Deep Research Agent

Python + LangChain + LangGraph

Key outcomes

Deterministic, reproducible research runs
URL + document ingestion (HTML, PDF, DOCX, TXT, MD, CSV)
Traceable artifacts for every run

The Problem

Most AI research tools are demos. They look impressive in a video—type a question, watch the agent browse the web, get a summary. In production they break in predictable ways: they hallucinate citations, loop endlessly when a URL times out, produce different output for the same input, and leave no trace of what they actually fetched.

A production research agent needs to be deterministic, observable, and recoverable. This one is.

What Was Built

The Deep Research Agent is a LangGraph-based multi-step agent that takes a research question (and optional seed URLs or documents) and produces a structured, cited report.

The agent runs in four phases:

  1. Planning — generates a research plan (plan.md) with scoped sub-questions and target source types
  2. Fetching — retrieves URLs and documents, normalizes them to text
  3. Note-taking — extracts relevant information per sub-question into notes.md with source attribution
  4. Synthesis — writes the final report.md with inline citations referencing sources.json

Every phase writes to the run's artifact directory (runs/<thread_id>/). If a run fails mid-way, you can inspect exactly where it stopped.
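
The phase-then-artifact pattern can be sketched in plain Python, without the LangGraph dependency. The phase names and artifact filenames mirror the case study; the phase bodies here are illustrative stubs, not the real implementations.

```python
from pathlib import Path

# Illustrative mapping of each phase to the artifact it writes.
PHASES = ["plan", "fetch", "take_notes", "synthesize"]
ARTIFACTS = {
    "plan": "plan.md",
    "fetch": "sources.json",
    "take_notes": "notes.md",
    "synthesize": "report.md",
}

def run_pipeline(question: str, run_dir: str) -> dict:
    """Run all four phases in order, persisting an artifact per phase."""
    out_dir = Path(run_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    state = {"question": question, "completed": []}
    for phase in PHASES:
        # Each phase writes its artifact before the next phase starts,
        # so a failed run can be inspected at the last completed phase.
        (out_dir / ARTIFACTS[phase]).write_text(
            f"# {phase} output for: {question}\n"
        )
        state["completed"].append(phase)
    return state
```

Because each artifact lands on disk before the next phase begins, a crash between phases leaves a directory that tells you exactly how far the run got.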

The Fetch Pipeline

Fetching is where most research agent implementations break. This pipeline handles:

  • HTML — clean text extraction, JS-rendered pages via headless fetch where needed
  • PDF — text extraction with page boundary awareness
  • DOCX, TXT, MD, CSV — each with appropriate parsers

Every fetch has a configurable timeout and size cap. A single large document cannot stall the agent—it is truncated to the token budget and flagged in the source manifest.

Failed fetches are recorded in sources.json with a failure reason, not silently dropped.
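
A minimal sketch of that contract, with the transport injected as a callable so the truncation and failure-recording paths are easy to see. The field names in the manifest entry are illustrative, not the actual `sources.json` schema.

```python
def safe_fetch(url, fetcher, timeout=10, max_bytes=1_000_000):
    """Fetch one source without letting it stall or crash the run.

    `fetcher` is any callable (url, timeout) -> bytes; in the real agent
    this would be the HTTP/document layer. Oversized bodies are truncated
    and flagged; exceptions become a recorded failure, never a crash.
    """
    entry = {"url": url, "status": "ok", "truncated": False, "error": None}
    try:
        body = fetcher(url, timeout)
        if len(body) > max_bytes:
            body = body[:max_bytes]
            entry["truncated"] = True   # flagged in the source manifest
        return body, entry
    except Exception as exc:
        entry["status"] = "failed"
        entry["error"] = str(exc)       # recorded, not silently dropped
        return b"", entry
```

Every call returns a manifest entry regardless of outcome, which is what lets `sources.json` account for every source the agent attempted.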

Guardrails

The agent operates within strict limits defined at invocation:

  • max_sources — caps total sources fetched per run
  • max_links_per_source — prevents recursive link-following explosions
  • max_tokens_per_note — keeps context within model limits
  • HTTP and model timeouts — no hanging requests
  • Retry limits with exponential backoff

These guardrails are not afterthoughts. They are the primary mechanism that makes the agent safe to run in production without human supervision.
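
The limits above fit naturally into a single immutable config passed in at invocation. This is a sketch with illustrative defaults; the actual field values and the backoff base are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Guardrails:
    """Per-run limits, fixed at invocation. Defaults are illustrative."""
    max_sources: int = 20
    max_links_per_source: int = 5
    max_tokens_per_note: int = 2000
    http_timeout_s: float = 15.0
    model_timeout_s: float = 60.0
    max_retries: int = 3

    def backoff_delays(self, base: float = 1.0) -> list[float]:
        """Exponential backoff schedule capped by the retry limit."""
        return [base * (2 ** i) for i in range(self.max_retries)]
```

Freezing the dataclass means no phase can quietly relax a limit mid-run; the guardrails a run started with are the guardrails it finishes with.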

Artifact Completion

If a run terminates abnormally (OOM, timeout, upstream error) before report.md is written, an artifact completion step runs. This is a lightweight, tool-free model call that reads the available notes.md and writes a best-effort summary report. The run is never in a state where artifacts are partially written with no report.
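
The completion step reduces to a small idempotent check. Here `summarize` stands in for the tool-free model call; its signature and the fallback text are assumptions for illustration.

```python
from pathlib import Path

def complete_artifacts(run_dir, summarize) -> bool:
    """Ensure report.md exists; returns True if a fallback was written.

    `summarize` is a stand-in for the tool-free model call that turns
    whatever notes survived into a best-effort report.
    """
    run_dir = Path(run_dir)
    report = run_dir / "report.md"
    if report.exists():
        return False  # normal termination; nothing to do
    notes_path = run_dir / "notes.md"
    notes = notes_path.read_text() if notes_path.exists() else ""
    if notes:
        report.write_text(summarize(notes))
    else:
        report.write_text("# Report\n\nRun terminated before notes were produced.\n")
    return True
```

Because the check is a no-op when `report.md` already exists, it is safe to run unconditionally at the end of every run, normal or not.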

Key Engineering Decisions

LangGraph over a custom loop. LangGraph's explicit state machine made it straightforward to define phase transitions, handle conditional edges (retry vs fail vs complete), and inspect intermediate state during debugging.

Thread ID as the artifact namespace. Every run gets a UUID thread ID. All artifacts live under runs/<thread_id>/. This makes runs independently inspectable, comparable, and replayable from any step.
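
Creating that namespace is a few lines; the `runs` root and hex-UUID naming here are assumptions matching the layout described above.

```python
import uuid
from pathlib import Path

def new_run_dir(root: str = "runs") -> tuple[str, Path]:
    """Allocate a fresh artifact namespace keyed by a UUID thread ID."""
    thread_id = uuid.uuid4().hex
    run_dir = Path(root) / thread_id
    # exist_ok=False guarantees two runs can never share a namespace.
    run_dir.mkdir(parents=True, exist_ok=False)
    return thread_id, run_dir
```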

Determinism as a design goal. Given the same inputs and the same model version, the agent produces the same plan. Source selection is ordered, not random. This makes output variance debuggable rather than mysterious.
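
Ordered source selection can be as simple as a sort with a total ordering, so ties break identically on every run. The `relevance` and `url` field names are illustrative, not the agent's actual schema.

```python
def order_sources(candidates: list[dict]) -> list[dict]:
    """Deterministic source selection: highest relevance first,
    ties broken alphabetically by URL so repeat runs agree."""
    return sorted(candidates, key=lambda s: (-s["relevance"], s["url"]))
```

The key insight is the tie-breaker: sorting on relevance alone leaves equal-scoring sources in arbitrary order, which is exactly the kind of hidden nondeterminism that makes output variance mysterious.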

Scope

  • Tool-using workflows via LangChain / LangGraph with a Deep Agents-inspired design.
  • Fetch pipeline that handles HTML and the top 5 document formats with safe timeouts and size caps.
  • Runs produce plan.md, notes.md, sources.json, report.md and normalized source files under runs/<thread_id>/.
  • Guardrails: caps on max_sources, max_links_per_source, HTTP/model timeouts, token limits and retries.
  • Automatic artifact completion if report.md is missing after a run, using a tool-free model call.

Like what you see?

Waqas Raza

AI-Native Full-Stack Engineer. Top Rated on Upwork · $180K+ earned · 93% job success. I build production AI agents, LLM systems, Web3 platforms, and full-stack applications.

Hire me on Upwork