The Problem
Most RAG implementations are prototypes. They embed a few PDFs, wire up a similarity search, and call the model with the retrieved chunks. This works in a demo. In production, it breaks in three common ways:
- Ingestion is not idempotent. Re-running the pipeline creates duplicate chunks.
- The model answers confidently when it should refuse. Hallucinated answers with no citations look indistinguishable from grounded ones.
- There is no way to debug retrieval. When an answer is wrong, you cannot tell if the fault is in the embedding, the chunking, or the prompt.
OpenClaw DocOps Agent is built specifically to address all three.
What Was Built
The agent has three primary systems: an ingestion pipeline, an answering layer, and a lifecycle ops interface.
Ingestion Pipeline
PDFs are processed through a deterministic pipeline:
- Text extraction with page and section boundary detection
- Chunking with configurable overlap and boundary rules
- Embedding via OpenAI text-embedding-3-small
- Storage in Qdrant Cloud with deterministic chunk IDs
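The overlap step can be sketched as a simple sliding window. This is a minimal illustration, not the agent's actual implementation; the function name and default sizes are assumptions, and the real pipeline also respects page and section boundaries:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where each chunk shares
    `overlap` characters with its neighbour, so sentences that
    straddle a boundary appear intact in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap is what keeps boundary rules cheap: a sentence cut in half at one chunk edge is whole in the next.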
The chunk ID is derived from the document ID and chunk position—not randomly generated. This makes re-ingesting the same document idempotent: existing chunks are overwritten in place, never duplicated. Partial ingestion failures can be resumed without creating duplicates.
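One common way to derive such an ID (a sketch under assumptions, since Qdrant point IDs must be UUIDs or unsigned integers) is a name-based UUID over the document ID and position:

```python
import uuid

def chunk_id(doc_id: str, position: int) -> str:
    """Derive a stable point ID from the document ID and chunk position.
    Re-ingesting the same document produces the same IDs, so an upsert
    overwrites existing points instead of duplicating them."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}:{position}"))
```

Because the ID is a pure function of its inputs, a crashed ingestion run can simply be restarted from the top: every chunk it re-emits lands on the same point.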
Grounded Answering
The answering layer retrieves the top-K most relevant chunks for a query, assembles them into a context window, and prompts the model to answer only from the provided context.
The model is instructed to:
- Return inline citations referencing the source document and chunk
- Refuse to answer if the context is insufficient—with a clear signal that the refusal is intentional, not a failure
This refusal behaviour is the most important feature. It means the system can be trusted for high-stakes queries (compliance, legal, technical documentation) where a confident wrong answer is worse than an honest "I don't have enough information."
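The shape of such a grounding prompt can be sketched as follows. The marker string, function name, and chunk fields here are illustrative assumptions, not the agent's actual prompt; the point is that refusal is an explicit, machine-detectable signal:

```python
# Sentinel the caller can check for, so a refusal is never mistaken
# for a failed or empty completion. (Hypothetical marker name.)
REFUSAL_MARKER = "INSUFFICIENT_CONTEXT"

def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble retrieved chunks into a tagged context block and
    instruct the model to answer only from that context."""
    context = "\n\n".join(
        f"[{c['doc_id']}#{c['position']}]\n{c['text']}" for c in chunks
    )
    return (
        "Answer ONLY from the context below. Cite sources inline as "
        "[doc#chunk]. If the context does not contain the answer, reply "
        f"exactly '{REFUSAL_MARKER}' so the caller knows the refusal is "
        "intentional.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Downstream code then branches on the marker rather than guessing whether an answer is grounded.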
Audit Harness
The audit runner takes a set of test queries and expected answers, runs them through the agent, and produces a JSON + Markdown report with pass/fail status and retrieved context for each query.
This makes it possible to evaluate the system before deploying changes—new chunking strategy, new embedding model, new prompt. Run the harness. Compare the report. Ship with confidence.
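The core loop of such a harness is small. This is a minimal sketch, assuming each test case pairs a query with an expected substring and that `answer_fn` returns the answer plus the retrieved context (all names here are hypothetical):

```python
def run_audit(cases: list[dict], answer_fn) -> dict:
    """Run test queries through the agent and build a pass/fail report.
    Each case holds a query and a substring the answer must contain;
    answer_fn returns (answer, retrieved_chunk_ids)."""
    results = []
    for case in cases:
        answer, context = answer_fn(case["query"])
        results.append({
            "query": case["query"],
            "passed": case["expect"] in answer,
            "answer": answer,
            "retrieved": context,  # kept so failures can be traced to retrieval
        })
    passed = sum(r["passed"] for r in results)
    return {"passed": passed, "failed": len(results) - passed, "results": results}
```

Serializing the returned dict gives the JSON report; the Markdown view is a rendering of the same structure.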
Doc Lifecycle Ops
Production document systems need more than ingestion and querying:
- list/get — inspect what documents are registered and their chunk counts
- export/import — move document registries between environments
- rebuild — re-embed a document from existing chunks (model upgrade path)
- delete — remove both the chunks from Qdrant and the registry record atomically
Each of these has a CLI and an API endpoint. The delete operation is atomic: it cannot leave chunks in Qdrant with no registry record, or a registry record with no chunks.
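True cross-store atomicity is hard when chunks and registry live in different systems; a common way to get the same guarantee is strict ordering with retryable steps. The sketch below (in-memory stand-ins for Qdrant and the registry, hypothetical names) deletes chunks first and drops the registry record last, so a partial failure leaves a record that can be retried rather than orphaned chunks:

```python
def delete_document(doc_id: str, vector_store: dict, registry: dict) -> None:
    """Remove a document's chunks and its registry record together.
    Chunks are deleted first; the registry record is removed only after
    the chunk delete succeeds, so a crash in between leaves a registry
    entry that a retry will clean up — never unreachable chunks."""
    if doc_id not in registry:
        raise KeyError(f"unknown document: {doc_id}")
    vector_store.pop(doc_id, None)   # idempotent: safe to retry
    del registry[doc_id]             # last step: drop the record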
Key Engineering Decisions
Deterministic chunk IDs. This is the single decision that makes the ingestion pipeline reliable for production use. Without it, every re-run adds data.
Refusal over hallucination. The grounding prompt is strict by design. The model is penalised in evaluation for producing answers not supported by the retrieved context, not rewarded for filling gaps.
Separation of retrieval and answering. The retrieval debug CLI lets you inspect what chunks would be retrieved for any query—independently of calling the model. This makes it possible to diagnose retrieval failures without burning tokens.
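What a retrieval-only debug path computes can be shown in a few lines. This is an illustrative cosine-similarity ranking over raw vectors, not the agent's CLI (which would query Qdrant directly); all names are assumptions:

```python
import math

def debug_retrieval(query_vec: list[float], points: list[dict], top_k: int = 5):
    """Rank stored chunks against a query vector by cosine similarity
    and return the top-K (score, chunk_id) pairs — no model call,
    no tokens spent."""
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    scored = [(cosine(query_vec, p["vector"]), p["id"]) for p in points]
    scored.sort(reverse=True)
    return scored[:top_k]
```

If a bad answer traces back to the wrong chunks appearing here, the fault is in embedding or chunking; if the right chunks appear and the answer is still wrong, the fault is in the prompt.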
Scope
- ✓ Ingestion pipeline that extracts, chunks, embeds, and stores PDFs in Qdrant Cloud with deterministic chunk IDs and safe retry behaviour.
- ✓ Grounded answering layer that returns citations and refuses when context is insufficient instead of hallucinating.
- ✓ Audit runner that produces JSON + Markdown reports for repeatable evaluation and iteration.
- ✓ Doc lifecycle ops: registry list/get, export/import, rebuild from chunks, and delete that cleans up both chunks and registry state.
- ✓ Ops utilities: retrieval debug, diagnostics, redacted config snapshot, and cache cleanup CLIs/APIs.
Waqas Raza
AI-Native Full-Stack Engineer. Top Rated on Upwork · $180K+ earned · 93% job success. I build production AI agents, LLM systems, Web3 platforms, and full-stack applications.