Press · June 29, 2026 · 11 min read

AI Document Quality Scorecard: Rate Your Corpus RAG-Readiness in 30 Minutes — and Know Where to Focus

72% of enterprise RAG deployments fail in the first year. Is your corpus the reason? A 5-dimension scoring framework gives you a diagnosis in 30 minutes.

Enterprise AI teams in 2026 face a counterintuitive reality: 72% of enterprise RAG deployments fail in the first year, and in 67% of those failures, the root cause is document data quality — not model selection, not infrastructure, not prompt engineering (Forrester, 2026, via ragaboutit.com). A June 2026 survey of 740 senior executives by Sinequa found that 38.4% identified “data that doesn’t get updated” as the primary cause of RAG pipeline failure — ahead of model limitations or retrieval architecture issues (Sinequa, “Beyond the Hype”, June 2026).

Most organizations respond to this statistic by improving their retrieval pipeline. Few respond by asking the prior question: is my corpus itself the problem?

These are different questions with different answers. The first targets the pipeline; the second targets the foundations. What’s missing for most enterprise teams is a structured way to answer the second question before committing to deployment — a rapid scoring framework that assesses document corpus readiness in thirty minutes, not six weeks.

The AI Document Quality Scorecard presented here is that tool. It does not replace a full corpus audit. It tells you whether you need one, and where to direct remediation efforts if you do.


The Hidden Corpus Problem: Why Most AI Teams Deploy Blind

Most enterprise AI projects open with model selection, retrieval framework design, and infrastructure planning. The document corpus — the actual body of knowledge that will feed the assistant or agent — is typically treated as a given. It exists, it will be connected, the AI will use it.

This assumption is almost never validated. According to an Informatica survey of 600 senior data leaders worldwide in early 2026, 61% of organizations rated their data quality insufficient to move AI pilots into production (Informatica CDO Insights 2026). Among those who had already deployed AI agents, 76% acknowledged that their data governance had not kept pace with adoption.

The issue is not that documents don’t exist. Every large organization has an extensive document estate: SharePoint sites, Confluence spaces, legacy DMS repositories, network shares, regulatory archives. The problem is that nobody has measured the state of that estate before pointing an AI at it.

The scorecard changes this. It translates a complex quality question (“is my corpus AI-ready?”) into a structured self-assessment that any Head of Knowledge Management or CDO can complete in about thirty minutes.


The AI Document Quality Scorecard: Five Dimensions, Three Levels, Thirty Minutes

The framework scores five foundational dimensions of document corpus quality for RAG and agentic AI. Each dimension is rated from 0 to 3, so the total score ranges from 0 to 15.

| Dimension | Score 0 (Unmeasured) | Score 1 (Partial) | Score 2 (Structured) | Score 3 (Managed) |
|---|---|---|---|---|
| 1. Freshness | No validity dates on documents | > 30% of documents have no review date | Review policy defined, < 30% undated | Active review cycles, automated alerts on expired documents |
| 2. Coherence | Inter-document contradictions not inventoried | Known contradictions, unresolved | Arbitration process in place | Detection + documented resolution loop |
| 3. Completeness | Missing topics not identified | Partial list of known gaps | Topic gap map produced | Gaps detected automatically, creation process triggered |
| 4. Traceability | Source unknown on > 50% of documents | Owner known on < 50% of documents | Owner defined for all critical documents | Full lineage + modification history tracked |
| 5. Normalization | Heterogeneous formats, no metadata | Partial standards, incomplete metadata | Metadata schema defined and applied | Complete metadata, structured extraction validated for retrieval |

Interpreting the total score:

  • 12 – 15: AI-Ready ✅ — Corpus can support RAG or agentic deployment. Continuous monitoring recommended.
  • 7 – 11: AI-Conditional ⚠️ — Deployment possible with identified risks. Focus remediation on the lowest 1-2 dimensions before full-scale production.
  • 0 – 6: AI-Unfit ❌ — A full corpus audit is required before any deployment. Without intervention, you are at high risk of joining the 72% of deployments that fail in their first year.
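The arithmetic behind the bands is simple. Encoding it still helps keep repeated assessments consistent across teams. The following is a minimal Python sketch that assumes the five ratings are collected in a dict keyed by lowercase dimension names. `evaluate_scorecard`, `ScorecardResult` and `focus_count` are illustrative names, not part of any K-AI tool. The band thresholds mirror the interpretation list above.

```python
from dataclasses import dataclass

# Dimension order follows the scorecard table; each is rated 0-3.
DIMENSIONS = ("freshness", "coherence", "completeness", "traceability", "normalization")


@dataclass
class ScorecardResult:
    total: int
    band: str
    focus: list[str]  # lowest-rated dimensions, weakest first


def evaluate_scorecard(scores: dict[str, int], focus_count: int = 2) -> ScorecardResult:
    """Sum the five ratings, map the total to a band, and list the weakest dimensions."""
    for name in DIMENSIONS:
        if name not in scores:
            raise ValueError(f"missing rating for {name}")
        if not 0 <= scores[name] <= 3:
            raise ValueError(f"{name} must be rated 0-3, got {scores[name]}")

    total = sum(scores[name] for name in DIMENSIONS)
    # Thresholds mirror the interpretation list: 12-15, 7-11, 0-6.
    if total >= 12:
        band = "AI-Ready"
    elif total >= 7:
        band = "AI-Conditional"
    else:
        band = "AI-Unfit"

    # sorted() is stable, so ties keep the scorecard's dimension order.
    focus = sorted(DIMENSIONS, key=lambda name: scores[name])[:focus_count]
    return ScorecardResult(total, band, focus)


result = evaluate_scorecard(
    {"freshness": 1, "coherence": 1, "completeness": 2, "traceability": 3, "normalization": 2}
)
print(result.total, result.band, result.focus)
# 9 AI-Conditional ['freshness', 'coherence']
```

For an AI-Conditional corpus, `focus` names the one or two dimensions where remediation should start.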

Freshness and Coherence: The Two Dimensions That Cost the Most in Production

Freshness is the dimension most directly correlated with “extractive hallucinations”: the failure mode where the model does not fabricate but faithfully reproduces incorrect content from the retrieved corpus. A 2021 procedure still indexed alongside its 2024 revision gives your RAG pipeline an impossible choice. It retrieves one or the other depending on which happens to sit closer to the query in vector space, with no way of knowing which one applies.
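One mitigation at retrieval time is to drop superseded versions before generation. That is only possible if each chunk carries version metadata. The following is a minimal sketch that assumes two hypothetical metadata fields: `doc_family`, an ID shared by all versions of one document, and `version_date`. The class and function names are illustrative and not taken from any retrieval library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetrievedChunk:
    text: str
    doc_family: str     # hypothetical ID shared by every version of one document
    version_date: date  # hypothetical metadata: date this version took effect
    similarity: float   # score returned by the vector retriever


def keep_latest_versions(chunks: list[RetrievedChunk]) -> list[RetrievedChunk]:
    """Drop any chunk whose document family also has a newer version in the result set."""
    newest: dict[str, date] = {}
    for chunk in chunks:
        seen = newest.get(chunk.doc_family)
        if seen is None or chunk.version_date > seen:
            newest[chunk.doc_family] = chunk.version_date
    kept = [c for c in chunks if c.version_date == newest[c.doc_family]]
    # Re-rank the survivors by similarity, highest first.
    return sorted(kept, key=lambda c: c.similarity, reverse=True)


results = [
    RetrievedChunk("Escalation procedure (2021 version)", "proc-escalation", date(2021, 3, 1), 0.91),
    RetrievedChunk("Escalation procedure (2024 revision)", "proc-escalation", date(2024, 6, 1), 0.88),
    RetrievedChunk("Contact directory", "contacts", date(2025, 1, 15), 0.74),
]
for chunk in keep_latest_versions(results):
    print(chunk.text)
```

The filter only works within a single result set, and only if version metadata exists. A 2021 chunk retrieved on its own would still pass through. That dependency on metadata is one concrete reason Freshness is scored on explicit validity dates.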

The scoring question is direct: what percentage of your critical documents have an explicit validity date and an active review cycle? If the answer is “less than half,” you are at Score 0 or 1 on this dimension. If you don’t know the answer, that is Score 0.
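If your document systems can export an inventory, this question reduces to a ratio. The following is a minimal sketch that assumes each record carries a criticality flag, an optional validity date and a review-cycle flag. `DocRecord`, `freshness_report` and all field names are hypothetical, and the sample inventory is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocRecord:
    doc_id: str
    critical: bool
    valid_until: Optional[date]  # explicit validity or next-review date, if any
    in_review_cycle: bool        # hypothetical flag: assigned to an active review cycle


def freshness_report(docs: list[DocRecord], today: date) -> dict:
    """Share of critical documents that are dated and under review, plus expired ones."""
    critical = [d for d in docs if d.critical]
    if not critical:
        return {"critical_docs": 0, "covered_pct": None, "expired": []}
    # "Covered" means dated AND in a review cycle, even if the date has already passed.
    covered = [d for d in critical if d.valid_until is not None and d.in_review_cycle]
    expired = [d.doc_id for d in critical if d.valid_until is not None and d.valid_until < today]
    return {
        "critical_docs": len(critical),
        "covered_pct": 100 * len(covered) / len(critical),
        "expired": expired,
    }


inventory = [
    DocRecord("A", True, date(2026, 12, 31), True),
    DocRecord("B", True, date(2025, 1, 31), True),
    DocRecord("C", True, None, False),
    DocRecord("D", True, None, True),
    DocRecord("E", False, None, False),
]
print(freshness_report(inventory, today=date(2026, 6, 29)))
# {'critical_docs': 4, 'covered_pct': 50.0, 'expired': ['B']}
```

A `covered_pct` below 50 corresponds to the “less than half” case, that is, Score 0 or 1. The `expired` list is a natural starting point for the review backlog.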

Coherence is subtler to detect — and more destructive at scale. Two documents can each be recent, well-written, and formally correct. If one sets a tolerance threshold at 5% and another at 8% for the same metric, your agent will respond differently depending on which document is retrieved. The pipeline’s faithfulness score will be excellent in both cases. The answer will be wrong in one of them — and you will not know which.
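Detecting contradictions in free text is the hard part, and it is not shown here. Suppose an upstream extraction step has already turned documents into (document, metric, value) claims, with normalized metric names and units (for example, 5% stored as 0.05). Flagging the 5%-versus-8% case then reduces to a group-by. The following sketch rests on that assumption. `Claim`, `find_numeric_conflicts` and the file names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Claim(NamedTuple):
    doc_id: str
    metric: str    # normalized metric name, e.g. "tolerance_threshold"
    value: float   # value the document asserts, in normalized units


def find_numeric_conflicts(claims: list[Claim]) -> dict[str, dict[float, list[str]]]:
    """Return metrics to which different documents assign different values."""
    by_metric: dict[str, dict[float, list[str]]] = defaultdict(lambda: defaultdict(list))
    for claim in claims:
        by_metric[claim.metric][claim.value].append(claim.doc_id)
    return {
        metric: dict(values)
        for metric, values in by_metric.items()
        if len(values) > 1  # more than one distinct value means a contradiction
    }


claims = [
    Claim("quality-manual.pdf", "tolerance_threshold", 0.05),
    Claim("site-procedure.docx", "tolerance_threshold", 0.08),
    Claim("onboarding.md", "review_period_days", 365.0),
]
print(find_numeric_conflicts(claims))
# {'tolerance_threshold': {0.05: ['quality-manual.pdf'], 0.08: ['site-procedure.docx']}}
```

The output is an inventory of contradictions (Score 1), not a resolution. Deciding which value is correct still requires the arbitration process described in Score 2.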

We explored this specific failure mode — inter-source contradiction as invisible hallucination — in our article RAG doesn’t solve hallucination — it displaces it.


Completeness, Traceability, and Normalization: The Trio That Determines Retrieval Precision

Completeness is often the most surprising dimension for teams assessing their corpus for the first time. The issue is not that documents are bad — it’s that certain topics important to users are simply absent. An AI agent that cannot find an answer in its corpus has two possible behaviors: refusing to respond (honest but frustrating) or constructing an answer through inference (hallucinating). Neither outcome serves production users.

The measurement is pragmatic: compare actual user queries from your pilot against the topics covered in your corpus. If more than 20% of pilot questions have no corresponding source document, you are at Score 1 on completeness.
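The comparison can be run as soon as pilot queries have been labeled with a topic, manually or with a classifier. The following minimal sketch assumes those labels and a set of topics the corpus covers. The topic names and the `completeness_gap` function are hypothetical.

```python
from collections import Counter


def completeness_gap(query_topics: list[str], corpus_topics: set[str]) -> tuple[float, list[str]]:
    """Share of pilot queries whose topic has no source document, plus uncovered topics."""
    if not query_topics:
        raise ValueError("need at least one labeled pilot query")
    unanswered = [topic for topic in query_topics if topic not in corpus_topics]
    gap_rate = len(unanswered) / len(query_topics)
    # Most-requested uncovered topics first: a natural order for content creation.
    missing = [topic for topic, _ in Counter(unanswered).most_common()]
    return gap_rate, missing


pilot_queries = [
    "expenses", "expenses", "travel", "remote-work", "remote-work",
    "security", "travel", "parental-leave", "expenses", "onboarding",
]
covered = {"expenses", "travel", "security", "onboarding"}
rate, missing = completeness_gap(pilot_queries, covered)
print(f"{rate:.0%} of pilot queries unanswered; gaps: {missing}")
# 30% of pilot queries unanswered; gaps: ['remote-work', 'parental-leave']
if rate > 0.20:
    print("Above the 20% threshold: Score 1 on Completeness")
```

The ranked `missing` list doubles as a first draft of the topic gap map that Score 2 calls for.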

Traceability goes beyond document quality in the strict sense: it is becoming a regulatory requirement. EU AI Act Article 12 requires high-risk AI systems to technically allow the automatic recording of events (logs), so that their functioning can be traced. The obligations for high-risk systems are scheduled to apply from August 2, 2026. For a RAG assistant or agent in that category, meeting this requirement in practice means being able to trace which documentary sources contributed to a given response. Without an identified owner for each document and a modification history, that traceability cannot be produced. We detailed the practical implications of Article 12 for corpus documentation in our analysis of EU AI Act deployer obligations.
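The following sketch is not a compliance recipe, and it does not restate what Article 12 requires. It only illustrates the kind of per-response provenance record that depends on each document having an identified owner and version. `SourceRef`, `build_trace_record` and every field name are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SourceRef:
    doc_id: str
    version: str          # hypothetical version label from the document system
    owner: Optional[str]  # accountable owner, or None if unknown


def build_trace_record(query: str, answer: str, sources: list[SourceRef]) -> dict:
    """Assemble one log entry linking a response to the documents that fed it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "answer": answer,
        "sources": [asdict(source) for source in sources],
        # Sources without an owner are the gap the Traceability dimension scores.
        "sources_without_owner": [s.doc_id for s in sources if s.owner is None],
    }


record = build_trace_record(
    "What is the approval limit for travel expenses?",
    "(generated answer)",
    [SourceRef("travel-policy", "v7", "finance-ops"), SourceRef("legacy-faq", "v1", None)],
)
print(json.dumps(record, indent=2))
```

A non-empty `sources_without_owner` list is a direct symptom of a Traceability score below 2.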

Normalization is the most technical dimension — and the most consistently underestimated until deployment. A corpus composed of scanned PDFs without OCR, Word documents with inconsistent heading structures, Confluence pages without metadata: each of these problems degrades the precision of semantic retrieval. The embedding model indexes what it finds. If what it finds is poorly structured, retrieved chunks are imprecise — and answers reflect that imprecision.
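Several of these problems can be screened for automatically before indexing. The following minimal sketch assumes an upstream parser has produced, for each document, its metadata, its extracted text and a list of detected headings. The `REQUIRED_FIELDS` schema, the `min_text_chars` threshold of 200 and the sample documents are all hypothetical.

```python
REQUIRED_FIELDS = ("title", "owner", "valid_until", "language")  # hypothetical metadata schema


def normalization_issues(docs: list[dict], min_text_chars: int = 200) -> dict[str, list[str]]:
    """Flag documents whose metadata, extractable text, or structure would hurt retrieval."""
    issues: dict[str, list[str]] = {}
    for doc in docs:
        problems = []
        metadata = doc.get("metadata", {})
        missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
        if missing:
            problems.append("missing metadata: " + ", ".join(missing))
        # Very little text usually means an image-only PDF or a failed extraction.
        if len(doc.get("text", "").strip()) < min_text_chars:
            problems.append("little or no extractable text (scanned without OCR?)")
        if not doc.get("headings"):
            problems.append("no heading structure detected")
        if problems:
            issues[doc["id"]] = problems
    return issues


corpus = [
    {
        "id": "expense-policy.docx",
        "metadata": {
            "title": "Expense policy", "owner": "finance",
            "valid_until": "2027-01-01", "language": "en",
        },
        "text": "Employees may claim travel and meal expenses. " * 10,
        "headings": ["1. Scope", "2. Limits"],
    },
    {"id": "scan-2019.pdf", "metadata": {"title": "Old procedure"}, "text": "", "headings": []},
]
for doc_id, problems in normalization_issues(corpus).items():
    print(doc_id, problems)
```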


Interpreting Your Score and Deciding What Comes Next

The total score drives three distinct types of decision.

Score 12 – 15 (AI-Ready): Your corpus is in sufficient condition for a structured RAG deployment. The recommended next step is continuous monitoring — to detect quality degradation over time as new documents are added, policies change, and regulatory requirements evolve. A corpus that scores AI-Ready in June can drift to AI-Conditional by December without a detection mechanism.

Score 7 – 11 (AI-Conditional): Identify the two dimensions with the lowest scores and concentrate remediation efforts there before expanding to full production. A targeted 4-to-8-week program on the critical dimensions is generally sufficient to reach AI-Ready status. You do not need to rebuild the entire corpus; you need to intervene in the right places.

Score 0 – 6 (AI-Unfit): A RAG deployment on this corpus will produce disappointing results, with hallucination and inaccuracy rates that risk discrediting the entire AI initiative. The recommended step is a full corpus audit, which maps every anomaly, contradiction, duplicate, and gap with precision — before proposing a prioritized remediation plan. Our six-axis audit methodology is documented in our reference article on corpus auditing for AI.

One point holds regardless of where you score: the scorecard is a snapshot. Scores evolve — downward as corpus quality degrades, upward as remediation actions take effect. Without a monitoring mechanism, you lose the visibility that made the diagnosis meaningful in the first place.
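Even without a platform, keeping successive scorecards makes drift visible. The following minimal sketch compares two snapshots. The values are illustrative and mirror the June-to-December drift from AI-Ready to AI-Conditional described above. `band` and `compare_snapshots` are hypothetical helper names.

```python
def band(total: int) -> str:
    """Map a 0-15 total to the scorecard's three bands."""
    if total >= 12:
        return "AI-Ready"
    if total >= 7:
        return "AI-Conditional"
    return "AI-Unfit"


def compare_snapshots(before: dict[str, int], after: dict[str, int]) -> dict:
    """Report band movement and per-dimension score changes between two assessments."""
    changes = {dim: after[dim] - before[dim] for dim in before if after[dim] != before[dim]}
    return {
        "band_before": band(sum(before.values())),
        "band_after": band(sum(after.values())),
        "changes": changes,
        # Negative deltas are degradations that need attention.
        "degraded": sorted(dim for dim, delta in changes.items() if delta < 0),
    }


june = {"freshness": 3, "coherence": 2, "completeness": 2, "traceability": 3, "normalization": 2}
december = {"freshness": 1, "coherence": 2, "completeness": 2, "traceability": 3, "normalization": 2}
print(compare_snapshots(june, december))
# {'band_before': 'AI-Ready', 'band_after': 'AI-Conditional', 'changes': {'freshness': -2}, 'degraded': ['freshness']}
```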


From One-Time Scoring to Continuous Document Observability

The manual scorecard provides an initial diagnosis. It does not solve the underlying problem: document quality is a flow, not a state. New documents are created, procedures are updated, regulations change. A conflict between two documents can appear tomorrow without anyone noticing — until the agent begins producing inconsistent responses.

A Document Knowledge Platform (DKP) automates what the scorecard does manually. It continuously monitors document freshness, detects new inter-document contradictions as the corpus evolves, identifies emerging topics not yet covered, and surfaces remediation recommendations rather than simply signaling problems. This is what “Start Clean, Stay Clean” means operationally: the scorecard tells you where you are; the DKP keeps you there.

On a single document repository during an initial diagnostic, K-AI teams typically identify several hundred anomalies — distributed across expired documents not flagged as such, inter-version contradictions, divergent duplicates, and expected topics not covered. This volume exceeds what any team can manage manually with sufficient frequency to maintain AI-Ready status.

K-AI already works with CMA CGM, Veolia, PwC, BNP Paribas, TotalEnergies, and CEVA Logistics. Partners: AWS, Snowflake, Microsoft, Wavestone, Devoteam.


Frequently Asked Questions

How do you assess document quality before AI deployment?

Use the five-dimension scorecard in this article (Freshness, Coherence, Completeness, Traceability, Normalization). Score yourself honestly from 0 to 3 on each dimension. A total score of 12 to 15 indicates an AI-Ready corpus. Below 7, a full audit is required before any RAG deployment. This initial diagnosis takes approximately thirty minutes and orients decisions without requiring external tools.

What makes a document AI-ready for enterprise RAG systems?

Five structural criteria: (1) Freshness — the document is current and its validity is known; (2) Coherence — it does not contradict other documents in the same corpus on the same topics; (3) Completeness — the topic it addresses is covered without gaps; (4) Traceability — its owner is identified and the modification history is available; (5) Normalization — its format, metadata, and structure enable precise extraction by the retrieval engine. A document can be well-written and fail on three of these five criteria.

What is the difference between this scorecard and a full corpus audit?

The scorecard is a rapid self-assessment tool (30 minutes) designed to determine whether a full audit is necessary and where to focus effort. A full corpus audit goes much further: it identifies each contradictory document, each divergent duplicate, each missing topic, produces a precise anomaly map, and delivers a prioritized remediation plan. The scorecard gives direction; the audit provides the complete map.

How does document quality affect RAG hallucination rates?

According to Forrester (2026), 67% of enterprise RAG failures are rooted in document data quality. Most of these failures are extractive: the model does not fabricate; it faithfully reproduces incorrect content from the corpus, including outdated procedures, superseded versions, or information contradicted by another document. The faithfulness metrics in standard evaluation frameworks such as RAGAS and DeepEval do not catch these hallucinations. Those metrics measure whether the answer is faithful to the retrieved context, not whether that context is itself correct.

How long does it take to improve a document quality score?

It depends on the initial score and on which dimensions score lowest. A corpus in the AI-Conditional range (score 7-11) can typically reach AI-Ready status within 4 to 8 weeks through targeted actions on the one or two critical dimensions, without rebuilding the entire corpus. A corpus in the AI-Unfit range (score 0-6) requires a full audit first, before a realistic remediation timeline can be estimated.


Learn More

Ready to move beyond self-assessment to a full corpus audit? Contact the K-AI team at contact@k-ai.ai.



Sources

  1. Forrester (2026), via ragaboutit.com — “Why 72% of Enterprise RAG Implementations Fail in the First Year — and How to Avoid the Same Fate”: https://ragaboutit.com/why-72-of-enterprise-rag-implementations-fail-in-the-first-year-and-how-to-avoid-the-same-fate/
  2. Sinequa — “Beyond the Hype: The Reality of Enterprise Agentic AI in 2026” (June 2, 2026): https://www.sinequa.com/resources/blog/beyond-the-hype-the-reality-of-enterprise-agentic-ai-in-2026/
  3. Informatica — “CDO Insights 2026: AI Adoption Accelerates but Trust and Governance Lag Behind” (January 2026): https://www.informatica.com/blogs/cdo-insights-2026-ai-adoption-accelerates-but-trust-and-governance-lag-behind.html
  4. EU AI Act — Article 12, artificialintelligenceact.eu: https://artificialintelligenceact.eu/article/12/

And in your organization, what does your document estate look like?

30 minutes with a founder. We audit a sample of your documents for free and show you exactly what K-AI detects.
