Agentic AI Without Document Foundations: Why 64% of Enterprises Are Building on Sand
Hyland takes its Context Engine to general availability; Semarchy quantifies the MDM gap. No one names the upstream layer: the document corpus.
On March 9, 2026, Semarchy and Censuswide published a survey of 1,000 C-suite executives across the UK, US and France (Semarchy press release, March 9, 2026). The headline was sober — “Data management overtakes cost and talent as top AI challenge” — and it deserved a place in every boardroom briefing. A month later, France’s IT Social condensed the same data into a viral title: “Agentic AI: 64% of companies are deploying it without MDM foundations.” The number is shocking. It is also accurate, under a caveat that few cite. And it is, from where we sit at K-AI, both useful and incomplete.
Useful, because it finally names the structural debt no executive-committee demo prepared anyone for. Incomplete, because in nearly every large enterprise we audit, the MDM the survey is talking about covers only a minority slice of the context that agents are expected to reason over. The bulk of that context lives elsewhere: in PDFs, Word documents, Confluence pages, SharePoint drives, shared folders, and email archives. It is that “elsewhere” I want to put a name on.
The number making the rounds — and what it actually measures
Let’s go back to the source. Semarchy and Censuswide surveyed 1,000 C-suite executives at companies with annual revenue over $200M between January 29 and February 9, 2026. Three figures anchor the argument: 51% of leaders are running AI initiatives without Master Data Management (MDM) foundations; 38% have no data quality standards in place; 65% say they are actively pushing agentic data management capabilities this year. The arithmetic is suggestive rather than conclusive: if two thirds of the panel is accelerating on agentic AI while half haven’t laid the MDM groundwork, the two groups must overlap, and a substantial share of agentic projects are built on partial foundations. The 64% relayed by IT Social is, as best as I can read it, the share without MDM foundations within the agentic-active subset. The exact methodology isn’t published; I encourage anyone interested to read both pieces in parallel. The order of magnitude holds.
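To make that arithmetic explicit, here is a minimal sketch of what the two published figures can and cannot establish on their own. It assumes, and this is an assumption rather than something the press release states, that the 51% and the 65% are measured over the same 1,000 respondents. The function name `overlap_bounds` is illustrative.

```python
def overlap_bounds(no_mdm: float, agentic: float) -> tuple[float, float]:
    """Bounds on the share of the agentic-active group that lacks MDM foundations.

    Assumes both proportions are measured over the same panel.
    """
    # Inclusion-exclusion: the two groups cannot overlap by less than this.
    min_overlap = max(0.0, no_mdm + agentic - 1.0)
    # Nor by more than the smaller of the two groups.
    max_overlap = min(no_mdm, agentic)
    # Re-express the overlap as a share of the agentic-active group.
    return min_overlap / agentic, max_overlap / agentic


NO_MDM = 0.51    # leaders running AI initiatives without MDM foundations
AGENTIC = 0.65   # leaders pushing agentic capabilities this year

low, high = overlap_bounds(NO_MDM, AGENTIC)
print(f"share of agentic-active group without MDM: {low:.1%} to {high:.1%}")
print(f"if the two traits were unrelated: {NO_MDM:.0%}")
```

The 64% relayed by IT Social sits inside that range, so it is compatible with the primary figures without being derivable from them. If it is indeed the share within the agentic-active subset, it also sits above the 51% an unrelated split would give, which would mean organizations pushing agentic AI are more likely than average to lack MDM foundations.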
What I take from it, and what I see weekly in client engagements, is a threshold effect. At 51% across the full panel, we’re talking about a statistical weakness. At 64% among active agentic deployments, we’re talking about a defining trait of the era. And that trait is landing at the worst possible moment: between Hyland’s June 1 general availability of its Enterprise Context Engine and Enterprise Agent Mesh, Microsoft’s May 1 general availability of Agent 365, and Camunda’s May 20 announcement of ProcessOS, the agentic execution layer is consolidating faster than the layer that’s supposed to feed it reliable knowledge.
Why MDM is necessary, and not sufficient
Within its perimeter, MDM does its job. Semarchy, Informatica, Profisee and a handful of others normalize customer, supplier, product, financial and material master records. These are valuable assets. They are not the raw material of enterprise AI agents.
Gartner has estimated, in its 2025-2026 syntheses, that 70 to 90% of the data in a large enterprise is unstructured — documents, emails, meeting notes, presentations, contracts, technical manuals. Atlan documents 40 to 60% annual growth in unstructured data across its customer base (Atlan, 2026). When an enterprise AI agent reasons over a business question — the applicable HR policy, a vendor contract clause, a refund procedure, the history of a clinical case, the compliance status of a trial protocol — it pulls from that zone. MDM does not venture there. It never claimed to.
The Semarchy headline, therefore, does not actually say that 51% of enterprises lack data foundations. It says that 51% lack foundations on the portion of their data MDM covers. On the remaining portion — which represents the bulk of the volume and feeds the majority of agentic decisions — the share without foundations is, in my experience auditing such corpora, materially higher. That’s the portion I want to name now.
AI-ready data ≠ AI-ready documents — the missed semantic shift
The market has settled on AI-ready data. Semarchy, Collibra, Atlan, Alation and Informatica all defend the term, each with its own tools, all on the structured and semi-structured perimeter. The conversation is mature and useful. But by 2026 it leaves a blind spot.
The blind spot is AI-ready documents. A 47-page PDF that interleaves a policy in force, two obsolete annexes and an unsigned addendum is not AI-ready. A SharePoint site containing seven versions of a refund procedure, two of which contradict the other five, is not AI-ready. A Confluence page from 2021 citing a regulatory threshold that was revised in 2024 without the page being updated is not AI-ready. None of these pathologies is captured by MDM. Nor are they captured by a generalist data catalog: Atlan, Alation and Collibra are starting to extend into unstructured data (see Collibra’s Making unstructured data AI-ready), but in inventory mode, not audit mode.
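To give a sense of what detecting one of these pathologies involves, here is a minimal sketch that flags "divergent duplicates": documents close enough to be versions of the same text, yet not identical. It is an illustration only, not the K-AI method. The function name, the 0.8 threshold and the sample documents are all hypothetical, and character-level similarity from Python's standard `difflib` is a crude stand-in for real document comparison.

```python
from difflib import SequenceMatcher
from itertools import combinations


def divergent_duplicates(docs: dict[str, str], threshold: float = 0.8):
    """Return (name_a, name_b, similarity) for near-identical but differing texts.

    `threshold` is an arbitrary illustrative cutoff, not a calibrated value.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(docs.items(), 2):
        similarity = SequenceMatcher(None, text_a, text_b).ratio()
        # Exact copies (1.0) are harmless; near-copies are the risky ones.
        if threshold <= similarity < 1.0:
            flagged.append((name_a, name_b, round(similarity, 2)))
    return flagged


# Made-up corpus: two versions of a refund rule and an unrelated HR sentence.
corpus = {
    "refund_v1": "Refunds are issued within 30 days of purchase with a receipt.",
    "refund_v2": "Refunds are issued within 14 days of purchase with a receipt.",
    "hr_policy": "Employees accrue paid leave monthly according to seniority.",
}

for name_a, name_b, similarity in divergent_duplicates(corpus):
    print(f"{name_a} vs {name_b}: similarity {similarity}")
```

A flagged pair only says that two near-copies differ somewhere. Deciding which version is in force, and whether the difference is an actual contradiction, remains a matter for the business owner.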
It is that measurement (anomalies, conflicts, divergent duplicates, unmarked obsolescence, traceability, freshness) that determines whether a document is, or is not, AI-ready. We documented it on May 15 in the K-AI six-axis method; I won’t restate it here. What I want to add is that the five dominant AI Readiness frameworks of 2026 (Gartner, Cisco, Microsoft, Cloudera and Iris.ai), which we mapped on May 25, all omit the same pillar: the document corpus. And the AI agent, by design, leans on that corpus more heavily than any prior AI system, because it chains reasoning steps and pulls from multiple documents per decision.
The race for agentic foundations — Hyland, Squirro, Glean, Writer
Between May 12 and June 1, 2026, the race for agentic foundations crystallized. On May 12, Glean published its Enterprise Agent Development Lifecycle, a seven-stage framework codifying how to build, govern and measure AI agents. On May 20, Squirro shipped its LTS release with entity-based filtering and zero-trust governance (Squirro, May 2026). On May 28, Glean crossed $300M ARR with an explicit pitch: cut AI bills by grounding agents better (TechCrunch, May 28, 2026). On June 1, Hyland announced the general availability of its Enterprise Context Engine and Enterprise Agent Mesh. Jitesh S. Ghai, Hyland’s CEO, put it plainly: “the winners will be the enterprises that can embed AI into their operations with governance and control.”
Every one of these players addresses a real problem — agent governance in production, observability, lifecycle, control. None addresses the question that precedes them: what makes a document corpus ready to serve as the upstream of an agentic mesh? Hyland offers industry-specific ontologies to enrich context; it does not audit the corpus at the input. Squirro builds a knowledge graph and a chain of custody; it does not measure document health upstream of the graph. Glean opens an agent lifecycle; it does not inspect the quality of the documents those agents will retrieve. Writer quantifies the adoption gap (Writer, 2026) — 79% of organizations face obstacles, 60% operate agents in production without formal governance — but on the brand governance front, not the corpus.
This is the gap K-AI occupies — not opportunistically, but because it is the operation without which none of the upper tiers hold. Corpus audit, document AI-readiness scoring, continuous monitoring (Start Clean, Stay Clean), a downstream semantic graph that is only ever fed by documents under control. Where our ecosystem partners — Glean, Hyland, Sinequa, Microsoft, AWS, Snowflake — orchestrate and activate, K-AI prepares and watches the raw material.
What K-AI calls a DKP — the upstream layer, not a substitute for MDM
Document Knowledge Platform. It is the category we’ve been defending for three years, and that we clarified on May 18 against two persistent confusions. A DKP is not Knowledge Management — that is an organizational discipline. It is not Knowledge AI either — that is the consumption layer, where Glean, Writer, Sana and Squirro live. It is what a Data Catalog would be — for unstructured documents. It is what a Data Mesh would be — for the document corpus.
In practice, that means three things. First, a DKP audits before an agent query is ever issued — it tells you, document by document and repository by repository, whether there are anomalies, conflicts, divergent duplicates, unmarked obsolescence, traceability gaps, freshness decay. Second, it operates continuously — not a one-shot audit but ongoing monitoring that detects drift and surfaces it to business owners before the agent hits it. Third, it does not replace MDM, the knowledge graph, or the copilot’s context engine — it precedes them and delivers them clean, sourced, dated and arbitrated material.
MDM remains useful. But a 2026 data strategy that stops at MDM is a strategy that ignores three quarters of the context its AI agents will draw on.
Three concrete actions for a CTO or CDO in June 2026
Before the end of the quarter — and as the EU AI Act’s August 2, 2026 deadline introduces the first documentation traceability obligations we framed on June 1 — three actions stand out.
First, map the document repositories your organization’s agentic pilots already query — SharePoint, Confluence, drives, line-of-business document management systems, archived mailboxes. For each, measure actual MDM coverage (usually zero) and the share of agentic decisions that depend on it (usually the majority).
Second, launch a corpus audit on your most strategic repository, against the six-axis grid. As a working order of magnitude, an initial diagnostic on a single document repository typically surfaces several hundred to several thousand anomalies, divergent duplicates and obsolete passages — the exact number depends on perimeter, document maturity and sector, and is meant to direct remediation work, not serve as a marketing line.
Third, separate the budgets. The agentic activation layer (Glean, Hyland, Microsoft, Writer, Sinequa) and the document foundations layer (DKP) are two distinct investments. One does not substitute for the other. Conflating them is the surest way to end up, eighteen months from now, restating the March 9 verdict from Craig Gravina, Semarchy’s CTO: “We are seeing a stark divide — one half of leaders building on strong foundations, the other half actively accumulating AI technical debt.” On the unstructured portion, which carries most of the weight, that stark divide has not yet happened. It will happen in 2027. Preparing for it now is what will separate the organizations that scale agentic AI from those that have to suspend it.
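The first action, mapping which repositories agentic pilots actually depend on, can be sketched as a simple tally over retrieval logs. This assumes such logs exist and record, for each agent decision, the repositories it pulled from. The `Retrieval` record, its field names and the sample entries below are hypothetical, not a schema from any of the vendors named here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Retrieval:
    decision_id: str  # hypothetical identifier for one agent decision
    repository: str   # where the retrieved item lives, e.g. "sharepoint"


def dependency_share(log: list[Retrieval]) -> dict[str, float]:
    """Share of agent decisions that retrieved at least one item per repository."""
    decisions_by_repo: dict[str, set[str]] = defaultdict(set)
    all_decisions: set[str] = set()
    for entry in log:
        decisions_by_repo[entry.repository].add(entry.decision_id)
        all_decisions.add(entry.decision_id)
    if not all_decisions:
        return {}
    total = len(all_decisions)
    return {repo: len(ids) / total for repo, ids in decisions_by_repo.items()}


# Made-up log covering four agent decisions.
sample_log = [
    Retrieval("d1", "sharepoint"), Retrieval("d1", "confluence"),
    Retrieval("d2", "sharepoint"),
    Retrieval("d3", "mdm"), Retrieval("d3", "sharepoint"),
    Retrieval("d4", "confluence"),
]

for repo, share in sorted(dependency_share(sample_log).items()):
    print(f"{repo}: {share:.0%} of decisions")
```

Shares can sum to more than 100% because a single decision often draws on several repositories. Read next to the MDM coverage of each repository, the tally shows where agent decisions rest on content nobody audits.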
Frequently asked questions
Why are 64% of enterprises deploying AI agents without MDM foundations?
The figure, relayed in France by IT Social, derives from the Semarchy × Censuswide survey of March 2026 covering 1,000 C-suite executives. The primary source reports 51% of enterprises without MDM and 65% pushing agentic capabilities this year; the 64% appears to correspond, by cross-referencing the two, to the share of enterprises pushing agentic AI that lack MDM foundations. Two factors explain the gap: executive-committee pressure to industrialize agentic AI in 2025-2026 ran ahead of the time needed for data foundation work to mature, and MDM, perceived as infrastructure, is typically budgeted on an 18-24 month horizon while AI agents are budgeted on 6-9 months. That tempo gap opened in 2025; it is closing in 2026 under the pressure of production incidents and regulatory deadlines.
What’s the difference between AI-ready data and AI-ready documents?
AI-ready data typically refers to the quality, governance and availability of an enterprise’s structured and semi-structured data: customer, product, transaction and operational metric repositories. AI-ready documents refers to document quality in the strict sense: absence of internal anomalies, absence of cross-document conflicts, management of divergent duplicates, obsolescence marking, traceability, freshness by segment. These two families do not share tools, owners, or audit methods. Most enterprise AI agents draw chiefly on the second category, which by the Gartner estimate cited earlier makes up 70 to 90% of enterprise data, while budgets and public AI readiness frameworks deal almost exclusively with the first.
Is MDM enough to prepare an enterprise for agentic AI, or do you also need a DKP?
MDM is necessary for the structured portion of agentic context: customer, vendor, product and account masters. It is insufficient for the unstructured portion (documents, emails, meeting notes), which by the Gartner estimate cited earlier makes up 70 to 90% of enterprise data and supplies most of what an agent actually reasons over. A DKP (Document Knowledge Platform) operates on that second portion: corpus audit, document AI-readiness scoring, continuous monitoring. The two investments are complementary and do not substitute for one another. An enterprise with only MDM will see its agents hallucinate on anything outside the master perimeter; an enterprise with only a DKP will see its agents lose the thread as soon as they need to cross a document with a customer record. Both are required.
What are the main governance challenges for deploying agentic AI at scale?
Three challenges dominate in 2026. First, auditability: every agent decision must be traceable back to the documents and data that supported it, which requires retrieval logs that few organizations maintain. Second, upstream corpus governance: without continuous document audit, agents silently propagate the conflicts, obsolete content and divergent duplicates living in the production corpus. Third, separation of concerns: the activation layer (orchestration, observability and agent control, as in Hyland Agent Mesh, Glean ADLC and Writer guardrails) and the document foundations layer (DKP) are two distinct disciplines that no single vendor fully covers yet.
What’s the difference between a knowledge base and a Document Knowledge Platform?
A knowledge base (Confluence, Notion, SharePoint, line-of-business DMS) is a place to store and share documents — its role is to serve the human who searches, classifies and updates. A Document Knowledge Platform is an audit, scoring and monitoring layer that operates on top of existing knowledge bases — its role is to measure document health across the corpus, surface drift, and make that corpus exploitable by AI systems. A knowledge base targets human collaboration; a DKP targets algorithmic reliability. The two coexist — you don’t buy a DKP to replace your Confluence, you buy one to audit your Confluence (and everything else).
To go further
If you are industrializing an agentic AI project and want to quantify the document state of your most strategic repository before going further, write to us at contact@k-ai.ai. Our six-axis audit method is our standard entry point; the deliverable is an operational report for a CTO, CDO or Head of Knowledge Management — not a marketing deck.
Sources cited
- Semarchy × Censuswide, “Data Management Overtakes Cost and Talent as Top AI Challenge” — press release, March 9, 2026: https://semarchy.com/press-releases/data-management-top-ai-challenge-agentic-enterprises/
- IT Social, “IA agentique : 64% des entreprises la déploient sans fondations MDM” — April 2026: https://itsocial.fr/intelligence-artificielle/intelligence-artificielle-articles/ia-agentique-64-des-entreprises-la-deploient-sans-fondations-mdm/
- Hyland, “Hyland launches next wave of AI platform innovations to unlock the content-powered agentic enterprise” — June 1, 2026: https://www.hyland.com/en/company/newsroom/hyland-launches-next-wave-ai-platform-innovations
- Glean, “Enterprise Agent Development Lifecycle” — May 12, 2026: https://www.glean.com/blog/agent-dev-lifecycle-2026
- TechCrunch, “Glean’s top line crosses $300M as AI budget cutting becomes its major selling point” — May 28, 2026: https://techcrunch.com/2026/05/28/gleans-top-line-crosses-300m-as-ai-budget-cutting-becomes-its-major-selling-point/
- Squirro, “New Release May 2026: Zero-Trust Governance” — May 20, 2026: https://squirro.com/news-and-events/new-release-may-2026-advances-enterprise-ai-accuracy-and-zero-trust-governance
- Writer, “Enterprise AI adoption in 2026: Why 79% face challenges” — 2026: https://writer.com/blog/enterprise-ai-adoption-2026/
- Collibra, “Making unstructured data AI-ready” — 2026: https://www.collibra.com/blog/making-unstructured-data-ai-ready-unlocking-value-for-genai-and-agents
- Atlan, “Active Metadata Management — Complete 2026 AI Guide” — 2026: https://atlan.com/active-metadata-101/
- Microsoft, “Microsoft Agent 365 now generally available” — May 1, 2026: https://www.microsoft.com/en-us/security/blog/2026/05/01/microsoft-agent-365-now-generally-available-expands-capabilities-and-integrations/
- Camunda, “ProcessOS — agentic operating system” — May 20, 2026: https://www.businesswire.com/news/home/20260520352437/en/Camunda-announces-ProcessOS-an-agentic-operating-system-for-AI-first-enterprise-transformation
Related reading
- AI Readiness Assessment 2026: What Gartner, Cisco, Microsoft and Cloudera Measure — and the “Corpus” Pillar All These Frameworks Forget (May 25, 2026)
- Auditing a Document Corpus for AI — the K-AI Six-Axis Method (May 15, 2026)
- Knowledge AI, Knowledge Management, Document Knowledge Platform: Untangling the Three Categories Before Sabotaging Your Enterprise AI Project (May 18, 2026)
K-AI already partners with CMA CGM, Veolia, PwC, BNP Paribas, TotalEnergies and CEVA Logistics. Partners: AWS, Snowflake, Microsoft, Wavestone, Devoteam.
