The governed enterprise: what it takes to let AI run the ERP

25–35%

of large tech programs hit their targeted EBITDA & cash-flow impact¹

40%+

of agentic-AI projects will be cancelled by 2027 — Gartner²

LLM01

prompt injection — OWASP's #1 risk for LLM applications³

In May 2026, McKinsey published "The end of ERP as we know it? Five ways AI is disrupting ERP."¹ Its thesis is that AI is restructuring the systems that run finance, supply chain and operations: instead of people navigating screens, networks of autonomous agents act on top of the system — what McKinsey calls a "headless, agentic" architecture. The upside it cites is real and large: AI agents "have the potential to reduce the effort needed to implement ERP systems by at least 50 percent and cut down program duration by half," and early adopters report "EBIT improvements of 5 percent or more."¹

That is the headline. But the article's first and most important shift is not about speed at all. McKinsey names it "value mission control," and states the principle plainly:

"Measuring value becomes a core architectural capability. Autonomous decisions require continuous impact assessment."— McKinsey, "The end of ERP as we know it?", May 2026

Read that twice, because it inverts how most organizations think about automation. With a deterministic process you can assume the value once and forget it. With an autonomous agent you cannot — you have to measure the value continuously, per agent, against a baseline, or you genuinely do not know whether the thing is helping or quietly costing you money. The rest of this piece is about why that is hard, what the evidence says happens when you skip it, and what it takes to do it right.

The evidenceWhy "it works" is not the same as "it creates value"

The uncomfortable part of McKinsey's own analysis is the base rate. In the same article it reports that "just 25 to 35 percent of large tech programs achieve their targeted EBITDA and cash-flow impact, while 65 to 80 percent exceed their planned budget or timeline."¹ Adding autonomy to a system does not automatically move those numbers — and the independent evidence on AI specifically is sobering:

40%+Gartner, 2025

Gartner predicts over 40% of agentic-AI projects will be cancelled by the end of 2027, citing "escalating costs, unclear business value or inadequate risk controls."² Those three causes are precisely value-measurement and governance failures — not model failures.

~5%MIT NANDA, 2025

An MIT NANDA study found roughly 95% of organizations were getting "zero return" on enterprise GenAI, with only about 5% of integrated pilots extracting real value.⁴ (A small, self-selected, non-peer-reviewed sample — and publicly contested — so we cite the precise framing, not the folk version "95% of pilots fail.")

⅔+Deloitte, 2025

More than two-thirds of enterprises expect 30% or fewer of their GenAI experiments to scale within three to six months; the top barrier to scaling is regulatory and compliance concern.⁵

The pattern is consistent across McKinsey, Gartner, MIT and Deloitte: the constraint is not whether the model can do the task. It is whether the organization can prove the value and control the risk. Those are properties of the system around the model — not the model itself.

The riskAn ERP agent acts on money and identity

This is where ERP raises the stakes above a chatbot. An ERP transaction is a payment released, a purchase order raised, a vendor master record changed, a credit limit adjusted. When an agent acts here, it acts on money and identity — and the security field has been explicit about what that requires.

OWASP's Top 10 for LLM Applications (2025) ranks prompt injection as LLM01 — the number-one risk, defined as user-supplied input that "alter[s] the LLM's behavior or output in unintended ways," including content that "need not be human-visible/readable, as long as the content is parsed by the model."³ In an ERP context, that "input" is a vendor note, a free-text memo field, a comment in custom code — any of which a malicious actor could craft to steer an agent. The lesson is blunt: untrusted input must be treated as data, never as instructions.

OWASP's companion risk, Excessive Agency (LLM06), addresses the other half directly. Its recommended mitigation is unambiguous:

"Utilise human-in-the-loop control to require a human to approve high-impact actions before they are taken."— OWASP Top 10 for LLM Applications, LLM06: Excessive Agency

This is not a flow8 opinion; it is the consensus of the application-security community, since formalized further in OWASP's dedicated Top 10 for Agentic Applications (December 2025), which introduces the principle of "least agency" — the agentic extension of least privilege.⁶ And for regulated operations it is no longer merely best practice. It is law.

The lawGovernance and audit are now obligations, not options

For high-risk uses — and the EU AI Act explicitly lists AI that evaluates "the creditworthiness of natural persons or establish[es] their credit score" as high-risk⁷ — two requirements land squarely on any agentic ERP deployment:

Human oversight (Article 14). High-risk systems must be designed so they "can be effectively overseen by natural persons," who can "disregard, override or reverse the output" and "interrupt the system through a 'stop' button" — while the design actively guards against automation bias.⁷
Record-keeping (Article 12). High-risk systems "shall technically allow for the automatic recording of events (logs) over the lifetime of the system" to ensure traceability.⁷ High-risk obligations apply from August 2026.

The U.S. NIST AI Risk Management Framework reaches the same destination from a different direction: its four core functions are Govern, Map, Measure, Manage, with Govern described as "a cross-cutting function that is infused throughout AI risk management."⁸ Govern first, then build. An audit trail and a human-on-every-high-consequence-action are not features you bolt on after a successful pilot — they are the conditions under which the pilot is allowed to exist.

There is a sovereignty dimension too. Cisco's 2025 Data Privacy Benchmark Study found 90% of organizations believe local storage of data is inherently safer, and 64% worry about inadvertently sharing sensitive information with AI systems.⁹ For the data inside an ERP — financials, customer master data, payroll — "where does it run" is not a preference. It decides whether you are permitted to use AI on the core at all.

The synthesisThis is a platform problem, not a model problem

Put the evidence together and a clear conclusion falls out. The model is not the hard part. The hard part is the system around it: continuous value measurement, treating every input as untrusted, keeping a human on every money-and-identity action, logging all of it immutably, and running it on infrastructure you control. Solve those once, as platform capabilities, and every new use case inherits them. Solve them per-pilot, and you get exactly the graveyard Gartner and MIT describe — impressive demos that never reach production because each one re-litigates security, audit and value from scratch.

That platform stance is the design principle behind flow8. A handful of non-negotiables apply to every automated process it runs, whether reconciling an invoice or triaging an AI use case:

🛑 Never auto-act on money or identity High-consequence actions fail safe to a draft plus a flag. The agent prepares and recommends; a named human approves — OWASP LLM06 and EU AI Act Article 14, operationalized.

🧪 Untrusted input is data, not instructions Every ingested text — vendor note, free-text field, code comment — is scanned for injection before any model can act on it. A direct answer to OWASP LLM01.

📋 State is the source of truth Every action has a stable key, written before the side-effect and confirmed after. Re-runs never double-act. Every step is logged, attributable and replayable — EU AI Act Article 12 by construction.

📈 Value is measured, not assumed Every agent writes its estimated impact to a shared value ledger, scored against a baseline. You can see, per agent, whether it earns its keep — McKinsey's "value mission control," made real.

And it runs self-hosted — on-premise, private cloud or air-gapped — so the data inside your ERP never crosses a boundary you don't own, answering the sovereignty concern Cisco quantifies.

flow8 in practiceThe five ERP theses, running as governed flows

We built each of McKinsey's five theses as a concrete flow8 flow. All five are producers writing into one value bus — an agent_actions ledger — so the whole program rolls up to a single, human-reviewed P&L. The architecture, not the prose:

Five self-hosted flows, one shared agent_actions ledger. Each prepares and recommends; nothing touches money or identity without a human.

📊 Value control per-agent P&L vs baseline cron · Sheets

🧮 Data sentinel AI-readiness of ERP tables REST/OData

🔀 Transform as-is ↔ to-be mapping BM25 + AI

🚦 Delivery EBITDA / budget / timeline RAG digest

⚖️ Pilot triage build / buy / kill / scale P&L × std-fit

Value bus · agent_actions scored vs baseline · injection pre-scan · idempotent · audit-logged

👤 Human-gated P&L rollup + alerts recommend → a person decides → execute

Self-hosted · no data egress 185+ audited modules Never auto-acts on money/identity Add flow #6 — same rails, no rework

The question is no longer "can AI do this?" It is "can we prove it created value, and stop it from doing harm?" Every credible source — McKinsey, Gartner, OWASP, NIST, the EU — agrees that is the real test. It is a platform answer, not a model answer.

The takeawayBuild the governed core once

McKinsey is right that AI changes ERP, and the speed is genuinely transformative. But every serious source — the analysts on value, the security community on injection and agency, the regulators on oversight and logging — converges on the same unglamorous half of the story. The organizations that capture the value will be the ones that measured impact continuously, treated every input as hostile until proven otherwise, kept a human on every money-and-identity decision, and logged all of it on infrastructure they own. Get that governed core right, and each new use case is a fast, safe addition. Get it wrong, and you have simply built a faster way to lose control of the systems your business runs on.

On the framing: the McKinsey statistics (program EBITDA impact, budget/timeline overruns, implementation-effort reduction, EBIT gains) and the "value mission control" concept are drawn directly from the May 2026 article.¹ The build-vs-buy, pilot-triage and standardization-vs-differentiation framings are not from that piece — they are supported by the separate sources cited below. flow8's account of how to operationalize these ideas in a governed, self-hosted way is our own.

Bring a trending use case in safely.

flow8 is the platform for running AI use cases in a standardized, secure, governed way — measured against a baseline, with a human on every high-consequence decision, on infrastructure you own.

Talk to our team →

Sources

McKinsey & Company, "The end of ERP as we know it? Five ways AI is disrupting ERP," McKinsey Technology, May 2026. mckinsey.com
Gartner, "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027," press release, June 25, 2025. gartner.com
OWASP, "LLM01:2025 Prompt Injection," OWASP Top 10 for LLM Applications 2025. genai.owasp.org
MIT NANDA, "The GenAI Divide: State of AI in Business 2025," July 2025; coverage via Fortune, Aug 18, 2025. fortune.com
Deloitte, "State of Generative AI in the Enterprise," Wave 4, Jan 21, 2025. deloitte.com
OWASP GenAI Security Project, "Top 10 Risks & Mitigations for Agentic AI," Dec 9, 2025. genai.owasp.org
EU AI Act — Article 14 (Human Oversight), Article 12 (Record-keeping), Annex III §5(b). artificialintelligenceact.eu/article/14 · article/12
NIST, "Artificial Intelligence Risk Management Framework (AI RMF 1.0)," NIST AI 100-1, Jan 2023. nist.gov
Cisco, "2025 Data Privacy Benchmark Study," Apr 2, 2025. newsroom.cisco.com
Informatica, "CDO Insights 2024" (42% of data leaders cite data quality as the main obstacle to GenAI adoption), Jan 31, 2024. informatica.com

← All insights