In March 2026, Roland Berger published "Profitless prosperity in AI", a study of what it calls the AI value gap, drawn from a survey of 203 senior executives with technology mandates across Europe, the US and Japan.[1] Its finding names the defining enterprise-AI problem of the moment. The question is no longer whether AI works: pilots ship, demos impress, adoption climbs. The problem is that almost 90% of firms report returns lagging their AI spending.[1] The activity is real; the return is missing. The report's name for that state is "profitless prosperity."
The temptation is to read that as a model problem — pick a better model, write a better prompt. The evidence says otherwise. The gap is an operating-model problem, and at its centre is a failure most teams don't even register as one: they cannot actually measure whether a given AI initiative is creating value. Roland Berger is blunt about the consequence:
Read that twice, because it inverts how most organizations run a programme. A "go-live" is easy to see and easy to celebrate. Realized value is slow, diffuse and hard to attribute — so it quietly stops being measured. With a deterministic system you could assume the value once and move on. With an autonomous agent you cannot: you have to measure the value continuously, per initiative, against a baseline — or you genuinely do not know whether the thing is earning its keep or quietly costing you money.
The diagnosis
Three gaps the data keeps surfacing
Roland Berger's analysis resolves the value gap into a few concrete, measurable shapes — not vibes, but timing and conversion failures you can compute at the portfolio level:
Notice what every one of these is: a property of the measurement and operating model, not the model weights. The constraint is whether the organization can prove value and control risk over time, and the corroborating evidence from the wider market points the same way. Gartner predicts over 40% of agentic-AI projects will be cancelled by the end of 2027, citing "escalating costs, unclear business value or inadequate risk controls."[2] All three are value-and-governance failures; not one is a model failure. An MIT study found roughly 95% of organizations getting "zero return" on enterprise GenAI, with only about 5% of integrated pilots extracting real value[3] (a small, self-selected, contested sample; we cite the precise framing, not the folk version "95% of pilots fail"). And Deloitte reports more than two-thirds of enterprises expect 30% or fewer of their GenAI experiments to scale, with regulatory and compliance concern the top barrier.[4]
The trap
KPI sprawl and the one-time assessment
Why is value so under-measured when everyone agrees it matters? Two mechanics, both visible in the data. First, KPI sprawl: teams measure everything, so they optimize nothing. A dashboard with forty metrics and no priority is indistinguishable from no dashboard. Second, and this is the one that quietly does the damage, most measurement is a one-time event. With only about one firm in four tracking AI returns automatically and continuously, the rest are working from a single snapshot or a gut feel.[1]
A one-time assessment is structurally blind to exactly the failure modes that define the value gap. It cannot see an initiative that shipped on schedule and then slipped below its baseline three months later. It cannot see the breakeven date sliding to the right. It cannot tell an initiative that is narrowly, genuinely succeeding apart from one that is burning budget while everyone assumes it's fine. Value is a time series, and a single snapshot throws the time axis away.
The report's own self-assessment tool is, tellingly, a one-shot survey. The fix is not a better survey. It is to make measurement a cadence — to re-score the whole portfolio on a fixed interval, against the same baseline, so that slip, breakeven drift and conversion gaps surface the week they appear rather than at the next annual review.
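To make the cadence concrete, here is a minimal sketch of what re-scoring one initiative against a fixed baseline could compute at each review. Everything named here is an assumption for illustration, not taken from the report: the `Snapshot` fields, the weekly interval, and the simple linear projection of breakeven from the last two reviews.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    week: int              # weeks since go-live (hypothetical weekly cadence)
    kpi: float             # KPI value observed at this review
    cumulative_net: float  # cumulative benefit minus cumulative cost to date


def first_slip_week(series: list[Snapshot], baseline: float) -> Optional[int]:
    """First week the KPI is below baseline after having met it, else None."""
    met_baseline = False
    for snap in series:
        if snap.kpi >= baseline:
            met_baseline = True
        elif met_baseline:
            return snap.week
    return None


def projected_breakeven_week(series: list[Snapshot]) -> Optional[int]:
    """Breakeven week: observed if already reached, else a linear projection."""
    for snap in series:
        if snap.cumulative_net >= 0:
            return snap.week
    if len(series) < 2:
        return None
    prev, last = series[-2], series[-1]
    rate = (last.cumulative_net - prev.cumulative_net) / (last.week - prev.week)
    if rate <= 0:
        return None  # not converging on breakeven at the current pace
    return last.week + math.ceil(-last.cumulative_net / rate)


def breakeven_drift(series: list[Snapshot]) -> Optional[int]:
    """Weeks the projection moved since the previous review (positive = later)."""
    before = projected_breakeven_week(series[:-1])
    now = projected_breakeven_week(series)
    if before is None or now is None:
        return None
    return now - before


# Made-up numbers, for illustration only.
history = [
    Snapshot(1, 100.0, -50.0),
    Snapshot(2, 104.0, -40.0),
    Snapshot(3, 106.0, -30.0),
    Snapshot(4, 101.0, -25.0),
    Snapshot(5, 97.0, -22.0),
]
print(first_slip_week(history, baseline=100.0))  # 5
print(breakeven_drift(history))                  # 4
```

In this made-up series, a snapshot taken at week 3 shows a KPI above baseline. Only the series reveals the slip at week 5 and the projected breakeven moving from week 9 to week 13 between two consecutive reviews.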
The map
Four archetypes, and why most firms are in the wrong one
Roland Berger plots firms on two axes — strategic ambition and execution capability — and the resulting quadrants are a useful diagnostic for any portfolio:
- Industrializers (high strategy, high execution) actually monetize AI — they wire it deep into the operating model and measure it continuously.
- The Stalled (high strategy, low execution) move fast but capture laggard returns — ambition outruns the ability to convert it.
- Observers, the largest group, sit low on strategy and remain stuck in pilots, carrying the worst governance, integration and shadow-AI risk.
- Specialists (low strategy, high execution) win narrowly but genuinely: their focus is by design, not a failure.
The point of the map is not the labels; it is that an initiative's archetype changes over time, and you only catch the dangerous transitions — an Industrializer sliding toward Stalled, an Observer never leaving the pilot quadrant — if you re-score continuously. A static assessment freezes every firm in the quadrant it happened to occupy on survey day.
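The two-axis map and the transitions it is meant to catch can be sketched in a few lines. This is an illustration under stated assumptions, not Roland Berger's scoring method: the 0-to-1 scores, the 0.5 threshold, the review tuples, and the placement of Observers in the low-strategy, low-execution quadrant (inferred as the remaining quadrant) are all hypothetical.

```python
from enum import Enum


class Archetype(Enum):
    INDUSTRIALIZER = "Industrializer"  # high strategy, high execution
    STALLED = "Stalled"                # high strategy, low execution
    SPECIALIST = "Specialist"          # low strategy, high execution
    OBSERVER = "Observer"              # remaining quadrant (assumed low/low)


def classify(strategy: float, execution: float, threshold: float = 0.5) -> Archetype:
    """Map two scores in [0, 1] to a quadrant; the threshold is an assumption."""
    high_strategy = strategy >= threshold
    high_execution = execution >= threshold
    if high_strategy and high_execution:
        return Archetype.INDUSTRIALIZER
    if high_strategy:
        return Archetype.STALLED
    if high_execution:
        return Archetype.SPECIALIST
    return Archetype.OBSERVER


def transitions(
    reviews: list[tuple[str, float, float]],
) -> list[tuple[str, Archetype, Archetype]]:
    """List (period, previous, new) for every change between consecutive reviews."""
    changes = []
    previous = None
    for period, strategy, execution in reviews:
        current = classify(strategy, execution)
        if previous is not None and current is not previous:
            changes.append((period, previous, current))
        previous = current
    return changes


def periods_stuck(reviews: list[tuple[str, float, float]], archetype: Archetype) -> int:
    """Count the trailing consecutive reviews spent in the given archetype."""
    count = 0
    for _, strategy, execution in reversed(reviews):
        if classify(strategy, execution) is not archetype:
            break
        count += 1
    return count


# Hypothetical review history: (period, strategy score, execution score).
reviews = [("W01", 0.8, 0.7), ("W02", 0.8, 0.6), ("W03", 0.8, 0.4)]
for period, before, after in transitions(reviews):
    print(period, before.value, "->", after.value)  # W03 Industrializer -> Stalled
```

A one-time assessment is equivalent to calling `classify` once. The transition list and the stuck-period count only exist when the same scoring is repeated on a cadence.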
The synthesis
Make the verdict something you can't game
There is one more trap, and it is the subtle one. If "is this initiative creating value?" is answered by a model reading the initiative owner's own free-text description, the answer can be talked into looking healthy. A self-serving narrative, or a deliberately crafted one, can steer a model toward a flattering verdict while the actual numbers decline. Untrusted text feeding the scoring step is not a hypothetical risk: OWASP ranks prompt injection as the #1 risk for LLM applications, and notes that injected content need not be human-visible as long as a model parses it.[5]
So the verdict has to be anchored on evidence the owner cannot freely author — the actual KPI series (current vs baseline vs target) and a handful of structural enums — with any model used only for a bounded suggestion and a rationale, never for the decision itself. That is a platform stance, not a prompt. A handful of non-negotiables make it real:
And it runs self-hosted (on-premise, private cloud or air-gapped), so the financials, KPIs and initiative records being scored never cross a boundary you don't own. Cisco's 2025 benchmark found 90% of organizations believe local storage of data is inherently safer;[6] for portfolio-level financial data, where the system runs is a requirement, not a preference.
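The evidence-anchored verdict described in this section can be sketched as follows. This is a minimal illustration, not flow8's implementation: the `Verdict` and `Direction` enums, the 0.5 progress threshold, and the 280-character rationale cap are hypothetical choices. The point it shows is structural: the decision function never receives free text, and the model's output is accepted only if it parses as one of the allowed enum values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

MAX_RATIONALE_CHARS = 280  # hypothetical cap on model-written text


class Verdict(Enum):
    ON_TRACK = "on_track"
    AT_RISK = "at_risk"
    BELOW_BASELINE = "below_baseline"


class Direction(Enum):
    HIGHER_IS_BETTER = "higher_is_better"
    LOWER_IS_BETTER = "lower_is_better"


@dataclass(frozen=True)
class Evidence:
    current: float
    baseline: float
    target: float
    direction: Direction


@dataclass(frozen=True)
class Scorecard:
    verdict: Verdict                     # decided from numbers and enums only
    model_suggestion: Optional[Verdict]  # advisory; None if the output was invalid
    rationale: str                       # bounded free text, shown to the reviewer
    disagreement: bool                   # flags a model/number mismatch for review


def decide(e: Evidence, on_track_at: float = 0.5) -> Verdict:
    """Deterministic verdict; the threshold is an assumed policy parameter."""
    sign = 1.0 if e.direction is Direction.HIGHER_IS_BETTER else -1.0
    current, baseline, target = sign * e.current, sign * e.baseline, sign * e.target
    if target <= baseline:
        raise ValueError("target must be an improvement on the baseline")
    if current < baseline:
        return Verdict.BELOW_BASELINE
    progress = (current - baseline) / (target - baseline)
    return Verdict.ON_TRACK if progress >= on_track_at else Verdict.AT_RISK


def score(e: Evidence, model_output: Optional[str], rationale: str) -> Scorecard:
    """Combine the numeric verdict with a bounded, validated model suggestion."""
    try:
        suggestion = Verdict(model_output)
    except ValueError:
        suggestion = None  # anything outside the enum is dropped, not interpreted
    verdict = decide(e)
    return Scorecard(
        verdict=verdict,
        model_suggestion=suggestion,
        rationale=rationale[:MAX_RATIONALE_CHARS],
        disagreement=suggestion is not None and suggestion is not verdict,
    )


evidence = Evidence(current=92.0, baseline=100.0, target=120.0,
                    direction=Direction.HIGHER_IS_BETTER)
card = score(evidence, "on_track", "Owner narrative reports strong momentum.")
print(card.verdict.value, card.disagreement)  # below_baseline True
```

Here a flattering suggestion cannot move the verdict. It can only raise a disagreement flag, which is itself a useful signal for the human reviewer.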
flow8 in practice
A living value loop, not a one-time survey
We built the report's prescription as a concrete flow8 flow — an AI-value-gap tracker that turns the one-shot assessment into a weekly, governed loop. On a cadence it reads every AI initiative, scores it on the archetype matrix from the actual KPI series, and writes a scorecard to a shared value bus — a scorecards ledger — that rolls up to a single, human-reviewed dashboard. The architecture, not the prose:
[Figure: architecture diagram. Caption (partial): … scorecards ledger; every output prepares and recommends, nothing acts.]
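As a rough illustration of the loop described above, the sketch below appends one scorecard per initiative per run to an append-only ledger and rolls the latest run up for a dashboard. It is plain Python, not the flow8 API: `ScorecardEntry`, `ScorecardLedger`, `weekly_run`, `rollup` and the dictionary shape of an initiative are all hypothetical names. Each entry carries a recommendation and a `pending_review` status; nothing in the code acts on it.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScorecardEntry:
    week: str
    initiative_id: str
    archetype: str
    current: float
    baseline: float
    recommendation: str             # advisory text for a human; never executed
    status: str = "pending_review"  # only a human reviewer changes this


class ScorecardLedger:
    """Append-only store: entries are serialized on write and never mutated."""

    def __init__(self) -> None:
        self._lines: list[str] = []

    def append(self, entry: ScorecardEntry) -> None:
        self._lines.append(json.dumps(asdict(entry), sort_keys=True))

    def entries(self) -> list[ScorecardEntry]:
        return [ScorecardEntry(**json.loads(line)) for line in self._lines]


def weekly_run(week: str, initiatives: list[dict], ledger: ScorecardLedger) -> None:
    """Score every initiative from its KPI numbers and record a recommendation."""
    for item in initiatives:
        below = item["current"] < item["baseline"]
        ledger.append(ScorecardEntry(
            week=week,
            initiative_id=item["id"],
            archetype=item["archetype"],
            current=item["current"],
            baseline=item["baseline"],
            recommendation="review funding" if below else "no change suggested",
        ))


def rollup(ledger: ScorecardLedger, week: str) -> dict[str, int]:
    """Dashboard view: count of initiatives per archetype for one run."""
    return dict(Counter(e.archetype for e in ledger.entries() if e.week == week))


ledger = ScorecardLedger()
portfolio = [
    {"id": "invoice-triage", "archetype": "Specialist", "current": 112.0, "baseline": 100.0},
    {"id": "sales-copilot", "archetype": "Observer", "current": 94.0, "baseline": 100.0},
]
weekly_run("2026-W14", portfolio, ledger)
print(rollup(ledger, "2026-W14"))
```

Keeping the ledger append-only means each week's scorecards remain available for comparison with the next run, which is what lets slip and drift show up as changes rather than as a fresh snapshot.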
The takeaway
Measure value like you mean it
Roland Berger is right that the value gap, not model quality, is now the defining problem — and right that it is an operating-model failure. The organizations that close it will be the ones that stopped treating measurement as a launch-day event: that re-scored every initiative on a cadence against a fixed baseline, anchored the verdict on numbers an owner can't talk around, treated every input as untrusted until proven otherwise, kept a human on every consequential decision, and ran all of it on infrastructure they own. Get that living value loop right, and "profitless prosperity" becomes a state you can see — and exit. Skip it, and you will keep shipping AI that works and never quite pays.
Stop measuring AI value once a year.
flow8 turns the one-time AI assessment into a continuous, governed value loop — every initiative re-scored against a baseline, on a verdict that can't be gamed, with a human on every decision, on infrastructure you own.
Sources
1. Roland Berger, "Profitless prosperity in AI" (the AI value gap), March 4, 2026; survey of 203 senior executives across Europe, the US and Japan. rolandberger.com
2. Gartner, "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027," press release, June 25, 2025. gartner.com
3. MIT NANDA, "The GenAI Divide: State of AI in Business 2025," July 2025; coverage via Fortune, Aug 18, 2025. fortune.com
4. Deloitte, "State of Generative AI in the Enterprise," Wave 4, Jan 21, 2025. deloitte.com
5. OWASP, "LLM01:2025 Prompt Injection," OWASP Top 10 for LLM Applications 2025. genai.owasp.org
6. Cisco, "2025 Data Privacy Benchmark Study," Apr 2, 2025. newsroom.cisco.com
7. EU AI Act, Article 14 (Human Oversight) and Article 12 (Record-keeping). artificialintelligenceact.eu/article/14 · article/12
8. OWASP GenAI Security Project, "Top 10 Risks & Mitigations for Agentic AI," Dec 9, 2025. genai.owasp.org
9. NIST, "Artificial Intelligence Risk Management Framework (AI RMF 1.0)," NIST AI 100-1, Jan 2023. nist.gov