Apr 26, 2026 · AI

AI Fluency Isn't One Thing

Most maturity models can't tell you what to do about AI fluency, because they measure either where the org sits or what a person can do — almost never the bridge between them. A look at building that bridge, why I picked dimensions over levels, and the three tests every productive maturity framework should pass.

Diagram showing organization maturity models and individual fluency models separated by a missing operational bridge.

The thing nobody quite sells you

I spent a chunk of the last few months building an AI maturity model for a small group of small businesses. The lesson wasn't the model. The lesson was that almost every reputable maturity framework I picked up was answering a question I didn't have.

The frameworks measure one of two things. Either they measure where the organisation sits on its adoption arc — Gartner, Deloitte's nearly-550-leader study, IDC's n=1,534 benchmark, MIT CISR's stage model with quantified financial outcomes — or they measure what a person can do with AI: UNESCO's competency frame, Ewerlof's seven-level engineering ladder, role-based literacy ladders inside enterprises.

Most mainstream frameworks overweight one side of the problem. I couldn't find one that bridged organisational maturity and individual fluency in a way that was diagnostic for the use case I was designing for. That's not a literature-wide absence — MITRE, for example, explicitly addresses workforce alongside mission. It's a design problem. The bridge I needed wasn't on the shelf.

So this post is about the methodological act of building one. Not the framework's content. The bridge.

The maturity-fluency gap, as a design problem

Org models tell you you're at "Operationalising" or "Scaling." Useful for a board slide. Doesn't tell anyone on a Tuesday what to actually do differently. Individual models tell you a person is "Assisted" or "Integrated" or "Agentic." Useful for L&D. Doesn't tell you anything about whether the org around them can absorb what they've learned.

In my experience, the two systems condition each other. People who can specify and redesign work need an org that lets them. An org that's invested in tooling needs people who can think above the prompt. Treat them as separate problems and you get the worst version of both — fluent people in unchanged orgs, or governance scaffolding around populations who use AI as a faster typewriter.

That observation pushed me into building something rather than adopting an off-the-shelf model.

A Monday-morning audit you can run before reading the rest of this

Before going further, here's a small thing worth trying. Pick a maturity model your team has adopted, or is about to. Ask it five questions:

Does it bridge org maturity and individual fluency, or pick one? If it picks one, what's your plan for the other side?
Is the shape right — single ladder, or multiple dimensions? Ladders are simpler to communicate. Dimensions surface interference.
What does it actually tell you to do at the stage 2 → 3 climb? This is where most teams die. If the guidance is generic here, the model isn't doing the job.
Which parts of it are temporary scaffolding, and is that named? Models that pretend every part is permanent calcify around their weakest decisions.
Does it generate movement, give real guidance, and let you measure whether the motion was real? More on these three tests at the end.

If the model can't answer those, you've got a slide, not a tool. The rest of this post is about how I tried to build one that could.

The four-dimension call (and why it isn't four levels)

The first decision was shape. Most maturity models default to a single ladder — five levels, sometimes four, you climb them one at a time. CMMI is the structural ancestor. Almost every modern AI maturity model is a CMMI cousin.

A single ladder is the wrong shape for what I was trying to describe.

What I ended up with is four dimensions, not four levels:

AI Fluency — what individual people can do with AI. The workforce capability layer.
Thinking Altitude — the level of abstraction someone can productively reason at when working with AI. More on this below; it's the dimension I think the field is mostly missing.
AI Excellence — the org's actual operating quality with AI. Process redesign depth, evaluation discipline, governance.
Adaptive Operating Model — the structural redesign of how work, decision rights, and human-agent teams are arranged. Where the org changes shape.

The reason it's four dimensions and not four levels is that a team can be high on one and low on another, and the gap is diagnostic. Fluency-high and Excellence-low means capable people doing AI work that doesn't translate into reliable outcomes. Excellence-medium and Adaptive Operating Model-zero means governance and tooling without any actual redesign of how work flows. Each pairing is a different problem with a different next move.

McKinsey's 2025 State of AI is direct on this: of the 25 attributes they tested, workflow redesign had the biggest effect on EBIT impact, and only 21% of respondents said they'd fundamentally redesigned at least some workflows. A single-ladder model collapses redesign into a generic stage label. You lose the diagnostic value.

The first methodological call: prefer dimensions to levels when the dimensions are independently actionable. A single ladder is easier to explain to a board, but it gives you nowhere useful to point on a Monday.
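A minimal sketch of why the shape matters, under assumptions of my own for illustration: a 0 to 4 score per dimension, a ladder that is just the rounded-down mean, and a gap threshold of 2. None of those numbers come from the model itself. The point is only that two teams can land on the same ladder rung while one of them has gaps worth acting on.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical field names and a hypothetical 0-4 scale, for illustration only.
DIMENSIONS = ("fluency", "altitude", "excellence", "operating_model")


@dataclass(frozen=True)
class Profile:
    fluency: int
    altitude: int
    excellence: int
    operating_model: int


def single_ladder(profile: Profile) -> int:
    """Collapse the profile to one rung, as a ladder would (rounded-down mean)."""
    scores = [getattr(profile, name) for name in DIMENSIONS]
    return sum(scores) // len(scores)


def gaps(profile: Profile, min_gap: int = 2) -> list[tuple[str, str, int]]:
    """Return (higher, lower, gap) pairs that differ by at least min_gap, largest first."""
    found = []
    for a, b in combinations(DIMENSIONS, 2):
        score_a, score_b = getattr(profile, a), getattr(profile, b)
        gap = abs(score_a - score_b)
        if gap >= min_gap:
            higher, lower = (a, b) if score_a > score_b else (b, a)
            found.append((higher, lower, gap))
    return sorted(found, key=lambda item: -item[2])


team_a = Profile(fluency=4, altitude=2, excellence=1, operating_model=1)
team_b = Profile(fluency=2, altitude=2, excellence=2, operating_model=2)

print(single_ladder(team_a), single_ladder(team_b))  # same rung for both teams
print(gaps(team_a))  # fluency far ahead of excellence and operating model
print(gaps(team_b))  # no gaps to act on
```

Both teams sit on the same rung, but only the dimensional view shows that the first team's next move is about Excellence and operating model rather than more fluency training.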

Four dimensions diagram showing AI Fluency, Thinking Altitude, AI Excellence, and Adaptive Operating Model as independently moving diagnostic axes.

Thinking Altitude, with one worked example

If I had to pick one piece of the model to defend, it'd be Thinking Altitude. So let me be concrete about it, because abstract definitions don't survive sceptics.

Take the same work item — say, "produce a weekly customer-feedback summary" — and look at it at three altitudes:

A0 — task execution. A person opens a chat, pastes the week's tickets, asks for a summary, copies the result into a doc. Repeat next week. The AI does the work. The person owns each instance.
A1 — workflow design. A person specifies what a good weekly summary looks like — input shape, sections, length, tone, what counts as a real signal versus noise — and builds a workflow that produces it on a schedule. The AI does the work. The person owns the spec.
A2 — system / harness design. A person specifies the kind of summary workflow the team needs across customer feedback, sales calls, product telemetry, and incident reviews. They build the harness that produces those workflows on demand, with shared evaluation, gates, and observability. The AI does the work. The person owns the system that produces specs.

Same work item. Three altitudes. Each one is a different job.
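The three altitudes can be sketched in code, mainly to show that each one owns a different artefact. This is an illustration under assumptions, not how the model is implemented: fake_model stands in for a real AI call, and SummarySpec, build_workflow, and build_harness are hypothetical names with made-up default values.

```python
from dataclasses import dataclass
from typing import Callable

Workflow = Callable[[list[str]], str]


def fake_model(prompt: str) -> str:
    """Stand-in for an AI call (assumption): reports how many lines it was given."""
    return f"[summary of {len(prompt.splitlines())} lines]"


# A0: task execution. The person owns each instance and repeats it every week.
def a0_summarise(tickets: list[str]) -> str:
    return fake_model("Summarise these tickets:\n" + "\n".join(tickets))


# A1: workflow design. The person owns the spec; the workflow is derived from it.
@dataclass(frozen=True)
class SummarySpec:
    source: str
    sections: tuple[str, ...]
    max_words: int
    tone: str


def build_workflow(spec: SummarySpec) -> Workflow:
    header = (
        f"Summarise {spec.source} in a {spec.tone} tone, under {spec.max_words} words, "
        f"with sections: {', '.join(spec.sections)}."
    )

    def run(items: list[str]) -> str:
        return fake_model(header + "\n" + "\n".join(items))

    return run


# A2: harness design. The person owns the system that produces specs and
# applies one shared evaluation gate to every workflow it builds.
def build_harness(sources: list[str], gate: Callable[[str], bool]) -> dict[str, Workflow]:
    def make(source: str) -> Workflow:
        spec = SummarySpec(source, ("signals", "noise", "actions"), 300, "plain")
        workflow = build_workflow(spec)

        def gated(items: list[str]) -> str:
            output = workflow(items)
            if not gate(output):
                raise ValueError(f"{source}: output failed the shared gate")
            return output

        return gated

    return {source: make(source) for source in sources}


harness = build_harness(
    ["customer feedback", "sales calls", "product telemetry", "incident reviews"],
    gate=lambda text: bool(text.strip()),
)
print(a0_summarise(["ticket 1", "ticket 2"]))
print(harness["sales calls"](["call 1"]))
```

At A0 the only durable thing is the person's habit. At A1 it is the spec. At A2 it is the harness, which is why a change to the gate reaches every workflow at once.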

This is what altitude means, and it's why fluency on its own doesn't capture it. Fluency is what someone can do with AI. Altitude is the level of abstraction at which they can specify and redesign work productively. A fluent A0 can prompt brilliantly. They still can't move the team off a thousand individual chats. An A2 might be a slower prompter and still produce ten times the leverage, because they're operating on a different artefact.

I didn't find a framework that made this a first-class operational axis for my use case. There are adjacent precedents — abstraction work in linguistics, recent cognition research — but no production maturity model I reviewed maps cognitive abstraction against AI adoption as a separate dimension. That doesn't make it novel. It makes it a synthesis I needed for the design problem in front of me.

The reason it matters: most adoption ceilings I see in the populations I was designing for aren't tooling ceilings. They aren't model-quality ceilings. They're altitude ceilings. People stay at A0 because that's where their thinking is comfortable. The org spends a fortune on tools and the population just keeps using them as faster typewriters. Without altitude as a separate axis, you can't see this.

Pilot purgatory is the actual fight

Every reputable framework I reviewed names the same hardest transition: roughly stage 2 to stage 3. Experimenting to Operationalising. From having pilots to having something that runs reliably enough to count.

BCG puts 5% of firms at the highest maturity level worldwide, with 60% reaping hardly any material value at all. The bottleneck isn't at the top — it's in the middle. Pilot purgatory is real, and it's where most of the value gets lost.

A useful maturity model has to aim at this transition specifically. The climb from zero to one is mostly enthusiasm. The climb from four to five is mostly already-elite teams getting better at things they're already doing. The middle climb is the one that decides whether the org gets out of the pilot graveyard, and most maturity models give you the same generic guidance for that climb that they give for every other one.

In the four-dimension model, this is where the dimensions stop being independent. Stage 2 to 3 is where Fluency without Excellence stops mattering, and where Excellence without Adaptive Operating Model stops mattering. The interference becomes the whole story. If the model can't surface that, it can't help you through pilot purgatory — and pilot purgatory is the actual job.

Pilot purgatory diagram showing the Experimenting to Operationalising transition as the fight zone where maturity dimensions braid together.

2026 → 2027 → 2028, as sequencing

The next call was time-shaped. I deliberately picked a sequenced roadmap rather than trying to ship the "right" model in one go.

2026 — tailored. Specific to the constraints of the group I was building for. Specific enough to act on this year.
2027 — unified. Converge toward a higher-level model as staff and BUs become ready for it. Less bespoke, more comparable.
2028 — benchmarkable. Externally comparable. Outcome-based measurement. The point at which an external auditor or peer group could meaningfully cross-reference the assessment.

This is a methodological call. Aiming at a higher-level unified model in 2026 would mean aiming at a moving target: agentic AI maturity didn't really exist as a category eighteen months ago, and now everyone has one. Aiming at where the field is converging, before staff are ready to operate even where the field is today, gives you a model that's both unactionable now and probably wrong in two years.

So the bet is the opposite. Aim at what's known and achievable for this population, this year. Plan to evolve. Be honest that the 2026 artefact is sequencing, not architecture. Be honest in the model about which parts of itself are temporary scaffolding. Most maturity models pretend every part is permanent. The parts aren't.

What challenge actually changed

Worth being honest about what I had to change under pressure, because the pressure is the only reason the model is what it is.

Three things moved. An early version had AI Excellence as the parent dimension, with Adaptive Operating Model nested underneath. That collapsed under challenge — they're different accountabilities, different interventions, different evidence bases, and they had to be peer dimensions. An early version had governance distributed across the four dimensions; it ended up consolidated mostly inside AI Excellence, with a 2028 trigger to consider splitting agentic governance out separately if the field forces it. And the biggest tension was individual-first versus org-first — executive vocabulary is org-shaped, budget conversations are org-shaped, and a strong case argued for leading with an org maturity number.

I disagreed, and that's still the bet: movement matters more than alignment, and movement starts with people. An org-first model gets you a number on a slide. An individual-first model gets a population doing the work that makes the number move. The slide is downstream of the population. I might be wrong about it. But it's a deliberate call rather than a default.

Outcome-based, not vanity scoring

The 2028 horizon is outcome-based measurement. Not "how many people completed AI training." Not "what percentage of teams have a use case in production." Those scale with effort, not value, and they reward the same theatre that maturity models in general reward.

Outcome-based measurement asks: did the AI work change the thing the business cares about? Did cycle time fall? Did defect rate change? Did revenue per employee move? MIT CISR's data showing Stage 4 orgs at +17.1pp revenue growth across n=721 is what credible outcome measurement looks like once the baselines exist.

The reason it sits at the 2028 horizon is that you need a couple of years of clean baselines before you can credibly attribute outcomes to AI work specifically. Asking for outcome-based measurement in 2026 is asking for fabricated numbers. So the model accepts vanity-shaped measurement for the early stages on the explicit promise that it'll be replaced.
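One way to make that rule concrete is a measurement function that refuses to report an outcome until the baseline is long enough. This is a sketch under assumptions: the eight-period threshold (for example, eight quarters) and the function name are mine, and a relative change against a baseline is a comparison, not proof that AI work caused the change.

```python
from statistics import mean
from typing import Optional

# Hypothetical threshold: eight periods, e.g. eight quarters of clean data.
MIN_BASELINE_PERIODS = 8


def outcome_change(
    baseline: list[float],
    after: list[float],
    min_baseline: int = MIN_BASELINE_PERIODS,
) -> Optional[float]:
    """Relative change in a business metric versus its baseline.

    Returns None when the baseline is too short, empty of signal, or there is
    nothing to compare, so callers cannot report a number that is not yet credible.
    """
    if len(baseline) < min_baseline or not after:
        return None
    base = mean(baseline)
    if base == 0:
        return None
    return (mean(after) - base) / base


# Cycle time in days: eight baseline quarters, then two quarters after the change.
print(outcome_change([10.0] * 8, [8.0, 8.0]))  # a fall of about 20%
print(outcome_change([10.0] * 3, [8.0, 8.0]))  # None: baseline too short to report
```

Returning None rather than a number is the design choice that matters here: the early-stage model can still collect the data, but it cannot be made to print an outcome figure before the baseline exists.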

The three tests for a productive maturity framework

If I had to compress the methodological lessons into something portable, it'd be these three. A framework is productive if it does all three. Most fail at least one.

Movement. Does it generate motion? When someone reads it, do they have a clearer sense of what to do tomorrow, this month, this quarter? A framework that just describes states and doesn't push you toward the next one fails this test.
Guidance. Does it tell people what to do next? With enough detail to act? "You should improve governance" isn't guidance — it's a topic. "You should move from process-keyed gates to consequence-keyed gates" is guidance. Most stop at the topic.
Measurement. Does it show whether the motion is real? This is the hardest one and the one most fail — because real measurement is uncomfortable. Real measurement says "you didn't actually move," and a framework whose business model is being adopted broadly has every incentive to never tell anyone that.

A framework that does one of these is a slide. A framework that does two of these is useful. A framework that does all three is rare, and it's rare for the same reason rigour is generally rare — it's costly to build and uncomfortable to use.

What to take from this

If you're building a maturity model, or stress-testing one your team is about to adopt, I'd run it through the audit at the top of this post and the three tests at the bottom. None of these are clever. They're just the questions I wish I'd asked earlier.

The bridge — between org and individual, between current and target, between vanity and outcome — is the actual artefact. Building it once you've seen the gap is genuinely interesting work. Adopting someone else's bridge before you understand the gap is the thing I'd be most careful about.

Written outside of work — opinions are mine, not my employer's. This draws on a few months of building, comparing, and stress-testing an AI maturity model for a small group of small businesses, with adversarial review pushing back at every step. The model itself sits inside our internal Hub; the methodological lessons are the part I think travel.

AI · Maturity Models · Frameworks · Transformation