A review-process maturity model places your process on a ladder of maturity rather than giving it a pass/fail grade, and naming your level tells you the single thing to fix next. The rungs run from ad-hoc (reviews happen when someone remembers, and the criteria live in one lead's head) through consistent (same cadence, one shared framework) and calibrated (leads compared, ratings defensible across reviewers) to systemic (the process reliably feeds staffing, promotion and the partner track). You don't leap to the top. You find the level you're actually at and take the next step, because each level fixes the failure of the one below: skip a rung and whatever you build on it collapses. For a project-based firm, "mature" has nothing to do with polished forms. It means a process whose output the firm can't run without.
This framing matters because effort alone doesn't buy you a good process. Higher-maturity practices are the ones that correlate with results: organisations that review goals quarterly or more often are 3.5× more likely to land in the top quartile of business outcomes, and those with clear, well-set goals 4× more likely (Bersin by Deloitte, 2014). Meanwhile the rating on its own barely moves anyone — after a century of research there's little evidence that appraisal by itself improves performance (DeNisi & Murphy, Journal of Applied Psychology, 2017). The difference between a review that changes something and one that just files the past is a difference of level, not of form design.
What is a review-process maturity model — and why think in levels?
It's a way of asking a better question. Not "is our review process good?" — which has no actionable answer — but "what level are we at, and what's the next move?" Level-based process maturity comes from software: the Capability Maturity Model ranked a process on staged levels from ad-hoc/initial up to optimizing, each level institutionalising what the one below left to chance (Humphrey, Managing the Software Process, 1989; formalised by the SEI at Carnegie Mellon, 1991). The idea then jumped to people practices in the People Capability Maturity Model, which applied the same staged logic — five levels, advanced one at a time — to how organisations manage and develop their workforce (Curtis, Hefley & Miller, SEI / Carnegie Mellon, 2009).
Thinking in levels does two useful things. It replaces a verdict with a direction: a level isn't a grade you feel bad about, it's a diagnosis that points at the next rung. And it stops you buying the wrong fix — the most common mistake in a small firm is installing level-4 tooling on an ad-hoc base, spending on a platform when the actual problem is that nobody has agreed what "good" looks like yet.
What are the levels, from ad-hoc to systemic?
Four rungs, each defined by what it fixes. You climb them in order.
- Ad-hoc (level 1). Reviews happen when someone remembers, for example before a raise conversation or when a client complains. The criteria live in a lead's head, there's no shared framework, and the output goes nowhere in particular. This is the founder-as-bottleneck level: quality depends entirely on which lead you happened to get.
- Consistent (level 2). Everyone is reviewed on the same cadence, against one shared framework, with a named owner who runs the cycle. This is the first rung where the process exists independently of any one person. But each lead still rates in their own way. The form is shared; the standard behind it isn't yet.
- Calibrated (level 3). Before ratings are final, leads are compared against each other and against the framework, so outlier raters are caught and "meets the bar" means the same thing whoever wrote it. This is the rung that makes a rating defensible. As the next section shows, it isn't optional dressing but a real correction to a real distortion.
- Systemic (level 4). The output reliably feeds the decisions the firm actually makes: who gets staffed on what, who's promoted, who's on the partner track, what each person develops next. The review stops being a document that gets filed and becomes an input the firm can't run without. This is where performance actually moves, because improvement comes from what happens after the rating, not from the rating itself (Smither, London & Reilly, Personnel Psychology, 2005).
How do you tell which level you're actually at?
Look at six observable markers, not at how the forms look:

- Cadence. Does everyone get reviewed on a predictable rhythm, or only when someone remembers?
- Ownership. Is there a named person who runs the cycle, or does it depend on individual leads?
- Framework. Is there one shared standard for what "good" means at each seniority level, or does each reviewer carry their own?
- Calibration. Are leads compared before ratings are final?
- Follow-through. Does the output change a staffing, development or promotion decision, or does it just get filed?
- Data. Can you see completion and rating patterns across the firm?

Your level is the highest rung whose markers all hold. The first missing marker names the rung you climb next. Maturity is a chain, only as strong as its weakest link, so a marker you've nailed higher up doesn't make up for one missing below it.
The calibration marker is the one firms most often skip, and it's the one with the hardest evidence behind it. When researchers decomposed what actually drives a performance rating, the single largest source of variance wasn't the person's performance — it was the idiosyncrasy of the rater, at around 62% of the variance, against only about 21% for the ratee's actual performance (Scullen, Mount & Goff, Journal of Applied Psychology, 2000). In plain terms: before you calibrate, a rating tells you more about who did the reviewing than about the work. That's why "calibrated" is a genuine, non-skippable level and not a nice-to-have — a firm that rates on time but never compares leads is running level-2 machinery and calling it fair.
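The comparison step behind calibration can be sketched in a few lines. This is a minimal illustration, not a method from the studies cited above. It assumes:

- a 1–5 rating scale;
- a hypothetical `flag_outlier_raters` helper;
- an arbitrary 0.75-point threshold.

The helper compares each lead's average rating with the average of every other lead's ratings. Treat a flag as a prompt for the calibration conversation, not a verdict: a lead whose team genuinely performed better will also show a gap.

```python
from statistics import mean
from typing import Dict, List


def flag_outlier_raters(
    ratings_by_lead: Dict[str, List[float]], max_gap: float = 0.75
) -> Dict[str, float]:
    """Hypothetical helper: flag leads whose average rating differs from
    the other leads' pooled average by more than max_gap.

    The 0.75-point default assumes a 1-5 rating scale. Returns {lead: gap};
    a positive gap means the lead rates more generously than everyone else,
    a negative gap means more harshly.
    """
    flagged: Dict[str, float] = {}
    for lead, scores in ratings_by_lead.items():
        # Pool every rating written by the *other* leads (leave-one-out), so a
        # lenient lead doesn't drag the comparison baseline towards themselves.
        others = [s for other, ss in ratings_by_lead.items() if other != lead for s in ss]
        if not scores or not others:
            continue  # nothing to compare against
        gap = mean(scores) - mean(others)
        if abs(gap) > max_gap:
            flagged[lead] = gap
    return flagged


# Illustrative, made-up ratings for three leads on a 1-5 scale.
example = {
    "Ana": [3.0, 3.5, 3.0, 4.0],
    "Ben": [3.0, 4.0, 3.5, 3.0],
    "Cleo": [4.5, 5.0, 4.5, 5.0],
}
print(flag_outlier_raters(example))  # {'Cleo': 1.375}
```

The leave-one-out baseline is a deliberate choice. In a firm with only a handful of leads, folding a lead's own ratings into the average would shrink their apparent gap and hide exactly the rater effect calibration is meant to catch.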
Why does the level matter more for a billable, multi-lead firm?
Because in a firm made of people, the review's output is load-bearing. A consultant or creative is staffed across several engagements under different leads, rated per engagement by whoever ran it, and the result feeds calibration across reviewers, promotion decisions and the partner track. An agency where reviews happen on time but never roll up into those calls is stuck at a low level however polished the form looks — the artifact exists, but the firm still makes its real people decisions in the corridor.
That's also why the level, not the tooling, is the thing to fix. In a big company an immature review process is a nuisance; in a boutique it lands directly on the senior talent you can least afford to lose, because a rating a person can't act on — and can't see the logic of — reads as a rating that wasn't fair. The maturity ladder gives a founder or Head of People a way to locate the firm honestly and pick the one fix that moves it up, instead of buying a platform a level too early or blaming individual leads for a gap that is really a missing shared standard.
How do you move up one level — without over-engineering?
Fix the single bottleneck to the next rung — nothing more. If you're ad-hoc, the next move isn't a platform; it's agreeing one shared framework and a cadence, so the process stops living in one head. If you're consistent, the next move isn't more forms; it's a calibration step where leads compare ratings before they're final. If you're calibrated, the next move is connecting the output to a real decision — staffing, development, promotion — because a review that changes nothing changes no one.
Two rules keep the climb honest. First, don't skip a rung. You can't calibrate leads who are using different bars, because calibration needs a shared framework first. And you can't feed the partner track with ratings that aren't yet defensible. Second, don't over-build: level-4 tooling on an ad-hoc base is money spent on a symptom. The reason to climb at all is that maturity correlates with outcomes. Organisations that review goals quarterly or more often are 3.5× more likely to reach top-quartile results (Bersin by Deloitte, 2014). That payoff comes from each rung connecting the process to real work, not from the sophistication of the instrument. The systemic level pays back precisely because improvement lives in the follow-up (Smither, London & Reilly, 2005), and a rating that connects to nothing has no follow-up to give.
What traps fake maturity?
The dangerous failures are the ones that look mature. Watch for these.
- Process theatre. Mature-looking forms, templates and a slick cycle sitting on top of an ad-hoc reality — the ratings are still one lead's opinion, dressed up. Polish is not a level.
- Tool-first maturity. Buying a performance platform and mistaking the software for the standard. A tool can carry a mature process; it cannot create one. If there's no shared framework, the platform just industrialises the inconsistency.
- Reviews that never touch a decision. The cycle completes, the ratings are filed, and nothing downstream changes — staffing, promotion and development are decided elsewhere. This is the most common ceiling: consistent on the surface, never systemic, and the rating alone won't rescue it (DeNisi & Murphy, 2017).
- One-off calibration. Calibrating once, for the promotion round, then never again. Calibration is a standing marker of the level, not a one-off event. It has to happen every cycle, and if you drop it you slide back to level 2 without noticing.
- Skipping a rung. Reaching for "feeds the partner track" while ratings still aren't calibrated — building the roof before the walls. The decision inherits the distortion the missing rung was supposed to catch.
A review-process maturity self-check
Answer yes or no to each marker, but don't add up the yeses. Maturity is a chain, not a total: your level is the highest rung whose markers all hold, and the first "no" on the way up names your next move.
- Cadence — everyone is reviewed on a predictable rhythm, not when someone remembers.
- Ownership — a named person runs the cycle; it doesn't depend on individual leads.
- Framework — one shared standard defines "good" at each seniority level, used by every reviewer.
- Calibration — leads compare ratings against each other before they're final.
- Follow-through — the output changes a real staffing, development or promotion decision.
- Data — you can see completion and rating patterns across the firm.
- No theatre — the polish matches the reality; the forms aren't ahead of the standard.
- Connected — the review is an input the firm's decisions actually depend on.
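The chain rule above is easy to get wrong if you tally answers, so here is a minimal Python sketch of it. It uses only the six rung markers. The last two items, No theatre and Connected, are cross-checks on those markers rather than rungs of their own, so the sketch leaves them out.

Cadence, ownership and framework follow the definition of "consistent". Calibration and follow-through follow "calibrated" and "systemic". Two things are assumptions for illustration:

- placing data under "calibrated", on the reasoning that comparing leads needs firm-wide rating patterns;
- the `maturity_level` function name, which is hypothetical.

```python
from typing import Dict, List, Tuple

# Rungs above ad-hoc, in climbing order, with the markers each one needs.
# Placing "data" under "calibrated" is an assumption for this sketch.
RUNGS: List[Tuple[str, List[str]]] = [
    ("consistent", ["cadence", "ownership", "framework"]),
    ("calibrated", ["calibration", "data"]),
    ("systemic", ["follow_through"]),
]


def maturity_level(answers: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (current level, markers still missing
    for the next rung).

    Maturity is a chain, not a total: climb rung by rung and stop at the
    first rung with any missing marker. A "yes" higher up never makes up
    for a "no" lower down. Unanswered markers count as "no".
    """
    level = "ad-hoc"
    for rung, markers in RUNGS:
        missing = [m for m in markers if not answers.get(m, False)]
        if missing:
            return level, missing
        level = rung
    return level, []


# Five "yes" answers out of six, but the missing calibration step caps the level.
answers = {
    "cadence": True,
    "ownership": True,
    "framework": True,
    "calibration": False,
    "data": True,
    "follow_through": True,
}
print(maturity_level(answers))  # ('consistent', ['calibration'])
```

The function returns the missing markers alongside the level so that the result names the single next fix, not just a position on the ladder.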
Not sure which rung you're on — or which fix comes next? Book a call and we'll place your process on the ladder and name the single next move, together.
FAQ
What is a review-process maturity model?
It's a ladder that places your review process on a level of maturity — ad-hoc, consistent, calibrated, or systemic — instead of scoring it good or bad. The point is diagnostic: naming your level tells you the one thing to fix next. The idea is borrowed from process-maturity models in software and workforce practice (Humphrey, 1989; Curtis, Hefley & Miller, SEI / Carnegie Mellon, 2009), applied to how a firm reviews its people.
What are the levels of review-process maturity?
Four, climbed in order: ad-hoc (reviews when someone remembers, criteria in one head), consistent (shared cadence and framework, a named owner), calibrated (leads compared, ratings defensible across reviewers), and systemic (the output reliably feeds staffing, promotion and the partner track). Each level fixes the failure of the one below, so you can't skip a rung.
How do I know which level my firm is at?
Check six observable markers (cadence, ownership, shared framework, calibration, follow-through and data), not how the forms look. Your level is the highest rung whose markers all hold. The first missing marker names the next rung to climb, because maturity is only as strong as its weakest link.
Why is calibration treated as its own level?
Because without it a rating is mostly the rater. When researchers decomposed performance-rating variance, idiosyncratic rater effects accounted for about 62% of it, versus roughly 21% for actual performance (Scullen, Mount & Goff, 2000). Comparing leads before ratings are final is what makes a rating mean the same thing whoever wrote it — a real correction, not dressing.
Isn't more mature just more bureaucratic?
No — maturity here means the output connects to real decisions, not more paperwork. Over-building is itself a trap: level-4 tooling on an ad-hoc base is money spent on a symptom. Higher-maturity practices correlate with better outcomes (Bersin by Deloitte, 2014), but the payoff comes from the process connecting to real work, not from the sophistication of the tool.

