The Cost of Looking Decisive

WORKING PAPER · VERSION 1.0 · JULY 2026 · A live inquiry, released in versions as evidence accrues. Comments and case evidence welcome: info@impactthinking.co.uk

Abstract

Institutions say they want good judgment. What they reward, observably, is the performance of certainty — the confident brief, the unhedged forecast, the leader who “projects decisiveness” while the question is still genuinely open. The preference is not cosmetic: a robust experimental literature shows that people select and retain confident advisers over accurate ones, and that advisers, knowing this, compete on confidence — a market that systematically punishes calibrated uncertainty. Meanwhile the forecasting literature shows the correlation running the other way: the most accurate judges of complex questions are precisely the hedgers, the updaters, the takers of many small positions — and the most confident public experts are among the least accurate. This working paper examines the institutional machinery that sustains the mismatch, its cost in premature closure and unexamined risk, and the question of what an institution would have to change — in its meetings, its promotion signals, and its leaders’ own capacity to stand in not-knowing — to stop paying for confidence and start paying for judgment.

The market for confidence

Begin with the demand side. In controlled studies of advice-taking, people systematically prefer advisers who express high confidence — the confidence heuristic — using certainty as a proxy for competence even when accuracy information is available.¹ The supply side responds as any market would: in competitive advice settings, advisers inflate their expressed confidence beyond their private beliefs, because calibrated hedging loses the client.² Radzevick and Moore’s finding deserves its bluntness: competition makes advisers more overconfident, not more accurate, because confidence is what is being bought.

Now the accuracy side. Tetlock’s two-decade study of expert political judgment found the experts most in demand — the most famous, most quoted, most confident — performing worst against reality, while the best performers were the cognitively humble: foxes rather than hedgehogs, takers of provisional positions, relentless updaters.³ The superforecasting research sharpened the point: the distinguishing marks of the most accurate judges under genuine uncertainty are precisely the behaviours institutional theatre punishes — granular probabilities, visible revision, comfort saying “I don’t know yet.”⁴

Put the three findings together and the conclusion is uncomfortable: the confidence an institution rewards and the judgment it needs are not merely different quantities. Under uncertainty, they are inversely correlated — and the institution’s selection machinery is pointed at the wrong end.

The machinery of premature closure

Inside organisations the market for confidence operates through mechanisms mundane enough to escape notice. The meeting rewards the first structured answer: a genuinely open question, voiced in a senior forum, often survives only a few minutes before someone relieves the room of it, and the reliever is experienced as having contributed, the one who held it open as having stalled. Career signalling compounds it: “decisive” appears on every promotion rubric; “knows when not to decide yet” appears on none. The asymmetry of accountability completes the machine: the confident wrong call, taken with the room’s approval, is a shared misfortune; the hedged right call is barely remembered; and the leader who held a question open under pressure and was later vindicated receives no credit at all, because the counterfactual disaster never happened and unhappened disasters have no constituency.

Janis documented the collective endpoint half a century ago: groups under pressure for consensus converge on premature certainty and suppress the doubt that would have saved them.⁵ The inquiry literature keeps re-finding it — the confident assessment that hardened too early, the dissenting analysis that was structurally unhearable (Working Paper: The Cost of Silence treats the voice side of this machinery). What the confidence market adds to groupthink is a career dimension: it is not only the group dynamic that closes questions early; it is that every individual’s incentive, read correctly, is to be the closer.

Standing as the scarce capacity

The alternative to performed certainty is not indecision — that is the false binary the confidence market trades on. It is the capacity this desk’s programme research calls standing: remaining steady in genuine not-knowing, holding the question open on purpose while the picture forms, and then authoring a real position — a declaration one stands behind — rather than laundering the choice through false inevitability. Standing is more demanding than confidence, not less: it requires tolerating the exposure of visible uncertainty in cultures that read it as weakness, and it requires knowing the difference between the decisions that need analysis, the decisions that need speed, and the decisions that need a stand — a triage the confidence performance never has to make, because it treats everything as already answerable.

The institutional question, then, is not how to make leaders humbler — humility campaigns meet the same fate as openness campaigns — but how to re-price the behaviours: what would have to change for calibrated uncertainty to be career-safe and premature certainty career-expensive?

What re-pricing would look like

Early practice suggests the levers are, as elsewhere in this research programme, conversational and evidential rather than exhortative. Decision forums can separate question-holding from answer-giving as explicit phases, so that holding open is a sanctioned activity with a name rather than a failure to contribute. Position papers can require probability language and named uncertainties, making calibration visible and hedging legible as rigour rather than weakness. Decision post-audits can score process under uncertainty (was dissent heard, were probabilities stated, was the closure timed or premature) rather than outcomes alone, which under uncertainty reward luck. And, hardest, there is senior modelling: one leader visibly saying “I don’t know yet, and here is when and how we will know”, and being seen to prosper, re-prices the behaviour faster than any framework, for the same reason one punished speaker re-prices silence.
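Making calibration visible has a simple mechanical core: compare what was said with what happened, grouped by the probability stated. A well-calibrated forum is one where the events called at 70% happen roughly 70% of the time. The sketch below is illustrative only, under stated assumptions: the record of calls is invented, and `calibration_table` is a hypothetical helper, not part of any tool used in this programme.

```python
from collections import defaultdict

# Hypothetical record of past calls: (stated probability, did it happen?).
# Invented for illustration; not drawn from any study or forum.
forecasts = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.7, True), (0.7, False), (0.7, True),
    (0.3, False), (0.3, True), (0.3, False),
    (0.1, False), (0.1, False),
]


def calibration_table(records):
    """Group calls by the exact probability stated and return, per level,
    (number of calls, observed frequency of the event), lowest level first."""
    buckets = defaultdict(list)
    for stated, happened in records:
        buckets[stated].append(happened)
    return {
        stated: (len(outcomes), sum(outcomes) / len(outcomes))
        for stated, outcomes in sorted(buckets.items())
    }


for stated, (n, freq) in calibration_table(forecasts).items():
    print(f"said {stated:.0%}: {n} calls, happened {freq:.0%}")
```

In this toy record the calls made at 90% came true only 75% of the time, the signature of overconfidence. With real decision papers one would bin stated probabilities into ranges rather than match exact values, and would need far more calls before reading much into any single row.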

The tournament evidence

The strongest field evidence on what accurate judgment under uncertainty actually looks like comes from the IARPA forecasting tournaments run for the US intelligence community from 2011: a rare setting where confident and calibrated judgment competed head-to-head on hundreds of real geopolitical questions with verifiable outcomes. The Good Judgment Project’s teams won the tournament decisively. Its best forecasters, selected and trained for exactly the fox-like behaviours institutions under-reward, outperformed rival teams by wide margins and, by widely reported accounts, even beat intelligence analysts working with access to classified information.⁴ The winning behaviours were unglamorous and consistent: granular probabilities rather than verbal certainty, frequent small updates rather than defended positions, active soliciting of disconfirming argument, and teams whose norms made changing one’s mind cheap.

Read as an institutional finding rather than a psychological one, the tournament result is pointed: the behaviours that won are all observable and trainable, and every one of them is punished, mildly and continuously, by the ordinary confidence market of a senior forum — where the update reads as inconsistency, the probability as hedging, and the solicited counter-argument as weakness. The scarce input is not cognitive talent. It is an institutional setting in which calibrated behaviour is career-safe.

Implications by seat

For executive teams, four near-term moves. One: split significant decision forums into named phases — framing and question-holding, then evidence and dissent, then closure — so that holding-open is a sanctioned activity rather than a failure to contribute. Two: require probability language and named uncertainties in decision papers, making calibration legible as rigour. Three: institute decision post-audits scored on process under uncertainty — dissent heard, probabilities stated, closure timed — not outcomes alone, which under uncertainty reward luck and punish honesty. Four: appoint a rotating challenge role with real standing (the pre-mortem made permanent⁷), so the counter-case arrives by design rather than by career risk.

For boards, the instrument is the closure question: for each major recommendation, “what would have to be true for this to be premature — and who in the organisation currently believes it is?” A management team that cannot produce a live dissenting view has not resolved the uncertainty; it has suppressed it, and the board is the last checkpoint at which that distinction is cheap.

For governments and advice functions, the tournament evidence carries a structural lesson: forecasting accuracy improved when it was scored — when advice carried probabilities that were later marked against reality. Advice functions that never score their calls are running the confidence market at national scale; the remedies (calibration training, scored forecasts, institutional red teams with protected status) are proven, cheap relative to a single premature-closure failure, and adopted almost nowhere.
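Scoring advice against reality needs a rule that charges for confident misses rather than rewarding boldness. The tournament scoring described in Superforecasting is the Brier score. The sketch below uses its common one-sided form (0 is perfect, 1 is maximally wrong); the book describes a two-sided version on a 0-to-2 scale, which for yes/no questions is exactly double this one. The ten questions and both advisers are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """One-sided Brier score: mean squared gap between each stated probability
    and what happened (1 if the event occurred, 0 if not). Lower is better."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical illustration: ten questions, seven of which resolved "yes".
outcomes = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# A "decisive" adviser who always states near-certainty in the majority view.
confident = [0.99] * 10
# A calibrated adviser who says 70% on the same questions.
calibrated = [0.7] * 10
# A baseline that says 50% on everything.
coin_flip = [0.5] * 10

for name, calls in [("confident", confident), ("calibrated", calibrated),
                    ("always 50%", coin_flip)]:
    print(f"{name}: {brier_score(calls, outcomes):.3f}")
```

Here the adviser who always states 99% scores about 0.294, worse than the 0.25 earned by saying 50% on everything, while the adviser who says 70% scores 0.21. Neither adviser distinguishes between questions, so the toy isolates one effect only: the rule charges heavily for confident misses, which is the price the confidence market never collects.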

What we are tracking, and open questions

We are tracking, in participating senior forums: the survival time of open questions before closure; the ratio of probability-language to certainty-language in decision papers; post-audit verdicts on closure timing; and career outcomes of leaders coded (by structured observation) as standers versus closers — the last being the slow, decisive variable. Open questions we hold honestly: whether standing can be developed at mid-career or is largely formed earlier (the programme evidence says developed, the selection evidence is mixed); whether re-priced forums revert under crisis, when the demand for performed certainty spikes; how to distinguish, observationally, principled question-holding from ordinary indecision wearing its costume; and whether AI counsel — the always-available confident answer examined in The Atrophy of Judgment — is about to make the confidence market catastrophically more liquid. Evidence and counter-cases are invited; the paper will be revised against them.

References & sources

Price, P. C. & Stone, E. R. (2004). “Intuitive Evaluation of Likelihood Judgment Producers: Evidence for a Confidence Heuristic.” Journal of Behavioral Decision Making, 17(1).
Radzevick, J. R. & Moore, D. A. (2011). “Competing to Be Certain (But Wrong): Market Dynamics and Excessive Confidence in Judgment.” Management Science, 57(1).
Tetlock, P. E. (2005). Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press.
Tetlock, P. E. & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.
Janis, I. L. (1972). Victims of Groupthink. Houghton Mifflin.
Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
Erhard, W., Jensen, M. C. & Zaffron, S. Being a Leader and the Effective Exercise of Leadership. SSRN Working Paper.
Klein, G. (2007). “Performing a Project Premortem.” Harvard Business Review.

IMPACT THINKING RESEARCH · BY BEN BOTES · WORKING PAPER 04 · v1.0 · JULY 2026

