One of the pleasures of collaborating with colleagues in
other disciplines is that you learn to see your own discipline’s maintained assumptions from the perspective of
outsiders. To get accurate forecasts, economists preferred
market mechanisms, whereas psychologists preferred self-report measures of beliefs. These preferences led us down
very different research paths. Psychologists wanted to rely on prediction polls with probability judgments, whereas economists wanted to rely on market mechanisms. They saw less need to
design supplementary mechanisms to identify top performers and sort them into teams, or to check groupthink inside
teams, or to train average performers to adopt the lessons
learned by the best performers, or to construct weighted-averaging algorithms for blending the most recent forecasts
of the best forecasters and then extremizing in proportion to
the diversity of the forecasters aggregated. Market pricing
could, in theory, handle all of that, though, in practice, it fell
somewhat short. By contrast, psychologists were less enthralled by markets. They saw it as a mistake to use black-box mechanisms when we have ready access to the complex
patterns of interaction among thoughtful forecasters working in a data-rich environment.
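To make the aggregation idea concrete, the sketch below shows, in Python, the general shape of such an algorithm: take each forecaster's most recent probability, weight it by recency, average, and then extremize the average in proportion to the diversity of the underlying forecasts. The function name, the half-life weighting, and the extremizing exponent are illustrative assumptions chosen for exposition, not the algorithm actually fielded in the tournaments.

```python
import numpy as np

def aggregate_forecasts(probs, ages_in_days, half_life=3.0, extremizing_gain=1.5):
    """Blend individual probability forecasts for one binary question.

    probs            : each forecaster's most recent probability (0-1)
    ages_in_days     : how old each forecast is; newer forecasts get more weight
    half_life        : recency-decay parameter (illustrative value)
    extremizing_gain : scales how strongly diversity pushes the mean from 0.5
    """
    probs = np.asarray(probs, dtype=float)
    weights = 0.5 ** (np.asarray(ages_in_days, dtype=float) / half_life)
    weights /= weights.sum()

    p_bar = np.sum(weights * probs)      # recency-weighted mean forecast

    # Extremize in proportion to forecaster diversity: the more forecasters
    # disagree, the more independent evidence they plausibly hold, so the
    # aggregate is pushed further from 0.5 (one common heuristic).
    diversity = probs.std()
    exponent = 1.0 + extremizing_gain * diversity
    odds = (p_bar / (1.0 - p_bar)) ** exponent   # transform in odds space
    return odds / (1.0 + odds)

# Example: three forecasters, the most recent weighted most heavily.
print(aggregate_forecasts([0.6, 0.7, 0.55], ages_in_days=[0, 2, 7]))
```

In practice the recency weights and the extremizing parameter would be tuned to historical accuracy data rather than set by hand, and forecasters would first be screened or weighted by their track records.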
During the four years of IARPA tournaments, members
of our research group had many opportunities to explore
each other’s blind spots. We gradually got better at improving prediction polls with various behavioral and statistical
interventions, but it proved stubbornly hard to improve
prediction markets. One possibility is that both methods are
approaching diminishing marginal predictive returns. No
one ever expected a deterministic world in which Brier
scores of zero were possible—and many observers were
surprised that Brier scores could be pushed as far down as
they were, falling as low as .12 to .14 for the best polling
algorithms and .17 to .19 for prediction markets. In this
view, we may be reaching the point of irreducible uncertainty—and IARPA should be skeptical of future investments in geopolitical forecasting.
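For readers unfamiliar with the metric, the Brier score is the mean squared difference between probability forecasts and outcomes. In its simplest binary form (the tournaments also scored multi-outcome questions, so take this as an illustration rather than the exact scoring rule used):

\[
\text{Brier score} = \frac{1}{N}\sum_{t=1}^{N}\left(f_t - o_t\right)^2,
\]

where $f_t$ is the forecast probability for question $t$ and $o_t$ equals 1 if the event occurred and 0 otherwise. A score of 0 would require perfect foresight, whereas an uninformative forecast of 50% on every binary question earns .25, which puts the .12 to .14 achieved by the best polling algorithms in perspective.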
However, we know that any announcements of “mission
accomplished” are highly premature. And even if it were
true that IARPA had reached the point of diminishing
returns on improving geopolitical forecasting, that should
not mark the end of collaboration between economists and
psychologists. Forecasting only scratches the surface of the
much larger project of promoting greater rationality in public policy. The final output of even the best-run prediction
market or poll is just a number. Policymakers want more.
What is the “story” behind that number? Is there consensus?
What are the key drivers of the event? And how can we
influence the event? The limitations of purely numerical
forecasts should remind us of how much psychologists and
economists both ignored in our tournament-treadmill quest
to maximize accuracy.
We see improving the reasoning behind forecasting as a
domain in which psychologists have a natural advantage.
When forecasters are held accountable solely for their skill
in outperforming other forecasters (which is what prediction markets do), they have no incentive to share information or to help others sharpen their thinking. Well-designed
prediction polls can incentivize both collaboration (e.g.,
within teams) and competition (across teams).
Economists, however, may have the disciplinary advantage
when we leave controlled research settings and enter real-world organizations. A flourishing field inside microeconomics, agency theory, specializes in modeling the ever-evolving
cat-and-mouse games between “principals”—those paying for
the forecasts—and “agents”—the forecasters. Employers of
forecasters in the media, universities, government, or private
sector often want “their” forecasters to achieve many goals
beyond mere accuracy, including getting attention, being entertaining, playing to the prejudices of key constituencies, and
avoiding saying anything that could later prove embarrassing.
Revisit, in this light, the Goldstein et al. (2018) study of
professional analysts participating in the intelligence community’s official prediction market. One possible explanation
for why analysts lost to superforecasters is that analysts
have learned to survive in a world that demands juggling
multiple goals, whereas superforecasters have learned to
survive in an artificial tournament world that requires maximizing one goal, accuracy. For instance, inside the intelligence community, the directionality of prediction errors—
under- or overestimating threats—carries consequences.
Right after 9/11, underestimating another threat to the U.S.
could have been devastating to one’s career and indeed the
intelligence community. Right after 2003, when it had become clear
that there were no weapons of mass destruction in Iraq, the error
of overestimating another threat could have been just as devastating.
In a blame-game world, it is understandable that many
professionals oppose the quantification of beliefs (Lanir &
Kahneman, 2006; Tetlock & Mellers, 2011). The rational
bureaucratic-political response is to retreat into vague-verbiage forecasting ("there is a distinct possibility of a
major war") that makes it virtually impossible to pin analysts down. "Distinct possibility" can take on values as low
as 10–20% or as high as 80–90% in readers' minds (Tetlock
& Gardner, 2015). If the event does occur, analysts can say
“I warned you that it was distinctly possible,” and if the
event does not occur, they can say, with equal justification, "I merely
said it was possible.”
But political safety is cognitively costly. Vague-verbiage
forecasting prevents analysts, or indeed any professionals,
from getting the feedback they need to become well-
calibrated. Our joint work with economist Richard Zeckhauser
has shown that there are real returns to precision. It
turns out that better forecasters are the ones who make more
nuanced distinctions along the probability continuum