November 12, 2015
Mass atrocities occur rarely, and they are hard to predict. So aren’t risk assessments in the form of predicted probabilities, like the ones the Early Warning Project produces, a little too precise? When we ask the participants in our opinion pool to assign a number to their beliefs about the likelihood that various events will happen, are we really adding useful information, or are we putting too fine a point on things?
Evidence from a massive four-year experiment in geopolitical forecasting suggests that we really do create additional useful information when we push our experts to assign a number to their expectations. In a recent conference paper, a group of scholars involved in the Good Judgment Project reported a few key findings from analysis of their data, which provided “an unprecedented opportunity to evaluate whether quantified numerical probability estimates actually yield information that qualitative alternatives would not.” Among those findings:
- Forecasters are more accurate when they put a number on it. Organizations that try to anticipate future risks often use words instead of numbers to convey their assessments. The U.S. intelligence community, for example, explicitly instructs its analysts to use “words of estimative probability” that group risks in wide bins—things like “very unlikely” for 5–20 percent, or “almost certain” for 95–99 percent. When the authors of this new paper compared forecasters’ best guesses with versions of those guesses rounded to match the less precise categories, they found that the sharper forecasts were significantly more accurate than the rounded ones. In other words, “Respondents demonstrated that forecasters can systematically parse probabilities more finely than current practices allow.”
- The gains in accuracy that come with quantifying forecasts do not depend on forecasters’ innate skill or numeracy, or on prior training in probabilistic reasoning. The best forecasters’ accuracy suffers the most when their assessments are coarsened to avoid giving the impression of “false precision,” but forecasters at all skill levels and cognitive styles do worse when their estimates are put into broader bins.
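The comparison described in the findings above can be sketched in a few lines of Python. This is only an illustration, not the paper's method: the bin edges are hypothetical apart from the 5–20 percent and 95–99 percent ranges quoted above, the choice of the Brier score as the accuracy measure is an assumption, and the forecasts and outcomes are made-up toy values rather than data from the study.

```python
from bisect import bisect_right

# Hypothetical bin edges. Only the 5-20 percent and 95-99 percent ranges
# are taken from the post; the remaining edges are assumed for illustration.
BIN_EDGES = [0.0, 0.05, 0.20, 0.45, 0.55, 0.80, 0.95, 1.0]


def coarsen(p, edges=BIN_EDGES):
    """Round a probability to the midpoint of the bin that contains it."""
    # Clamp the index so that p == 1.0 falls in the last bin.
    i = min(bisect_right(edges, p) - 1, len(edges) - 2)
    return (edges[i] + edges[i + 1]) / 2


def brier_score(forecasts, outcomes):
    """Mean squared error between forecasts and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Toy inputs, invented for this sketch.
sharp = [0.06, 0.19, 0.07, 0.18]
outcomes = [0, 1, 0, 0]

rounded = [coarsen(p) for p in sharp]
print("sharp:  ", brier_score(sharp, outcomes))
print("rounded:", brier_score(rounded, outcomes))
```

All four toy forecasts land in the same "very unlikely" bin, so rounding erases the distinctions the forecaster drew among them. Whether that hurts accuracy in a given case depends on the outcomes; the paper's claim concerns the average effect across many real forecasts.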
Here’s how the paper’s lead author, psychologist Barbara Mellers, summarized their conclusions in an email to us:
Concerns about "false precision" can be real, but in geopolitical forecasting, we find the use of probability phrases (i.e., words rather than numbers) or probability scales that are constrained by too few categories actually reduce the accuracy (information content) of forecasts. And even worse, it is the most accurate and thoughtful forecasters whose beliefs are being discarded! We can't afford to be so cavalier with our predictions about the future.
Co-author Jeffrey Friedman, an assistant professor of government at Dartmouth College, noted that the gains in accuracy were especially large on very rare events, such as mass atrocities:
We found that forecasters could consistently parse probabilities in almost any context we examined. But, generally speaking, forecasters achieved their highest returns to precision when describing very small probabilities. Decision theorists have always argued that the difference between a 5 percent chance and a 2 percent chance can have major implications for policy. What our data show—perhaps surprisingly—is that forecasters can actually draw these distinctions reliably, even when analyzing a subject as complex and subjective as international affairs.
We won’t pretend that the assessments the Early Warning Project generates are precise representations of the true probabilities of the events we track. For better or for worse, those are fundamentally unknowable. What we can say, however—and now with more confidence—is that our attempts to quantify these risks should produce more accurate assessments, and thus more useful information, than the vague warnings and fuzzier assessments that this and many other fields have traditionally employed. It turns out that we can and should try to move from the vague (“Is there any risk?”) to the specific (“How much risk is there?”), and we do that best when we quantify our best guesses.