First, some fun

Are you smarter than a rat?

T maze

On Forecasting

Motivation

  • Conspicuous failures of existing methods
  • Success of forecasting models in other behavioral domains
  • Increased processing power

Predicting vs forecasting

  • Sound theory, but do not know whether the antecedent conditions have been satisfied.
  • Even with info + theory, randomness can play a role
  • Prediction is possible without explanation

A problem is that these excuses are often used to justify poor forecasts

  • Explanation is possible without prediction:
    • Pacifists do not abandon Gandhi's worldview just because he said in 1940 that Hitler was not as bad as "frequently depicted" and that "he seems to be gaining his victories without much bloodshed"
    • Martin Feldstein predicted that the legacy of the Clinton 1993 budget would lead to stagnation for a decade.
  • Prediction is possible without explanation: forecasters can rack up successes without a sound theory of why they occur

Judging judgement

What is a good judge?

Two criteria:

  • Getting it right
  • Thinking the right way

Getting it right

How do we measure it?

  • Accuracy
  • True positives at the cost of false alarms?
    • Risks of overpredicting vs underpredicting: should false alarms and hits be weighed equally?
    • E.g., what was riskier in the 1980s:
      • underestimating the Soviet Union, tempting it to test the US's resolve?
      • overestimating it and paying high military costs?
    • I.e., the risk here is to treat as "wrong" those forecasters who have made value-driven decisions to exaggerate certain possibilities.
  • How early?

Thinking the right way

  • Do not violate basic probability theory, e.g., probabilities of mutually exclusive, exhaustive outcomes should sum to 1
  • Adjust your probability estimates in the face of evidence (see the sketch below)
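
A minimal sketch of what "adjusting in the face of evidence" looks like formally, i.e., Bayesian updating; the prior, likelihoods, and scenario are invented for illustration.

```python
# Bayesian updating: revise a forecast probability after seeing new evidence.
# All numbers below are invented for illustration.

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(event | evidence) given a prior P(event) and the likelihood
    of the observed evidence under each hypothesis."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

# A forecaster puts the chance of escalation at 20% (prior). New evidence,
# say troop movements, is three times as likely if escalation is coming
# (0.6) as if it is not (0.2).
posterior = bayes_update(prior=0.20, p_evidence_if_true=0.6, p_evidence_if_false=0.2)
print(round(posterior, 3))  # 0.429 -- the estimate rises, but not to certainty
```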

Political Forecasting: is it blind luck?

Ontological Skeptics

Indeterminacy is due to the properties of the external world: a world that would be just as unpredictable even if we were smarter.

  • Path dependency, aka increasing returns
    • QWERTY
    • Polya's urn: small initial advantages accumulate (see the sketch after this list)
    • Rise of the West: the tiny advantages that Europe had (property rights, rule of law, market competition)
  • Hard to know whether we face an increasing- or decreasing-returns world. I.e., does history have a diverging, branching structure that leads to a variety of possible worlds, or a converging structure that channels us into destinations predetermined long ago?
  • Cleopatra's nose

  • Complexity theorists: aka, the butterfly effect

    • Gavrilo Princip
    • Great oaks from little acorns. Problem: impossible to pick the influential little acorn before the fact.
  • Game theorists: multiple or mixed-strategy equilibria
    • Players will second-guess each other to the point where political outcomes, like financial markets, resemble random walks.
    • Financial geniuses are statistical flukes
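
A minimal sketch of the Polya's urn mechanism mentioned above: each draw reinforces the color drawn, so small early advantages compound, and identical starting points end up in very different places. The parameters are arbitrary.

```python
import random

def polya_urn(draws=1000, seed=None):
    """Start with one red and one blue ball; each draw adds another ball of
    the color drawn, so early luck compounds (increasing returns)."""
    rng = random.Random(seed)
    red, blue = 1, 1
    for _ in range(draws):
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red / (red + blue)

# Every run starts from the same 50/50 position, yet the long-run share of
# red varies wildly from run to run.
print([round(polya_urn(seed=s), 2) for s in range(5)])
```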

Psychological Skeptics

We mispredict because of the way our minds work

  • Preference for simplicity: "Bashar al-Assad is like Hitler"
  • Aversion to ambiguity and dissonance
    • People are overconfident in their counterfactual beliefs
    • People dislike dissonance. They like to couple good causes with good effects. But detested policies can sometimes have positive effects. E.g., valued allies can have a frightful human rights record.
    • People hate randomness.
      • e.g., rat experiment
      • When we know the base rate and not much else, we'd be better off always predicting the most common outcome (see the sketch below)
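
A minimal sketch of that base-rate point, in the spirit of the T-maze experiment: if one arm pays off 75% of the time, always predicting that arm beats "probability matching" (guessing each arm in proportion to its frequency). The 75% figure is illustrative.

```python
import random

def accuracy(strategy, p_left=0.75, trials=100_000, seed=0):
    """Simulate predicting which arm of the maze pays off on each trial."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        outcome_left = rng.random() < p_left
        if strategy == "maximize":      # always predict the most common outcome
            guess_left = True
        else:                           # "match": guess left 75% of the time
            guess_left = rng.random() < p_left
        hits += (guess_left == outcome_left)
    return hits / trials

print(accuracy("maximize"))  # about 0.75
print(accuracy("match"))     # about 0.625 (= 0.75^2 + 0.25^2)
```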

Skeptics' views: 6 hypotheses

  • Humans perform no better than chimps at predicting turbulence
  • Diminishing marginal returns: a casual reader of the news will perform about as well as an expert
  • Reversion to the mean: lucky streaks of predictions will not last
  • As expertise rises, confidence in forecasts should rise faster than the accuracy of forecasts

Forecasting performance: How do we know?

Metrics, and the risk of rewarding overly cautious forecasters: calibration vs discrimination

  • Calibration: perfect calibration if there is a precise correspondence between subjective and objective probabilities. But calibration rewards cautious forecasters, i.e., those who stick close to a base-rate strategy.
  • Discrimination: perfect scores when forecasters assign different probabilities to events that happen and to events that don't (see the sketch below).
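
One standard way to make the two criteria concrete is the Murphy decomposition of the Brier score, where the reliability term tracks calibration and the resolution term tracks discrimination. This is a generic sketch with invented data, not necessarily the exact scoring used in the study.

```python
from collections import defaultdict

def murphy_decomposition(forecasts, outcomes):
    """Brier score = reliability - resolution + uncertainty.
    Lower reliability = better calibration; higher resolution = better
    discrimination. forecasts are probabilities, outcomes are 0/1."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[round(f, 1)].append(o)                    # group similar forecast values
    reliability = resolution = 0.0
    for f, obs in bins.items():
        freq = sum(obs) / len(obs)                     # how often the event occurred
        reliability += len(obs) / n * (f - freq) ** 2  # calibration penalty
        resolution += len(obs) / n * (freq - base_rate) ** 2
    return reliability, resolution, base_rate * (1 - base_rate)

# A cautious forecaster who always states the base rate is perfectly
# calibrated (reliability = 0) but cannot discriminate at all (resolution = 0).
forecasts = [0.7, 0.7, 0.3, 0.3, 0.5, 0.5]
outcomes = [1, 1, 0, 0, 1, 0]
print(murphy_decomposition(forecasts, outcomes))
```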

(optional) For those interested, see discussion on calibration vs discrimination:

What is the right baseline of comparison?

  • Crude algorithm: assign every case the same probability as the historical base rate
  • Predict continuation of past state
  • Formal statistical equations
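
A minimal sketch of the first two baselines above, with an invented yearly conflict indicator (1 = conflict, 0 = peace):

```python
def base_rate_forecast(history):
    """Crude algorithm: assign every future case the historical frequency."""
    return sum(history) / len(history)

def persistence_forecast(history):
    """Predict that the most recent state simply continues."""
    return history[-1]

history = [0, 0, 1, 0, 0, 0, 1, 1]    # invented data
print(base_rate_forecast(history))    # 0.375
print(persistence_forecast(history))  # 1
```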

Results

  • Most existing research makes no effort at testing their theory on future data
    • "isms"
    • statistical models
  • Tetlock: let's see how well experts actually perform. 284 participants:
    • most with doctorates, almost all with postgraduate training in polsci, econ, international law, diplomacy, journalism
    • avg of 12 years of work experience
    • academia, think tanks, governments, IOs
    • Very thoughtful and articulate
    • Broad cross-section of political, econ and national security outcomes

Results

Source: Tetlock, p. 51

  • Humans overpredict rare events
  • Experts no better than dilettantes
  • All humans far worse than algorithms, even simple ones

The experts fight back

  • Perhaps we didn't select the right experts? But there is little evidence of that: equally poor regardless of seniority or domain (academia, government, etc.)
    • No better at short term vs long term, domestic v international, econ v political.
  • Perhaps our dilettantes are really experts. I.e., slightly less specialized, but still well read.
    • So let's look at briefly briefed undergraduate students. They are worse, so expertise does matter to an extent.
  • Maybe experts are very cautious, i.e., better safe than sorry. So we can correct for such tendencies: in short, we take out the difference between their average forecast and the base rate for the outcome (see the sketch below).
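
A minimal sketch of one plausible reading of that caution correction (not necessarily the study's exact procedure): shift each forecaster's probabilities so that their average matches the base rate, then re-score.

```python
def remove_caution_bias(forecasts, base_rate):
    """Shift forecasts so their mean equals the observed base rate,
    clipping to keep them valid probabilities."""
    shift = sum(forecasts) / len(forecasts) - base_rate
    return [min(1.0, max(0.0, f - shift)) for f in forecasts]

# An overly cautious expert hovers near 0.5 even though the event occurs
# only 20% of the time (numbers invented).
print(remove_caution_bias([0.5, 0.4, 0.6, 0.5], base_rate=0.2))  # about [0.2, 0.1, 0.3, 0.2]
```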

Foxes vs hedgehogs

Why hedgehogs are here to stay

  • media attention

Challenge yourself!

RESULTS: markets

Current Large-scale conflict forecasting projects

What Data to use?

  • Structural indicators are too slow
  • Social media too fast
  • Event data

Existing projects

  • DARPA ICEWS (2007-present)
  • IARPA's
  • Peace Research Institute Oslo (PRIO) and Uppsala University (UCDP) models
  • etc.

Convergent results

  • Temporal autoregressive effects are huge: the challenge is predicting onsets and cessations, not continuations
  • Spatial autoregressive effects ("bad neighborhoods") are also huge
  • 80% accuracy (in the sense of AUC around 0.8; see the sketch after this list) in the 6- to 24-month forecasting window occurs with remarkable consistency: few if any replicable models exceed this, and models below that level can usually be improved
  • Measurement error on many of the dependent variables (for example, casualties, coup attempts) is still very large
  • Forecast accuracy does not decline very rapidly as the forecast window increases, suggesting long-term structural factors rather than short-term "triggers" are dominant. Trigger models more generally do poorly except as post hoc "explanations."
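
For reference, a minimal sketch of the AUC metric used above: the probability that a randomly chosen event case receives a higher forecast than a randomly chosen non-event case (ties count one half). The data are invented.

```python
def auc(forecasts, outcomes):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [f for f, o in zip(forecasts, outcomes) if o == 1]
    neg = [f for f, o in zip(forecasts, outcomes) if o == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# AUC of about 0.8 means the model ranks an actual conflict case above a
# non-conflict case about 80% of the time.
print(auc([0.9, 0.7, 0.4, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0]))  # about 0.889
```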

What can and cannot be predicted?

Where algorithms do well

  • Nate Silver performed very well in the 2008 election (not so well in 2016…)
  • Routine elections in rich countries like the United States are some of the softest targets in political forecasting. Rules are transparent; high-quality data, including surveys of would-be voters, are often available; and the connection between those data and the outcome of interest is fairly straightforward.

Where algorithms do less well

  • Nate Silver fails too, even for elections
  • For international events, we often lack data: we might know the predictors but be unable to get the data
  • Even simple indicators are tricky
    • GDP is produced by government agencies
    • Some don't even report national economic statistics
  • Events are rare
    • Most states are "safe"
    • Many states are obviously at risk
    • a small set is uncertain
    • Note: rare events \(\neq\) black swans
  • Heterogeneous environment
    • Is the system changing significantly while we are trying to model it? How far back are data still relevant?
    • Changing nature of conflict

Irreducible sources of errors

  • Specification error: no model of a complex, open system can contain all of the relevant variables
  • Measurement error: with very few exceptions, variables will contain some measurement error, presupposing there is even agreement on what the "correct" measurement is in an ideal setting
    • Predictive accuracy is limited by the square root of measurement error: in a bivariate model, if your reliability is 80%, your accuracy can't be more than about 90%
    • This biases the coefficient estimates as well as the predictions
  • Quasi-random structural error: complex and chaotic deterministic systems behave as if they were random under at least some parameter combinations, e.g., a rabbit population (see the sketch after this list)

  • Rational randomness such as that predicted by mixed strategies in zero-sum games
  • Arational randomness attributable to free will. Rule of thumb from our rat-running colleagues: "A genetically standardized experimental animal, subjected to carefully controlled stimuli in a laboratory setting, will do whatever it wants."
  • The effects of natural phenomena: the 2004 Indian Ocean tsunami dramatically reduced violence in the long-running conflict in Aceh
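
A minimal sketch of the "quasi-random structural error" point: the logistic map, a textbook toy model of a rabbit-like population, is fully deterministic yet behaves almost randomly for some growth rates, and nearly identical starting points soon diverge. The parameters are the usual textbook ones, not from the source.

```python
def logistic_map(x0=0.2, r=3.9, steps=12):
    """x_{t+1} = r * x_t * (1 - x_t): deterministic, but chaotic for r near 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Two almost identical starting populations diverge within a dozen generations.
a = logistic_map(x0=0.200)
b = logistic_map(x0=0.201)
print([round(x - y, 3) for x, y in zip(a, b)])
```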

Feed-forward

Effective policy response: in at least some instances, organizations will have taken steps to head off a crisis that would otherwise have occurred.

Going further

  • Nassim Nicholas Taleb. The Black Swan
  • Daniel Kahneman. Thinking, Fast and Slow
  • Philip Tetlock. Expert Political Judgment
  • Nate Silver. The Signal and the Noise