Day 3
I would like to talk today about three aids to assessing empirical work: theory, common sense, and knowledge of context. They are equally valuable for those doing the work.
In 1975 the economist Isaac Ehrlich published a time-series study of the deterrent effects of capital punishment running from the late 1930s to the mid-1960s, using what were at the time advanced techniques of statistical modeling. His results, which indicated that one execution deterred 7 or 8 homicides, were attention-grabbing to say the least, particularly since they were set against a background of numerous earlier studies, using apparently simpler methods, almost all of which found no evidence that capital punishment deterred homicides, and some of which suggested a possible brutalization effect. Ehrlich’s findings gained credibility from the apparent statistical sophistication of his methods (with tongue only slightly in cheek, I would define statistically sophisticated methods as methods with lots of equations and numbers that many readers cannot or do not have the time to follow).
Because the topic was so controversial, other social scientists reexamined Ehrlich’s data using equally sophisticated, and often similar, statistical approaches. These reanalyses revealed that Ehrlich’s model and his results were acutely sensitive to the years analyzed. If the last years of the time series – the 1960s – were removed from the analysis, evidence for a deterrent effect disappeared, and in some studies a brutalization effect seemed possible. The 1960s were, of course, a period when executions for murder dropped to almost zero and all sorts of crimes were rapidly increasing. This replication research pretty much destroyed the credibility of Ehrlich’s results in the academic community, though their influence on the popular debate, like the Cheshire cat’s grin, long lingered.
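To make concrete what those reanalyses were doing, here is a minimal sketch of a time-window sensitivity check. It uses invented annual data and a plain ordinary least squares regression in Python with statsmodels – not Ehrlich’s actual series, model, or estimation method – and simply compares the coefficient on executions when the regression is fit to the full period and when the 1960s are dropped.

```python
# A minimal sketch (hypothetical data, not Ehrlich's model) of a
# time-window sensitivity check: fit the same regression on the full
# 1937-1967 window and on a window ending in 1960, then compare the
# coefficient on executions. All numbers below are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
years = np.arange(1937, 1968)
# Executions decline steadily toward zero over the series.
executions = 150 - 5 * (years - 1937) + rng.poisson(5, years.size)
# Hypothetical homicide rate: roughly flat until 1960, then rising sharply.
homicide_rate = 5 + 0.5 * np.maximum(years - 1960, 0) + rng.normal(0, 0.3, years.size)
df = pd.DataFrame({"year": years,
                   "executions": executions,
                   "homicide_rate": homicide_rate})

def execution_coefficient(data):
    """OLS of the homicide rate on executions; returns the executions slope."""
    X = sm.add_constant(data[["executions"]])
    return sm.OLS(data["homicide_rate"], X).fit().params["executions"]

print("coefficient, 1937-1967:", round(execution_coefficient(df), 3))
print("coefficient, 1937-1960:", round(execution_coefficient(df[df["year"] <= 1960]), 3))
# In this invented series the apparent deterrent effect comes entirely
# from the 1960s; nothing in the theory being tested predicts that kind
# of period dependence.
```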
The Ehrlich story seems to be a case where the results of one analysis were trumped by the contrary results of many others, but to the best of my recollection none of those who had done the competing analyses ever hit the nail on the head when it came to explaining why their work refuted Ehrlich’s. The answer lies in theory. Ehrlich’s scientific contribution was not to tell us how many lives an execution might save but rather to test a theory that had considerable a priori plausibility; namely, the theory that the more likely a person was to be caught and punished for homicide, and the more likely he was to receive the extreme sanction if he were caught and punished, the less likely he would be to commit homicide. Ehrlich’s results regarding the deterrence implications of the death penalty gained credibility in part because other aspects of the theory, in particular the expectation that apprehension probabilities would be more important to deterrence than execution probabilities given apprehension, were also supported by his data. However, nothing about Ehrlich’s deterrence theory would lead one to predict that executions would have one effect from, say, 1937 through 1960 and another effect from 1937 through 1967. Rather, the theory predicted similar reactions to execution prospects throughout the time series, and in any particular subset of that series, subject only to greater difficulty in spotting deterrence due to the loss of statistical power as the number of data points diminished. So what destroyed Ehrlich’s conclusion, and properly so, was not the inconsistency of other analyses using different time periods with his specific results, but rather their inconsistency with the theory Ehrlich purported to be testing.
The lesson I draw for empirical legal studies is that researchers and readers should identify what theories motivate a model and analysis and ask whether other data and analyses are consistent with that theory. If they are not, something is wrong; further questions should then be asked, and perhaps a new theory advanced. Numbers alone, no matter how sophisticated the analysis that yielded them, should never dazzle the eye.
Ehrlich published another study of the deterrent effect of the death penalty two years later. It was cross-sectional rather than longitudinal, and although it never got the publicity of his first effort, it was, to my mind, in many ways the stronger study. But it had two problems that justified disregarding it. The first was that it contained no variable to control for Southern status. At the time Ehrlich published his second study, the idea that the South was associated with a “culture of violence” – at least when it came to homicide – was commonplace. Not only did cross-sectional deterrence studies regularly control for Southern location through the use of a dummy variable for states of the former Confederacy, but there was invariably a significant Southern effect. Results from a model that ignores plausible and easily available competing variables should always be regarded with suspicion.
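For readers who want to see why the omission matters mechanically, here is a minimal sketch on invented cross-sectional data – not Ehrlich’s data or specification – of how leaving out a regional indicator lets the execution variable absorb the regional difference. The state count, variable names, and magnitudes are all hypothetical.

```python
# A toy illustration of omitted-variable bias: Southern states in this
# invented cross-section both execute more and have higher homicide
# rates, so a model without the Southern dummy attributes the regional
# difference to executions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 48                                             # hypothetical number of states
south = rng.integers(0, 2, n)                      # 1 = former Confederacy (illustrative)
executions = rng.poisson(1 + 3 * south)            # Southern states execute more...
homicide = 4 + 3 * south + rng.normal(0, 1, n)     # ...and have higher homicide rates
df = pd.DataFrame({"homicide": homicide, "executions": executions, "south": south})

naive = smf.ols("homicide ~ executions", data=df).fit()
controlled = smf.ols("homicide ~ executions + south", data=df).fit()
print("executions coefficient, no Southern dummy:  ", round(naive.params["executions"], 3))
print("executions coefficient, with Southern dummy:", round(controlled.params["executions"], 3))
# In this toy example the apparent execution effect largely vanishes once
# region is controlled for; the studies of Ehrlich's era invariably found
# a significant Southern effect, which is why its omission mattered.
```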
Moreover – and this was the second problem – while Ehrlich’s longitudinal model found that each execution saved 7 or 8 lives, his cross-sectional study suggested, if I recall correctly, that each execution saved 18 lives. Here common sense comes into play. The world is not statistically malicious. It does not try to hide reality from our senses, and while states differ in many ways that might be relevant to crime rates, as aggregates they are unlikely to be that different in factors apart from the death penalty that lead to homicide. If one execution really deterred as many as 18 murders (and some studies done around the time of Ehrlich’s put the number of murders deterred as high as 100!), one might expect some sign of this even in uncontrolled comparisons. Yet the work of Sellin and others who had looked at homicide rates in neighboring states did not find a whisper of deterrence; surely, if an execution saved 18 lives, something would have been whispering. To put this another way, common sense is one of the best tools that data analysts and readers of analyses can employ. If something looks like it cannot be true given other information we have, it is probably not true, no matter how impressive the statistics that generated the result. At a minimum we can and should ask how seemingly inconsistent observations (e.g., on the one hand, one execution deters 18 homicides, and, on the other, whether we compare neighboring states by the existence of the death penalty or by executions, we see no evidence of deterrence) can be true simultaneously, and before accepting either finding we should require good answers.
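A back-of-the-envelope calculation shows why the claimed magnitude should have been visible. The execution and homicide counts below are invented purely for illustration.

```python
# Back-of-envelope sanity check with invented numbers: if each execution
# really prevented 18 homicides, a state carrying out a handful of
# executions a year should show an easily visible gap relative to an
# otherwise similar neighbor, even without any statistical controls.
executions_per_year = 5        # hypothetical retentionist state
claimed_deterrence = 18        # homicides prevented per execution (the disputed estimate)
neighbor_homicides = 400       # hypothetical annual homicides in a similar non-executing state

implied_reduction = executions_per_year * claimed_deterrence
print(f"implied reduction: {implied_reduction} homicides per year, "
      f"about {implied_reduction / neighbor_homicides:.0%} of the neighbor's total")
# A gap on the order of 20 percent should leave some trace in simple
# neighboring-state comparisons; Sellin's work found nothing of the sort.
```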
To give another example, long before other research called their results into question, it was common sense that made me suspicious of John Lott and David Mustard’s claim in the Journal of Legal Studies that right-to-carry laws diminish violent crime. What made me skeptical was their finding that while right-to-carry laws diminished violent crimes like murder, rape, and aggravated assault, they led to increases in non-violent property crimes. The authors had an explanation for this; namely, that these types of crimes are substitutes for one another, and as violent crimes are deterred for fear of meeting someone with a gun, crimes that involve non-confrontational thefts will be substituted for them. But a theory can be offered to fit any data, and when the theory is constructed post hoc rather than offered a priori, one must be especially cautious in accepting it. The theory Lott and Mustard offered was borrowed from economics, where it often makes considerable sense of such behavior as purchasing decisions, career choices, and the like. It may even sensibly explain choices criminals make between some crimes, for example the choice of whether to rob someone or burgle an apartment. But the idea that taking property by stealth might substitute for crimes like rape or murder ignores what we know about differences between these crimes, the motives for them, and those who commit them. More bluntly, applying the substitution hypothesis to the crimes Lott and Mustard saw as substitutes defies common sense.
Similarly, some of the results in Lott’s later book, such as the suggestion in the data that reducing the number of black women over 40 would substantially diminish certain crimes, are weird to the point of being incredible. As with Ehrlich’s results, one should not get carried away with the numbers, no matter how statistically significant they are. The scientific product of social science research is support for theory and the building of models. If some of the implications of applying a theory-driven model to the data make no sense or, worse yet, are obviously mistaken, then absent some very good explanation for why this has happened, the safest conclusion is that there is something wrong with the model, methods, or data, and even outcomes consistent with plausible theory cannot be trusted.
What is plausible depends, of course, on what we know about the matter we are studying. More than occasionally empirical scholars seem to have little appreciation of context beyond the general knowledge everyone has and the specific data they have collected. Without a deep appreciation of context, even the best scholars may be misled.
For example, some years ago Al Blumstein and Daniel Nagin, who were and are among the very best of our nation’s quantitative criminologists, did a study of the deterrent effects of likely sentences for draft evasion on draft evasion rates. For its time the study was in many ways exemplary – variables were carefully measured and analyzed, and it was refreshing to see an investigation into deterrence outside the street crime and capital punishment contexts. The results of the Blumstein-Nagin research strongly confirmed deterrence theory: resistance to the draft by refusing induction was substantially higher in those jurisdictions that sentenced resisters most leniently. Yet I regarded the study as worthless.
The reason had nothing to do with the study’s data, model, or methods, or with the plausibility of the theory it was testing. Rather, I had done considerable draft counseling during the Vietnam War, and I knew that until late in the war the Selective Service System would routinely allow those it was calling up to transfer their place of induction from their local draft board to wherever they were residing, if they expected to be away from home on their induction date. Since the crime occurred and was tried in the jurisdiction where induction was refused, draft counselors routinely advised young men determined to resist the draft to transfer their place of induction to San Francisco or some other jurisdiction known for its lenient sentencing. Blumstein and Nagin’s results were indeed correct – draft resistance was far higher in those federal districts where sentences were lower – but because they did not appreciate the context in which draft resistance occurred, they were fooled by their data. They thought they were finding strong evidence of deterrence, but it is more plausible to suppose that they had only amassed evidence of how people gamed the system. One cannot say whether these resisters would have refused the draft had there been no San Franciscos and no prospects of less-than-five-year sentences, but my impression from the resisters I talked to was that while most preferred the prospect of serving less time rather than more, their ultimate decisions seemed unlikely to be affected by marginal differences in the penalties they expected to receive.
Anyone can do empirical analysis, and anyone can do it poorly. The empirical research to which lawyers should aspire requires deep knowledge of the contexts studied, an appreciation for how theory is generated and fairly tested, and considerable common sense.
Rick
Tomorrow: Some Simple Cautions
Rick:
Great posting. I especially loved the line "The world is not statistically malicious." A bit of genius in that.
Dan
Posted by: Dan Cole | 10 August 2006 at 01:00 PM