Responding to the recent debacle involving a grad student uncovering a blundering error in a paper by noted Harvard economists (here), Betsey Stevenson (Mich.) and Justin Wolfers (Mich.) initiated a (now growing) list of suggestions on how to minimize errors in empirical research. Not surprisingly, others, including Andrew Gelman (Columbia), have added to the list (here). While the list will inevitably grow, it already includes basic, helpful reminders for even the most experienced researchers.
A recent news story underscores the importance of basic replication (as well as scholarly attention to detail) for empiricists.
"His [Thomas Herndon's] professors at the University of
Massachusetts-Amherst had set his graduate class an assignment--pick an
economics paper and see if you can replicate the results. It's a good
exercise for aspiring researchers. Thomas chose Growth in a Time of Debt. It was getting a lot of
attention, but intuitively, he says, he was dubious about its findings."
Turns out that the grad student's intuition was dead-on as core results from the influential economics article--authored by two leading Harvard economists--could not be replicated. Herndon's replication efforts uncovered a basic
error in the spreadsheet. "The Harvard professors had accidentally only
included 15 of the 20 countries under analysis in their key calculation
(of average GDP growth in countries with high public debt). Australia, Austria, Belgium, Canada and Denmark were missing." In addition, other data for some countries were missing.
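The kind of slip described above (an average computed over fewer rows than intended) is easy to guard against. What follows is only a minimal sketch in Python, not the method used by the original authors or by Herndon. The helper name `checked_mean` and all of the growth figures are hypothetical and invented purely for illustration; they are not the paper's data.

```python
from statistics import mean


def checked_mean(values, expected_count):
    """Average the values, but refuse to proceed if rows are missing."""
    if len(values) != expected_count:
        raise ValueError(
            f"expected {expected_count} rows, got {len(values)}"
        )
    return mean(values)


# Hypothetical growth rates (percent) for 20 countries; NOT the paper's data.
rates = [4.0, 3.0, 2.0, 3.0, 3.0] + [1.0] * 15

# Average over all 20 rows, with the row count verified first.
full = checked_mean(rates, expected_count=20)

# An unchecked average that silently drops the first 5 rows,
# analogous to a spreadsheet range that starts too low.
partial = mean(rates[5:])

print(f"all 20 rows: {full}")
print(f"only 15 rows: {partial}")
```

With these made-up numbers the two averages differ, and passing the truncated list to `checked_mean` raises an error rather than returning a misleading figure. The point is simply that stating the expected row count up front turns a silent omission into a visible failure.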
While questions about who owns judges' official working papers implicate legal historians most directly, such (admittedly complex) questions should also interest empirical legal historians. In "Judges and Their Papers," Kathryn Watts (Washington) makes the case that judicial papers should be construed as public rather than private property. An excerpted abstract follows.
"This Article is the first to give significant attention to the question of who
should own federal judges’ working papers and what should happen to the papers
once a judge leaves the bench. Upon the 35th anniversary of the enactment of
the Presidential Records Act, this Article argues that judges’ working papers
should be treated as governmental property — just as presidential papers are.
Although there are important differences between the roles of President and
judge, none of the differences suggest that judicial papers should be treated
as a species of private property. Rather, the unique position of federal
judges, including the judiciary’s independence, should be taken into account
when crafting rules that speak to reasonable access to and disposition of
judicial papers — not when answering the threshold question of ownership.
Ultimately, this Article — giving renewed attention to a long forgotten 1977
governmental study commissioned by Congress — argues that Congress should
declare judicial papers public property and should empower the judiciary to
promulgate rules implementing the shift to public ownership. These would
include, for example, rules governing the timing of public release of judicial
papers. By involving the judiciary in implementing the shift to public
ownership, Congress would enhance the likelihood of judicial cooperation,
mitigate separation of powers concerns, and enable the judiciary to safeguard
judicial independence, collegiality and confidentiality."
As you've probably heard, the U.S. News 2014 Law School Rankings are out. Rather than offer commentary, I thought I'd piggyback on Paul Caron's useful post comparing the overall rankings with the peer reputation rankings. So here, for your edification, are the numbers Paul compiled in scatterplot form. (PDF)
Cass Sunstein (Harvard) has departed the Obama Administration (and OIRA) and migrated back to academic life. In a recent paper published in the University of Chicago Law Review, Empirically Informed Regulation, Sunstein illustrates the central role data play (or, at least, should play) in the development of regulations, with an emphasis on behavioral economics. The paper's abstract follows.
"In recent years, social scientists have been incorporating empirical findings about human behavior into economic models. These findings offer important insights for thinking about regulation and its likely consequences. They also offer some suggestions about the appropriate design of effective, low-cost, choice-preserving approaches to regulatory problems, including disclosure requirements, default rules, and simplification. A general lesson is that small, inexpensive policy initiatives can have large and highly beneficial effects. In the United States, a large number of recent practices and reforms reflect an appreciation of this lesson. They also reflect an understanding of the need to ensure that regulations have strong empirical foundations, both through careful analysis of costs and benefits in advance and through retrospective review of what works and what does not."
Allison Morantz (Stanford) has an interesting paper, Coal Mine Safety: Do Unions Make a Difference?, in the current Industrial and Labor Relations Review, Vol. 66:1 (Jan. 1, 2013). The paper draws from UMWA data and finds that increased unionization correlates with a decrease in serious worker injuries, but an increase in nontraumatic worker injuries. Perhaps even more interesting, however, is the potential effect of unionization on worker injury reporting. The abstract follows.
"Although the United Mine Workers of America (UMWA) has always advocated strongly for miners'
safety, prior empirical literature contains no evidence that unionization
reduced mine injuries or fatalities during the 1970s and 80s. This study uses a
more comprehensive dataset and updated methodology to examine the relationship
between unionization and underground, bituminous coal mine safety from 1993 to
2010. I find that unionization predicts a substantial and significant decline in
traumatic injuries and fatalities, the two measures that I argue are the least
prone to reporting bias. These disparities are especially pronounced among
larger mines. My best estimates imply that overall, unionization is associated
with a 13-30% drop in traumatic injuries and a 28-83% drop in fatalities. Yet
unionization also predicts higher total and nontraumatic injuries, suggesting
that injury reporting practices differ between union and nonunion mines."
Univ. of Georgia economists Christopher Cornwell, David Mustard, and Jessica Van Parys set out to, in part, "examine the relationship between the (objective) test-score differences and (subjective) teacher grades." What they find is that while girls receive systematically higher grades than boys, on average, in elementary school, "the grades awarded by teachers are not aligned with test scores." The paper's abstract puts the point more bluntly: "Boys who perform equally as well as girls on reading, math and science tests are graded less favorably by their teachers." The authors ascribe the "misalignment" of objective test scores and subjective teacher grades to "non-cognitive skill development" differences between elementary school boys and girls.
Not surprisingly, this academic article, published in the Journal of Human Resources (48:1, Winter 2013), quickly attracted an array of robust public commentary (e.g., here and here). Left un-plumbed, thus far anyway, are the potential implications for Title IX.
To be sure, causal directions in this study are anything but clear thus far. After all, when seeking to explain statistical "misalignment" between standardized test scores and teacher (non-anonymous) grades, it remains uncertain whether boys over-perform on standardized tests or under-perform in classroom grades (or the reciprocal for girls). Nonetheless, these findings raise potentially uncomfortable questions.
An interesting discussion recently emerged (here and here) of a study comparing two agricultural experiments (each involving the distribution of different cowpea seeds to farmers in Tanzania), one blinded and one unblinded. As Andrew Gelman (Columbia) notes: "Bulte et al. find much different results in the two experiments and
attribute the difference to expectation effects (when people know
they’re receiving an experiment they behave differently); Ozler is
skeptical and attributes the different outcomes to various practical
differences in implementation of the two experiments."
Dan Ho's (Stanford) interesting paper on restaurant grading, Fudging the Nudge: Information Disclosure and Restaurant Grading, appears in the current YLJ (122: 574, 2012) and will likely appeal to "foodies" (as well as those looking for a brief respite from grinding through fall semester grading). The paper exploits data from San Diego and New York and uncovers various structural challenges incident to information disclosure to consumers. The abstract follows.
"One of the most promising regulatory currents consists of “targeted” disclosure: mandating simplified information disclosure at the time of decisionmaking to “nudge” parties along. Its poster child is restaurant sanitation grading. In principle, a simple posted letter grade (‘A,’ ‘B,’ or ‘C’) empowers consumers and properly incentivizes restaurateurs to reduce risks for foodborne illness. Yet empirical evidence of the efficacy of restaurant grading is sparse. This Article fills the void by studying over 700,000 health inspections of restaurants across ten jurisdictions, focusing on San Diego and New York. Despite grading’s great promise, we show that the regulatory design, implementation, and practice suffer from serious flaws: jurisdictions fudge more than nudge. In San Diego, grade inflation reigns. Nearly all restaurants receive ‘A’s. In New York, inspections exhibit little substantive consistency. A good score does not meaningfully predict cleanliness down the road. Unsurprisingly, New York’s implementation of letter grading in 2010 has not discernably reduced manifestations of foodborne illness. Perhaps worse, the system perversely shifts inspection resources away from higher health hazards to resolve grade disputes. These results have considerable implications, not only for food safety, but also for the institutional design of information disclosure."
A paper now circulating on SSRN (and forthcoming in Judicature) explores the impact of Twombly and Iqbal on dismissal rates and does so with an interesting methodological twist. A New Look: Dismissal Rates in Federal Civil Cases, by Scott Dodson (Hastings), contributes to the growing empirical literature by coding at the level of the individual claim rather than the case. The abstract follows.
"In the wake of Twombly and Iqbal, a number of studies have been conducted to
determine the decisions' effects on dismissal practice in federal civil cases.
However, those studies have tended to code whole cases rather than claims --
leading to the ambiguous coding category of “mixed” dismissals and to problems
in characterizing the nature of the dispute -- and have failed to distinguish
between legal sufficiency and factual sufficiency, potentially masking important
detail about the effects of the pleadings changes.
This paper begins to fill in that detail. I compiled an original dataset of
district court opinions and coded each claim -- rather than whole case --
subject to an adjudicated Rule 12(b)(6) motion. For each claim, I also
determined whether the court resolved the motion on grounds of legal or factual
sufficiency. This methodology opened an unprecedented level of granularity in
The data reveal statistically significant increases in the dismissal rate
overall and in a number of subsets of claims. I also find an increase in the
relative prevalence and efficacy of factual-insufficiency arguments for
dismissal. Perhaps surprisingly, I find a decrease in the relative prevalence
and efficacy of legal-insufficiency arguments for dismissal. These data and
insights on the rationales of dismissals are new to the literature and suggest
that Twombly and Iqbal are affecting both movant strategy and judicial
Cornell colleagues Ted Eisenberg and Marty Wells empirically analyze leading ranking metrics for refereed law journals in their recent paper, Ranking Law Journals and the Limits of Journal Citation Reports. Their analysis of ranking outcomes highlights a preoccupation with ordinal ranking as well as database bias. The abstract follows:
"Rankings of schools, scholars, and journals emphasize ordinal rank. Journal rankings
published by Journal Citation Reports (JCR) are widely used to assess research
quality, which influences important decisions by academic departments,
universities, and countries. We study refereed law journal rankings by JCR,
Washington and Lee Law Library (W&L), and the Australian Research Council
(ARC). Both JCR’s and W&L’s multiple measures of journals can be represented
by a single latent factor. Yet JCR’s rankings are uncorrelated with W&L’s.
The differences appear to be attributable to underrepresentation of law journals
in JCR’s database. We illustrate the effects of database bias on rankings
through case studies of three elite journals, the Journal of Law &
Economics, Supreme Court Review, and the American Law & Economics Review.
Cluster analysis is a supplement to ordinal ranking and we report the results of
a cluster analysis of law journals. The ARC does organize journals into four
large groups and provides generally reasonable rankings of journals. But
anomalies exist that could be avoided by checking the ARC groups against
citation-based measures. Entities that rank should use their data to provide
meaningful clusters rather than providing only ordinal ranks."
Noting that "In the United States, men are fifteen times as likely to be incarcerated as women," scholars have wondered whether this difference "can be explained by differences in criminal behavior or
circumstances, or are courts or prosecutors treating genuinely
equivalent cases differently on the basis of gender?" Given the obvious constitutional implications, reasons for this difference generate important policy and legal interest.
In Estimating Gender Disparities in Federal Criminal Cases, Sonja Starr (Michigan) explores reasons for this difference using a dataset that "traces federal criminal cases from arrest through sentencing. I find that gender gaps widen at every stage of the justice process and that men and women ultimately receive dramatically different sentences." An excerpted abstract follows:
"[the paper] finds large gender gaps favoring women throughout the sentence length distribution (averaging over 60%), conditional on arrest offense, criminal history, and other pre-charge observables. Female arrestees are also significantly likelier to avoid charges and convictions entirely, and twice as likely to avoid incarceration if convicted. Prior studies have reported much smaller sentence gaps because they have ignored the role of charging, plea-bargaining, and sentencing fact-finding in producing sentences. Most studies control for endogenous severity measures that result from these earlier discretionary processes and use samples that have been winnowed by them. I avoid these problems by using a linked dataset tracing cases from arrest through sentencing. Using decomposition methods, I show that most sentence disparity arises from decisions at the earlier stages, and use the rich data to investigate causal theories for these gender gaps."
In the complicated (and turbulent) education policy world, policymakers continue to fret over "gendered" outcomes, particularly, of late, in the STEM (science, technology, engineering, and mathematics) context. While such a topic is admittedly layered, research design issues typically frustrate studies of American schools. In Do Single-Sex Schools Enhance Students’ STEM (Science, Technology, Engineering, and Mathematics) Outcomes?, Hyunjoon Park (Penn-Sociology), Jere Behrman (Penn-Econ.), and Jaesung Choi (Penn-Econ.) exploit a unique data set from Korea where students are randomly assigned to either co-ed or single-gender high schools. Also interesting are the study's asymmetric findings. The paper's abstract follows.
"Despite women’s significant improvement in educational attainment, underrepresentation of women in Science, Technology, Engineering, and Mathematics (STEM) college majors persists in most countries. We address whether one particular institution – single-sex schools – may enhance female – or male – students’ STEM careers. Exploiting the unique setting in Korea where assignment to all-girls, all-boys or coeducational high schools is random, we move beyond associations to assess causal effects of single-sex schools. We use administrative data on national college entrance mathematics examination scores and a longitudinal survey of high school seniors that provide various STEM outcomes (mathematics and science interest and self-efficacy, expectations of a four-year college attendance and a STEM college major during the high school senior year, and actual attendance at a four-year college and choice of a STEM major two years after high school). We find significantly positive effects of all-boys schools consistently across different STEM outcomes, whereas the positive effect of all-girls schools is only found for mathematics scores."
While state medical malpractice has drawn sustained attention over the years, the focus has shifted away from claims against, and payouts by, physicians and surgeons. A recent paper by Myungho Paik (Northwestern), Bernie Black (Northwestern), and David Hyman (Illinois), The Receding Tide of Medical Malpractice Litigation, exploits the National Practitioner Data Bank and finds that per-physician payouts have fallen 46% below their 1992 level. The abstract follows.
"Tort reform has
been a hot issue during the past decade, as malpractice premiums spiked, and
state and federal legislators debated the desirability of damages caps. Nine
states adopted caps on non-economic or total damages during the period
2003-2006, joining twenty-two states that had previously adopted caps. Great
effort has been devoted to studying the impact of these caps, but overall trends
in claim rates and payouts have been ignored. Using the National Practitioner
Data Bank, we find the frequency of paid medical malpractice claims per
physician has been dropping steadily for almost 20 years, and is now less than
half the level it was in 1992. Payouts per physician have also been dropping
since 2003, and are now 46% below their 1992 level. The decline is largest in
states that recently capped total or non-economic damages, but there are also
large and sustained declines in states with older damage caps and states with no
damage caps. We identify several factors that may partially explain these
trends, and suggest possibilities for further research."