Einer Elhauge is guest-blogging over at Volokh, and raising a bunch of interesting questions. One post draws a parallel between ELS and sabermetrics, leading one (clearly Boston-based) wag to ask, "if empirical legal studies are like sabermetrics, who is the legal equivalent of Joe Morgan?"
I'm not touching that one... But, in the spirit of the (baseball) season, I'd offer the following instead. Empirical researchers are taught early on to beware of selection bias -- drawing incorrect inferences from data that are nonrandomly sampled from a population. (An example would be concluding that early humans lived predominantly in caves, on the basis of finding paintings, remains of fire pits, and so forth there; because such artifacts are more likely to survive in caves than elsewhere, the available archaeological data are subject to selection bias.) We also learn that there are ways of dealing with such data, including the class of "selection models" first developed by James Heckman.*
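The cave-painting example is easy to make concrete with a quick simulation (my own sketch, not from the post; the distribution and survival rule are invented for illustration): when the probability that a data point survives into your sample depends on the very quantity you are estimating, the naive sample estimate is biased.

```python
import math
import random
import statistics

random.seed(42)

# Population quantity of interest: some trait y, distributed Normal(0, 1),
# so the true population mean is 0.
population = [random.gauss(0, 1) for _ in range(100_000)]

# Nonrandom selection: the chance a unit survives into the observed data
# rises with y itself (as cave artifacts survive better than open-air ones).
def survives(y):
    return random.random() < 1 / (1 + math.exp(-2 * y))

sample = [y for y in population if survives(y)]

# The naive mean of the surviving sample overstates the population mean.
print(f"population mean:       {statistics.mean(population):+.3f}")
print(f"surviving-sample mean: {statistics.mean(sample):+.3f}")
```

The surviving-sample mean lands well above zero even though the population mean is zero -- the signature of a selection mechanism correlated with the outcome.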
What does this have to do with baseball? As it happens, my partner-in-crime Jeff Gill and I recently published a paper about the political dynamics of public opinion toward the designated hitter rule. (That's the Yankees' Ron "Boomer" Blomberg, the first-ever DH, in the picture). Our paper draws on data from a September 1997 CBS News poll which utilized a "screening" question. Respondents (there were over 1,000 of them) were first asked how interested they were in watching or following baseball. The 440 who responded "very interested" or "somewhat interested" were then asked a series of follow-up questions, including whether they approved or disapproved of the designated hitter; the other 600 or so respondents were not asked the follow-ups.
Our first thought was that the screening question raised the possibility of selection bias in the data, and our initial analyses used a Heckman-type selection model to correct for that. (For those of you desperate for an alternative to grading final exams, an older version of the paper that includes the selection model results is posted here). But one of the reviewers at QJPS pointed out (quite correctly) that, since the population of interest presumably only consists of individuals with some interest in baseball, the screening question posed no threat of selection bias. That is, individuals with no interest in baseball presumably can't have opinions about the DH, and so omitting them from our analysis is not a problem.
I offer this example to underscore a point that often gets lost in researchers' concerns about selection bias: the fact that one's data are subject to a nonrandom selection process does not necessarily mean that selection bias will result. Too often, scholars -- including this one -- conflate the two, and rush to implement some sort of correction when in fact none is needed. The key question is always whether the selection mechanism leads to bias vis-a-vis the phenomenon of interest, something only a thoughtful consideration of the question(s) and the data can answer.
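The distinction can be sketched in a toy simulation (again my own, not from the paper; all the numbers are made up): a screening question that merely drops people outside the population of interest leaves the estimate untouched, while selection that depends on the outcome itself biases it.

```python
import random
import statistics

random.seed(7)

# Hypothetical respondents: an interest-in-baseball score in [0, 1] and a
# DH approval (True/False), drawn independently of interest here.
people = [(random.random(), random.random() < 0.5) for _ in range(50_000)]

# Population of interest: people with at least some interest in baseball.
fans = [(i, a) for i, a in people if i > 0.6]
fan_rate = statistics.mean(a for _, a in fans)

# Case 1: the screening question drops exactly the people who fall outside
# the population of interest, so the screened sample *is* that population
# and the approval estimate is unbiased.
screened = [(i, a) for i, a in people if i > 0.6]
screened_rate = statistics.mean(a for _, a in screened)

# Case 2: selection depends on the outcome itself -- suppose approvers are
# twice as likely to answer the follow-up. The observed rate is now biased
# upward: roughly 0.8 * 0.5 / (0.8 * 0.5 + 0.4 * 0.5) = 2/3 instead of 1/2.
answered = [(i, a) for i, a in fans
            if random.random() < (0.8 if a else 0.4)]
answered_rate = statistics.mean(a for _, a in answered)

print(f"true fan approval:      {fan_rate:.3f}")
print(f"after screening:        {screened_rate:.3f}")
print(f"outcome-driven answers: {answered_rate:.3f}")
```

Only the second mechanism calls for a correction; the first is just the population definition doing its job.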
* Good references on selection bias include Heckman (1979, Econometrica), Berk (1983, ASR), Winship and Mare (1992, Annual Review of Sociology), and Stolzenberg and Relles (1997, ASR). Selection bias in a qualitative-data context is reviewed by (among others) Geddes (1990, Political Analysis), King, Keohane, and Verba (Designing Social Inquiry, 1994, ch. 4), and Collier and Mahoney (1996, World Politics). More recently, Jonathan Koehler of the UT-Austin Business School has done some interesting experimental work on selection bias in a jury context.