Thanks to all of you who have commented so far. To keep the discussion going, I want to look a little more deeply at a comment Michael Heise made, namely that one reason for a possible federal bias is data quality: federal data is often superior to state data, and "high quality empirical work requires high quality data." I agree completely. Yet there's a "but..." I can't quite shake from my mind.
What do you do when the only data you have for the question you're confronting are incomplete -- well-gathered but, say, missing a covariate or two (a gap that seems more common in state data than in federal)? This is especially pressing when the question is one legislatures are wrestling with and need answered sooner rather than later.
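To make the worry concrete, here is a toy simulation of what a single missing covariate can do to an estimate. It is purely illustrative: the variable names and coefficients are invented, not drawn from any state or federal data set.

```python
# Toy illustration only (no real data): if the data omit a covariate w that
# affects the outcome and is correlated with the policy variable x, the
# estimated policy effect absorbs part of w's effect.
# Classic omitted-variable bias: bias = b_w * cov(x, w) / var(x).
import numpy as np

rng = np.random.default_rng(42)
n = 50_000
w = rng.normal(size=n)                       # the covariate the data never recorded
x = 0.6 * w + rng.normal(size=n)             # policy variable, correlated with w
y = 1.0 * x + 1.5 * w + rng.normal(size=n)   # true policy effect is 1.0

full = np.column_stack([np.ones(n), x, w])
short = np.column_stack([np.ones(n), x])
with_w = np.linalg.lstsq(full, y, rcond=None)[0][1]
without_w = np.linalg.lstsq(short, y, rcond=None)[0][1]

print(f"controlling for w: {with_w:.2f}   omitting w: {without_w:.2f}   (truth: 1.00)")
```

The estimate with the covariate lands near the true effect; the one without it does not, and no amount of additional observations fixes that.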
The obvious answer, of course, is that you do the best you can with the data you have. You acknowledge the limits of the data: point out the flaws, restrict the questions to those the data can best handle, and so on. But is this right? Do studies that rely on weak data do more harm than good? And if they do, what alternatives do we have?
There is, of course, plenty of (sometimes anecdotal) evidence about how the poor use of data can do more harm than good: how failing to control for the endogeneity of police hiring led people to believe that more police had no effect on crime rates (or in fact drove them up), or how failing to correct for the endogenous relationship between prison population and crime similarly led to arguments that prison length was immaterial. I'm sure everyone reading this blog has examples from their own fields where unacknowledged self-selection, endogeneity, autocorrelation, or some other treatable ailment led to problematic policy suggestions.
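For readers who want to see the mechanics, here is a stylized simulation of how that endogeneity problem plays out, and how an exogenous instrument (here, an entirely hypothetical funding shock) can recover the true effect. The coefficients are invented for illustration, not estimates from the policing literature.

```python
# Stylized simulation (invented numbers): places with worse unobserved crime
# conditions hire more police, so a naive regression of crime on police shows
# little or even a positive "effect" despite a strongly negative true effect.
# A hypothetical exogenous funding shock, used as an instrument, recovers it.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = -1.0

grants = rng.normal(size=n)              # hypothetical exogenous funding shock
crime_conditions = rng.normal(size=n)    # unobserved local crime conditions

police = 0.5 * grants + 0.8 * crime_conditions + rng.normal(scale=0.5, size=n)
crime = true_effect * police + 2.0 * crime_conditions + rng.normal(scale=0.5, size=n)

naive = np.cov(crime, police)[0, 1] / np.var(police, ddof=1)      # OLS slope
iv = np.cov(crime, grants)[0, 1] / np.cov(police, grants)[0, 1]   # Wald/IV estimate

print(f"true effect: {true_effect:+.2f}")
print(f"naive OLS:   {naive:+.2f}  (looks like police increase crime)")
print(f"IV estimate: {iv:+.2f}  (close to the truth)")
```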
But what is the effect on public policy of sound methodology applied honestly to weak data that are the best available? Do the results cluster around the "true" outcome, just much more noisily than studies with better data? Or are they systematically biased in some direction, and if so, can we even predict which direction that will be (we can expect, for example, that they will too often support the null hypothesis that a given policy change has no effect)?
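One concrete way weak data can push results toward the null rather than just adding noise: if the key variable is measured with error, estimates are attenuated toward zero and significance tests fail to reject far more often. A quick simulation (again purely illustrative, with invented sample sizes and coefficients) makes the point.

```python
# Illustrative only: the "weak data" here are a noisy measure of the key
# variable. Estimates shrink toward zero and the usual test (|t| > 1.96)
# rejects far less often, so weak data can favor "no effect" findings
# systematically, not just noisily.
import numpy as np

rng = np.random.default_rng(1)
n, reps, true_beta = 150, 2_000, 0.3

def slope_and_t(x, y):
    """Slope and t-statistic from a bivariate regression with intercept."""
    x_c, y_c = x - x.mean(), y - y.mean()
    slope = (x_c @ y_c) / (x_c @ x_c)
    resid = y_c - slope * x_c
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (x_c @ x_c))
    return slope, slope / se

for error_sd, label in [(0.0, "clean measure"), (2.0, "noisy measure")]:
    estimates, rejections = [], 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = true_beta * x + rng.normal(size=n)
        x_observed = x + rng.normal(scale=error_sd, size=n)  # what weak data record
        b, t = slope_and_t(x_observed, y)
        estimates.append(b)
        rejections += abs(t) > 1.96
    print(f"{label}: mean estimate {np.mean(estimates):.2f} "
          f"(truth {true_beta}), rejection rate {rejections / reps:.0%}")
```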
Moreover, how can we expect the actual policy makers -- the legislators or the agency heads -- to handle these results? Even if such studies circle around the "true" outcome, do the inherent limitations in the data (and the results) imply that less-sophisticated consumers of empirical work (or even sophisticated ones) will misinterpret them in systematic ways?
And yet, even if all that is true, is the alternative better? Without even the weak studies, properly disclaimed, legislators have nothing to go on but their gut instinct. Is that even more biased? Or is weaker work easier to misinterpret selectively than sounder work, so that weak empirical results and gut instincts interact to magnify those biases?
States are playing increasingly important roles these days, it seems. And as several people noted in comments the other day, state data are often less well-organized and less reliable. I think the Hippocratic Oath is just as applicable to empirical work as to medicine: "First, do no harm." Are we?
John,
This is a very good post. To my mind, your discussion essentially answered the question posed: Use the best available data, "properly disclaimed." (In my next paper, I am going to drop a footnote and give you attribution for that phrase.)
Here is a simple plan: (1) Identify the weaknesses in the data and check your assessments with an objective third party; (2) Explore possible methodological solutions (new methods often arise in response to data problems); and (3) If the results are amenable to alternative interpretations, some of which cannot be tested due to data limitations, clearly state that to be the case. Boom, we know more about the world.
Sure, a reader might seize on a statistically significant result and ignore unambiguous disclaimers about what that result might mean (it has happened to me). But why accommodate irresponsible or ignorant behavior? What is more troublesome is the prospect that some agency will not collect better data the next time around because a study that got us halfway there, and flagged the data issue, was never published.
In short, do honest work that assists the competent researcher, legislator, or lay reader.
Posted by: William Henderson | 07 April 2006 at 11:23 PM
I wonder if there's also an inclination to aggregate data rather than break them out and look at state-specific problems. For example, my understanding is that Congress passed the Violence Against Women Act because of a sense that some states weren't doing enough on this problem, but the data Congress put in the record mostly seemed to be about the total number of assaults and prosecutions, rather than targeting specific states for underenforcement. So we end up with Congress dealing with traditional state issues, and the law journals helping them out by not looking at states on an individual basis. http://www.blogdenovo.org/archives/001274.html
Posted by: PG | 07 April 2006 at 04:55 PM