22 March 2006


Jason @ Beaneball

The last point by Stephen Wasby is what I regard as the most important: scholars always have big ideas and big dreams, but reality sometimes gets in the way. Shouldn't there be some idealism here, some sense of, "I'd rather we get to the truth of the matter here than shoot purely for the credit"?

Stephen L. Wasby

(1) The P.S. symposium to which Sara refers deals with most of these issues. (2) A point missed in the comments is that journal editors in political science require that access to the datasets (for those who wish to "check" and re-run the data), not just identification of existing ICPR data sets, be allowed as a condition precedent to publication. (3) The Heise and Henderson comments about "I did the damn work, I want to use my data" miss a key point: if someone looks at your database, unless that person has considerable skill, that person's paper based on your database won't be accepted for publication; perhaps more important, you probably won't have collected exactly the data that the other person wants -- some variables will be missing, so considerable additional work will still be necessary. In short, I think the "someone else will use my data" argument is overstated. And the tendency will be to go to the ICPR-type datasets, not to an individual author's dataset (after all, no offense intended, how can we be sure you "did it right" rather than some sloppy RA having miscoded?). I would end with a comment that cuts in a different direction: If your data set is substantial, there will be more than enough "stuff" there for others to use, and I would rather see others use it than have it sit in my basement, or my computer, or wherever -- because none of us ever does with the data we collect all that we hope (and say) we will do.

William Henderson

I agree with Joe Doherty. Data collection and cleaning is time-consuming work. The expectation I most commonly heard prior to this past year or so is that "others can have access to my data when I am done." When an outside organization pays for the data creation (like the NSF or a foundation), then they typically stipulate a liberal data release policy as a condition of funding.

Re veracity of results (which is essentially how the term “replication” is being used in this thread—not replication of a controlled experiment), it is certainly reasonable for a journal editor to ask for copies of regression results, or possibly alternative specifications of a model, or some concrete evidence of how a variable was coded.

However, immediate full disclosure of a database with lots of sweat equity is counterproductive for the incentive reasons set out by Michael Heise. Until the value of a privately constructed dataset has been largely exhausted, an author need only disclose where / how the data was collected. If I am going to work late, blow my RA budget, spend dozens of hours locating data from arcane library sources, write letters and emails to cajole cooperation from various institutions, deal with IRBs, and generally neglect my family to clean data on weekends, the rewards need to be commensurate with the effort. One dataset is usually good for three, four, or more papers. I’ll be damned if I am going to willingly hand papers two, three or four on a silver platter to a less industrious colleague.

It is not virtuous to disregard the basic economics of intellectual property--if an asset becomes a public good, it will be undersupplied by the market.

Joe Doherty

I disagree with the conclusion to Michael's statement,

"[b]ecause replication is central to the empirical scholarship enterprise scholars owe some duty to facilitate replication by others. Such a duty, of course, necessarily implicates access to data (and, as I discuss below, possibly to coding as well)."

While data disclosure is a public good that should be encouraged [I am a rabid consumer of readily available data] the OBLIGATION of a scholar to facilitate replication should extend no further than an exact statement of the procedures employed in data collection and analysis. I grant that there are exceptions -- every bit of data collected with government funds should be disclosed upon publication -- but I do not believe that scholars are ENTITLED to the data used by others in their research. [Excuse the caps, I can't make italics.]

I say this because to assert such a duty only legitimizes a problematic shortcut. In my experience more than one-half of the research effort takes place in the data collection phase; running a regression is relatively nominal compared to the decisions that are made in coding. What proponents of data disclosure are proposing is a duplication of the statistical tests, not a replication of the empirical research. While duplication is important if the results are suspect (Donohue and Wolfers on capital punishment is a case in point), we don't advance the ball very much if we are not equally or more concerned with measurement. (Donohue and Wolfers did fine, by the way, in collecting their own data to replicate research when the data were not forthcoming from the authors.)

Having said that, I do believe that data disclosure is a significant social good. The pedagogical importance of having relevant ELS datasets available for coursework should be uncontroversial, and using existing datasets as foundations for additional data collection is undeniably worthwhile. And there are great incentives to making data public; it probably increases one's citation count. But a professional duty to disclose places an obligation upon the scholar that is not balanced by the entitlement of the reader.

Sara Benesh

Sorry -- that link isn't working. Try here:

Sara Benesh

This is an important and consequential debate. PS: Political Science & Politics published a forum on the topic a few years ago that can be read here: http://gking.harvard.edu/projects/repl.shtm. I think it provides some useful perspectives on the question.

I, myself, am a strong proponent of what Michael calls the "total disclosure" position. I understand the desire to get what you can from your own data before sharing it with the world, but when folks are overly cautious in sharing their data, it makes me suspicious -- what are they trying to hide??


