As legal scholarship becomes increasingly empirical, legal scholars need to become more aware of and sensitive to the scholarly norms that inform empirical work. One underexamined norm relates to expectations concerning data availability and the facilitation of replication. Because replication is central to the empirical scholarship enterprise, scholars owe some duty to facilitate replication by others. Such a duty, of course, necessarily implicates access to data (and, as I discuss below, possibly to coding as well). The nature, extent, and contours of that duty, however, are far from clear and, indeed, practices vary across disciplines. Two common scenarios illustrate some of the complexities and nuances.
One scenario involves publicly available data, such as those managed and archived by ICPSR at Michigan. For legal scholars who use such data in published work, one obvious expectation (indeed, requirement) is to identify the specific dataset by its ICPSR number in a footnote (or table note). However, is mere dataset identification enough? For example, because some amount of data preparation and manipulation (e.g., collapsing, filtering, re-coding) is almost inevitable, should legal scholars also be expected to make available their coding? Similarly, should authors make table-specific coding available? Doing so would greatly reduce the burden on subsequent scholars and facilitate replication, follow-up analyses, etc.
A second scenario is even more delicate as it involves original datasets generated by scholars and not publicly archived. To be sure, it is difficult to overstate the effort necessary to develop a first-class dataset. In light of a scholar's often considerable investment of sweat equity and because quality datasets frequently support multiple articles, what obligations attach to scholars in terms of facilitating replication efforts by others? At one extreme, a mechanical requirement to release the complete dataset, including data not yet published on (a "total disclosure duty" position), would likely deter dataset building efforts, at least at the margins. At the other extreme, general scholarly norms would assuredly resist a "no duty to disclose" position, as some amount of disclosure and data availability is necessary so that findings can be vetted and knowledge advanced. Considerable middle ground separates these two polar positions.
I welcome thoughts and perspectives on how empirical legal scholars should resolve these (and related) issues and navigate through this uncertain and evolving terrain.