January 24, 2008



Bill Henderson


I concede your first point, but don't understand your second one.

1) When I wrote "proprietary" I was thinking about the contractual provision between LSSSE and the law school that LSSSE will not reveal a school's score without the school's consent. But I suspect that LSSSE would be subject to a Freedom of Information Act (FOIA) request and/or the state law analogue. If someone systematically collects public law schools' LSSSE data and publishes it, schools have less incentive to participate. One solution is to make LSSSE mandatory and transparent as a condition of accreditation. I would be all for that.

2) What empirical work have I "published" in relation to LSSSE? The point of the post is to encourage schools to use LSSSE data to analyze their schools internally, not to showcase Indiana's LSSSE scores -- indeed, I point out that they are low on some dimensions. The aggregate-level data is calculated by LSSSE, including the t-test results. I generated the differences (displayed on the bar charts) using an Excel spreadsheet, with my assistant plugging in the data. But again, that is just to generate visual graphs. Getting the data that generated the t-test results is impossible (as a practical matter) because LSSSE only has the right (under a contract with 80+ institutions) to release aggregated results.

Re your point on "making the underlying database publicly available." I think you are painting with too big a brush. I know lots of PhDs who are willing to share their data ... after they are done with it. There needs to be some incentive to create original data, which comes in the form of the right to exclude. Further, some of the best data available these days is (really) proprietary because it was created by private industry; but for a licensing agreement, no one would have access to it.

In one of the licensing agreements I work under, there is a provision that permits making data available to journal editors to re-run regressions (e.g., using Stata ado files). The burden is on us to negotiate limited use with journal editors. But sharing the dataset with a third party is forbidden. Another possibility is posting regression results on the Internet; everyone can look at multiple specifications.

The flipside is externally funded datasets; if someone gets a grant from the NSF, then making the data publicly available is a condition of the funding. The quid pro quo is that others don't get access until you publish the first paper and you are compensated for the time and effort to create the resource.

Licensing issues and incentive structures make it hard, in my opinion, to support a bright line rule for posting data. bh.

Theodore Seto

I am intrigued by the premise that the performance of a state-subsidized school is proprietary information.

I am also intrigued by the premise that it is credible to publish empirical work without making the underlying database publicly available. This is not so in any other field. The possibility of replication is essential to credibility.

Christopher Zorn

"When empirical data is presented that challenges a well entrenched view, law professors query whether the sample or methodology can really be trusted--lack of statistical knowledge is rarely an impediment to this line of objection. This is a great mindset if we are engaged in a winner-take-all adversarial contest. But it not the right approach for building a great institution."

Nor, I would add, for intellectual inquiry, at least not that of an empirical sort. But since I've blogged on this before (at http://tinyurl.com/39xy7v ) I'll get off my soapbox.

