Many of my recent projects have required quite a bit of data coding. My review essay in this month's issue of the Texas Law Review, for instance, required the review of approximately 20,000 pool memos from the Blackmun papers in order to evaluate the characteristics and possible influence of the Supreme Court's cert pool. Another project, which I hope to co-author with Tim Johnson (University of Minnesota Political Science Department), will examine the factors that influenced the certiorari votes of each of the Justices on the Court during the latter part of Justice Blackmun's tenure on the Supreme Court (1986-1993). Finally, I have been collecting data for almost a year for an article examining the reasons behind the Supreme Court's declining plenary docket. Among other things, the latter two papers have required coding the cert votes of every Supreme Court Justice in every case (with at least one vote to grant or join-3) from 1986 to 1993. The data set includes between 2,000 and 3,000 cases, and approximately 20,000 total votes.
The amount of data collection required for these various projects is extensive, and I can now say that collecting and coding the data has been one of the most difficult aspects of empirical research. The first challenge, of course, is figuring out exactly how you want to code the data, and which variables should be included in the data set. Without careful consideration, you may quickly find yourself well into the project and having to review and re-enter much of the data because an important variable has been unintentionally omitted. I have found that, when studying courts, the importance of some variables is often not self-evident, and thus it pays to be overinclusive on the front end in the amount of information and number of variables coded.
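One way to settle the coding scheme before any data entry begins is to write it down as a typed record with a few consistency checks. The following is only a minimal Python sketch, not the scheme used in these projects: the field names, the vote categories other than grant and join-3, and the validation rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CertVote(Enum):
    """Hypothetical vote categories; only grant and join-3 come from the post."""
    GRANT = "grant"
    JOIN_3 = "join-3"
    DENY = "deny"    # assumed category
    OTHER = "other"  # assumed catch-all for anything else on the docket sheet


@dataclass(frozen=True)
class VoteRecord:
    """One Justice's cert vote in one case (all field names are illustrative)."""
    docket_number: str
    term: int
    justice: str
    vote: CertVote
    notes: Optional[str] = None  # free text, so unexpected details are not lost


def validate(record: VoteRecord) -> List[str]:
    """Return a list of problems found in a coded record (empty if none)."""
    problems = []
    if not 1986 <= record.term <= 1993:
        problems.append(f"term {record.term} is outside 1986-1993")
    if not record.docket_number.strip():
        problems.append("missing docket number")
    if not record.justice.strip():
        problems.append("missing justice name")
    return problems


def case_qualifies(records: List[VoteRecord]) -> bool:
    """A case enters the data set if it has at least one grant or join-3 vote."""
    return any(r.vote in (CertVote.GRANT, CertVote.JOIN_3) for r in records)
```

Keeping a free-text `notes` field alongside the fixed categories is one inexpensive way to be overinclusive: details that do not fit the scheme are preserved and can be turned into proper variables later without going back to the original papers.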
The second challenge is competence. Law students, unlike graduate students in the social sciences, are not trained in data coding. Very few have quantitative backgrounds, and even fewer have experience working with large data sets and statistical packages such as SPSS and Stata. As a result, my empirical research has required me to train my research assistants and review their work quite closely, which takes time away from the research itself. Thankfully, I now have a research assistant who has become quite proficient at coding and interpreting data, but I am coming to grips with the fact that he graduates at the end of this year and I have yet to find a suitable replacement. Again, unlike graduate students, law students are barred from working for professors during their first year, and thus it is only possible to work with them for a maximum of two years, hardly enough time for them to gain competence and speed in data coding. One possibility, of course, is to hire graduate students in the social sciences, but for some reason beyond my understanding, it is far more costly to hire research assistants from outside the law school (as much as three times as expensive). I have yet to find a suitable answer to this second challenge, and I wonder how my colleagues in the legal academy deal with it. Comments to this post are therefore most welcome.
Just a note to be sure you are aware of Spaeth's new database, which includes a sample of cert denials -- you may be able to avoid coding cases altogether!
The database can be found here:
http://www.as.uky.edu/polisci/ulmerproject/sctdata.htm
(The Expanded Burger Court Database is the relevant one.)
Posted by: Sara Benesh | 17 April 2007 at 09:43 AM
Here are two articles that discuss the difficulties in, and techniques for, training law students to code reliably:
Reed C. Lawlor, Fact Content Analysis of Judicial Opinions, 8 JURIMETRICS J. 107, 109-10 (1966-1968).
Charles A. Johnson, Content-Analytic Techniques and Judicial Research, 15 AM. POLITICS Q. 169 (1987).
They do not, however, discuss how to find and retain good coders.
Posted by: Mark Hall | 11 April 2007 at 04:38 PM
As a law student and a graduate student pursuing a master's in economics who is interested in empirical work myself, I would suggest that you use grad students from your university's world-renowned economics department. Although the cost is apparently higher, you are certain to get a student well trained in data analysis and interpretation (and probably one with knowledge of statistical software).
Posted by: Josh | 10 April 2007 at 03:21 PM
On large projects (3+ RAs) I have solved the problem by hiring one social science graduate student to be the "super-RA." S/He helps me design the data collection and then supervises the law students for/with me. In this situation the additional cost of the grad student is a marginal bump on the total RA budget. It gives me extra time to do more substantive work, and is good training for them. As for why they cost more, is it possible that you've got a fee remission hidden in the cost?
I have also had large projects where it is impractical to hire a grad student, and my experience is identical to yours. Much supervision, bad data, etc. There's no substitute for talent.
Posted by: Joe Doherty | 06 April 2007 at 12:09 PM