Many of my recent projects have required quite a bit of data coding. My review essay in this month's issue of the Texas Law Review, for instance, required reviewing approximately 20,000 pool memos from the Blackmun papers in order to evaluate the characteristics and possible influence of the Supreme Court's cert pool. Another project, which I hope to co-author with Tim Johnson (University of Minnesota Political Science Department), will examine the factors that influence the certiorari votes of each Justice during the latter part of Justice Blackmun's tenure on the Supreme Court (1986-1993). Finally, I have been collecting data for almost a year for an article examining the reasons behind the Supreme Court's declining plenary docket. Among other things, the latter two papers have required coding the cert votes of every Justice in every case with at least one vote to grant or join-3 from 1986 to 1993. The data set includes between 2,000 and 3,000 cases and approximately 20,000 total votes.
These projects require extensive data collection, and I can now say that collecting and coding the data has been one of the most difficult aspects of my empirical research. The first challenge, of course, is figuring out exactly how you want to code the data and which variables the data set should include. Without careful planning, you may find yourself well into a project and forced to review and re-enter much of the data because you unintentionally omitted an important variable. I have found that, when studying courts, the importance of some variables is often not self-evident. It therefore pays to be overinclusive on the front end, coding more information and more variables than you think you will need.
The second challenge is competence. Law students, unlike graduate students in the social sciences, are not trained in data coding. Very few have quantitative backgrounds, and even fewer have experience working with large data sets and statistical packages such as SPSS and Stata. As a result, my empirical research has required me to train my research assistants and review their work quite closely, which takes time away from the research itself. Thankfully, I now have a research assistant who has become quite proficient at coding and interpreting data, but I am coming to grips with the fact that he graduates at the end of this year and I have yet to find a suitable replacement. Again, unlike graduate students, law students are barred from working for professors during their first year, so I can work with them for at most two years, hardly enough time for them to gain competence and speed in data coding. One possibility, of course, is to hire graduate students in the social sciences, but for reasons beyond my understanding, hiring research assistants from outside the law school is far more costly (as much as three times as expensive). I have yet to find a suitable answer to this second challenge, and I wonder how my colleagues in the legal academy deal with it. Comments on this post are therefore most welcome.