I recently taught a seminar in which students looked at some aspect of federal civil trial trends and wrote an empirical paper. In the course of preparing for that seminar, I put together a list of both raw and processed data sources on outcomes in state and federal courts. That list is here. The most important such source is the filings and terminations data collected by district court clerks' offices and then sent to the Administrative Office of the U.S. Courts. The AO publishes an annual report of summary statistics, available (since 1997) here, and issues the raw data, year by year, via the ICPSR. (The study numbers are: 1970-2000: Study 8429; 2001: Study 3415; 2002: Study 4059; 2003: Study 4026; 2004: Study 4348.)
The AO's data cover only very basic facts about a case: its filing date; its docket number and court; the "nature of suit" according to the clerks' office transcription of the "civil cover sheet" filled out, usually, by the plaintiff's lawyer; at what procedural point the case ended (prior to trial, during trial, after trial, etc.); the nature of the judgment (settlement, dismissal, etc.); who won (if anyone did); the amount of the judgment.
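To make that concrete, here is a minimal sketch (in Python) of what a single termination record boils down to. Every field name and value below is a hypothetical label of mine for illustration, not the AO's actual variable names or codes:

```python
# One AO termination record, reduced to the fields the data actually carry.
# Field names and values here are hypothetical labels, not the AO's own.
record = {
    "file_date": "2003-06-17",                      # date of filing
    "docket": "03-CV-01234",                        # docket number
    "district": "MOED",                             # court
    "nature_of_suit": "civil rights: employment",   # from the civil cover sheet
    "proc_progress": "before trial",                # procedural point at termination
    "disposition": "settled",                       # nature of the judgment
    "judgment_for": "plaintiff",                    # who won, if anyone
    "amount": 50000,                                # amount of the judgment (dollars)
}
```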
The observations in the raw data files are for each case "termination" in the relevant statistical year. (In addition, there's a raw data file for cases still pending in the district courts, with information as of the time of filing.) So for many research questions, the first step has to be combining all the different years of data, in order to look at time trends or to use date of filing rather than date of termination. Ted Eisenberg and Kevin Clermont have done us all a public service by performing that combination and making custom summary statistics available online here. In cooperation with Eisenberg and Clermont, the new Center for Empirical Research in the Law, run by my colleague Andrew Martin, will soon be posting a fancier version of the same data. In the meantime, however, anyone interested in working with the raw data (in Stata format) can download it all together from this page of my site (see I.A.2).
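For anyone who would rather roll their own combination, the basic step is just stacking the per-year files. A minimal sketch in Python/pandas, assuming hypothetical file names like terminations_1997.dta and a hypothetical file_date field (the real names differ and the variable layouts shift somewhat across years), looks roughly like this:

```python
import pandas as pd

# Hypothetical file names; the per-year files on the download page are named
# differently, and the variable layouts shift somewhat across years.
years = range(1997, 2005)
frames = []
for y in years:
    df = pd.read_stata(f"terminations_{y}.dta")
    df["stat_year"] = y  # keep track of the statistical (termination) year
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)

# Once combined, cases can be grouped by filing year rather than termination
# year -- "file_date" is a hypothetical stand-in for the actual filing-date field.
combined["file_year"] = pd.to_datetime(combined["file_date"]).dt.year
print(combined.groupby("file_year").size())
```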
Lots of interesting work can be done with the AO data--but the limits are severe. First, the data are quite difficult to work with because the variables are never exactly what you'd want (there's no coding for summary judgment, for example) or are pretty noisy (what is "dismissed: other," and why is it used so much more often in some districts than in others?). In addition, the AO has stripped out the names of the judges, as well as the case captions, which makes the data harder to use. More generally, as in so many public datasets collected by a bureaucracy for its own reasons, each variable is a little odd and requires a bit of investigation prior to use. Some variables are highly reliable (who won seems to be in this category, for example); others less so. Since 1993, the data can be audited pretty effectively using the federal courts' electronic docketing system, PACER. I did one such topic-specific audit in a 2003 article about inmate litigation; Ted Eisenberg and I wrote a more systematic paper on the results of auditing that same year; and Gillian Hadfield has written some others.
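As a small illustration of the kind of noise check that's worth doing, one can tabulate how often a given disposition code shows up district by district. The column names and code value below are hypothetical placeholders for whatever the relevant year's codebook actually specifies:

```python
import pandas as pd

# Column names and the code value are hypothetical placeholders; the actual
# disposition variable and its codes are spelled out in each year's codebook.
cases = pd.read_stata("terminations_combined.dta")

# Share of terminations coded "dismissed: other", district by district.
dismissed_other = (
    (cases["disposition"] == "dismissed: other")
    .groupby(cases["district"])
    .mean()
    .sort_values(ascending=False)
)
print(dismissed_other.head(10))  # districts that lean on the code most heavily
```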
Two things would make the dataset vastly more useful. The first would be adding the judges back in. Previously this has been extremely difficult, because it had to be done one case at a time, using PACER. But Christy Boyd, a political science student here at Wash. U., has come up with a vastly better method. The second would be collecting data that lined up a bit better with the kinds of questions researchers actually care about: if, for example, there were variables within the nature of suit "civil rights: employment" for whether the case concerned race, sex, disability, etc. In order to make commenting easier, I'm going to start two posts immediately after this one. One will briefly describe Christy's method and link to her fuller description of it. The other will ask for your thoughts on how the AO data might be improved; I think it would really be great for a group of scholars who use these data to try to meet with folks at the AO to talk about ways to make the data more useful. If you have any ideas on this, I urge you to post them.