Academic squabbles over coding decisions, including decisions involving leading data sets, are generally not news, but they become news when they spill into the academic literature. A current example involves the Supreme Court Database and its use in a paper by Lee Epstein (Wash U), Christopher Parker (Centenary College), and Jeff Segal (SUNY Stony Brook).
In Do Justices Defend the Speech They Hate? In-Group Bias, Opportunism, and the First Amendment, Epstein et al. exploit the Supreme Court Database, use a "two-level hierarchical model of 4,519 votes in 516 cases," and find that "although liberal justices are (overall) more supportive of free speech claims than conservative justices, the votes of both liberal and conservative justices tend to reflect their preferences toward the speakers' ideological grouping, and not solely an underlying taste for (or against) the First Amendment." (The paper's findings made their way into a New York Times article.)
Another scholar, Todd Pettys (Iowa), however, took issue with Epstein et al., particularly their reliance on some of the data set's coding decisions. In Free Expression, In-Group Bias, and the Court's Conservatives: A Critique of the Epstein-Parker-Segal Study, Pettys digs deeply into the underlying data set and writes: "In a recent, widely publicized study, a prestigious team of political scientists concluded that there is strong evidence of ideological in-group bias among the Supreme Court’s members in First Amendment free-expression cases, with the current four most conservative justices being the Roberts Court’s worst offenders. Beneath the surface of the authors’ conclusions, however, one finds a surprisingly sizable combination of coding errors, superficial case readings, and questionable judgments about litigants’ ideological affiliations. Many of those problems likely flow either from shortcomings that reportedly afflict the Supreme Court Database (the data set that nearly always provides the starting point for empirical studies of the Court) or from a failure to take seriously the importance of attending to cases’ details."
Responding to Pettys' critique of their work, Epstein et al. write: "The upshot is that our coding procedures reject over 80% of the author's allegations, meaning his critique reduces to about 2% of our coding decisions, extrapolating over his non-random audit percentage. Although we think a few of these are debatable, we are happy to concede, and have corrected the dataset to reflect the changes he desires. We have also rerun the analysis. The results do not change in any substantively or statistically significant way (see Appendix A), nor do the results of the summary of our study reported in the New York Times and other outlets (see Appendix B)."
Regardless of who has the better of this particular argument, greater attention to coding conventions and the underlying accuracy of any data set (particularly influential data sets such as the Supreme Court Database) is welcome.