
28 February 2006


Mark Hall

I came across another helpful discussion, from James F. Spriggs, II and Thomas G. Hansford, Measuring Legal Change: The Reliability and Validity of Shepard’s Citations, 53 Pol. Res. Q. 327, 334 & n.11 (2000):

"Kappa means that the level of agreement is [that] percent greater than would be expected by chance and thus indicates. . . . If Kappa equals 0 then the amount of agreement between the two coders is exactly what one would expect by chance. If Kappa equals 1, then the coders agree perfectly. When evaluating the extent to which the two coders agree, Landis and Koch (1977) [J. Richard Landis & Gary G. Koch, The Measurement of Observer Agreement for Categorical Data, Biometrics 33:159-174 (1977).] attach the following labels to the size of the Kappa statistic: <0.00 is Poor; 0.00-0.20 is Slight; 0.21-0.40 is Fair; 0.41-0.60 is Moderate; 0.61-0.80 is Substantial; and 0.81-1.00 is Almost Perfect."
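
To make the quoted definition concrete, here is a minimal Python sketch, assuming the Kappa in the quote is the standard two-coder Cohen's kappa. The function names and the sample codes are made up for illustration and do not come from Spriggs and Hansford. The published bands leave small gaps (for example, between 0.20 and 0.21), so the sketch assumes each upper bound is inclusive.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's own label proportions,
    # summed over the labels.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) label.

    Assumption: upper bounds are inclusive, which closes the small gaps
    in the published bands.
    """
    if kappa < 0.0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
                         (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost Perfect"


# Hypothetical codes for ten cases (1 = factor present, 0 = absent).
coder_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
coder_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 3), landis_koch_label(kappa))
```

In this made-up example the coders agree on 8 of 10 cases, but 52 percent agreement would be expected by chance given how often each coder marks "present," so kappa comes out lower than the raw agreement rate suggests.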

Mark Hall and Ron Wright

We came across an article by two political scientists that says the "pi coefficient" is the most common reliability measure, especially for more than two coders. It also says that a pi value of 0.6 is considered minimally acceptable in communications research. Glenn A. Phelps & John B. Gates, The Myth of Jurisprudence: Interpretive Theory in the Constitutional Opinions of Justices Rehnquist and Brennan, 31 Santa Clara L. Rev. 567, nn. 69, 74 (1991).
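
For anyone who wants to try this, here is a rough Python sketch of a pi-style coefficient that handles more than two coders. It assumes the article means Scott's pi, and it uses Fleiss's multi-coder generalization, which reduces to Scott's pi when there are exactly two coders. The function name, the `MIN_ACCEPTABLE_PI` constant, and the sample data are our own illustrations, not anything taken from Phelps and Gates.

```python
from collections import Counter

# Threshold reported in the cited article for communications research.
MIN_ACCEPTABLE_PI = 0.6


def multi_coder_pi(ratings):
    """Pi-style agreement for two or more coders.

    ratings: one entry per item, each a list of the labels that the
    coders assigned to that item (same number of coders for every item).
    """
    if not ratings:
        raise ValueError("need at least one coded item")
    n_coders = len(ratings[0])
    if n_coders < 2:
        raise ValueError("need at least two coders")
    pooled = Counter()
    agreement_sum = 0.0
    for item in ratings:
        if len(item) != n_coders:
            raise ValueError("every item needs the same number of coders")
        counts = Counter(item)
        pooled.update(counts)
        # Share of coder pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_coders * (n_coders - 1))
    observed = agreement_sum / len(ratings)
    # Chance agreement uses label proportions pooled across all coders.
    total = len(ratings) * n_coders
    expected = sum((c / total) ** 2 for c in pooled.values())
    if expected == 1.0:
        raise ValueError("pi is undefined when only one label is ever used")
    return (observed - expected) / (1 - expected)


# Hypothetical data: four opinions, three coders, labels "p" or "a".
example = [["p", "p", "p"], ["p", "p", "a"], ["a", "a", "a"], ["a", "a", "p"]]
pi_value = multi_coder_pi(example)
print(round(pi_value, 3), pi_value >= MIN_ACCEPTABLE_PI)
```

The difference from kappa is in the chance term: pi pools the label proportions across coders instead of using each coder's own proportions.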

Ken Cousins

We've been trying to address similar issues; it seems one of the first questions you need to answer is about the nature of the coding. For dichotomous coding, I've seen suggestions that Pearson's r may be sufficient.

A few years ago, Lombard et al. wrote a piece similar to the one you've cited:

Lombard, M, J Snyder-Duch, et al. (2002). "Content Analysis in Mass Communication: Assessment and Reporting of Intercoder Reliability." Human Communication Research 28(4): 587-604

At any rate, I'd be interested to see what you all find, and will check back if I find other useful sources.

Mark Hall

Good question, and that website is a great resource. Simply reporting the percentage of agreement is not sufficient, because it tells us nothing about whether the level of agreement is greater than what one would expect by chance. In typical content coding, some items are so straightforward that there should be no disagreement, so any substantial disagreement is a sign of problems. There is also often a long list of factors to code for, most of which are not present in most or many cases. So it's easy to produce a high level of overall agreement, with coders almost always indicating "not present," but the key question is the level of agreement when one or more coders find the factor present. Tricky stuff to get right.
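
A small Python sketch may help show the problem with a rarely present factor. The numbers are invented, and `agreement_when_present` is only one hypothetical way to measure "agreement when one or more coders find the factor present": it keeps only the cases that at least one coder flagged and asks how often both did.

```python
def percent_agreement(coder_a, coder_b):
    """Share of all cases on which the two coders gave the same code."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def chance_agreement(coder_a, coder_b):
    """Agreement expected by chance for present/absent (True/False) coding,
    based on how often each coder marks the factor present."""
    p_a = sum(coder_a) / len(coder_a)
    p_b = sum(coder_b) / len(coder_b)
    return p_a * p_b + (1 - p_a) * (1 - p_b)


def agreement_when_present(coder_a, coder_b):
    """Among cases flagged by at least one coder, share flagged by both.

    Returns None when neither coder ever marks the factor present.
    """
    flagged = [(a, b) for a, b in zip(coder_a, coder_b) if a or b]
    if not flagged:
        return None
    return sum(a and b for a, b in flagged) / len(flagged)


# Hypothetical: 100 cases, factor rarely present.
# Both present in 2, only coder A in 2, only coder B in 2, both absent in 94.
coder_a = [True] * 2 + [True] * 2 + [False] * 2 + [False] * 94
coder_b = [True] * 2 + [False] * 2 + [True] * 2 + [False] * 94

overall = percent_agreement(coder_a, coder_b)
by_chance = chance_agreement(coder_a, coder_b)
when_present = agreement_when_present(coder_a, coder_b)
print(overall, round(by_chance, 4), round(when_present, 3))
```

With these invented numbers the coders agree on 96 percent of cases overall, yet about 92 percent agreement would be expected by chance alone, and they agree on only a third of the cases where either one found the factor present.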

I recently came across the following article, which has a good but technical discussion of these issues in the context of coding legal cases: Charles A. Johnson, Content-Analytic Techniques and Judicial Research, 15 Am. Politics Q. 169-197 (1987).

At the end of the day, I'm also still not sure how best to do this. Like much of statistics, I suppose it's as much "art" as it is science.


