Is there a debate over the various measures of inter-coder reliability? For a short discussion, as well as a practical guide to testing inter-coder reliability in content analysis, I found this: Lombard, Snyder-Duch & Campanella Bracken, Practical Resources for Assessing and Reporting Intercoder Reliability in Content Analysis Research Projects. I've used Cohen's kappa, which is more conservative than simple percent agreement because it corrects for agreement expected by chance. See Landis & Koch, The Measurement of Observer Agreement for Categorical Data, 33 Biometrics 159 (1977). But is there a preferred measure? If so, why? Is simple percent agreement always disfavored? Is there an article that thoroughly compares the options?
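To make the chance-correction concrete, here is a minimal sketch comparing simple percent agreement with Cohen's kappa for two hypothetical coders (the labels and category names are invented, purely for illustration); it uses scikit-learn's cohen_kappa_score:

```python
# Minimal sketch: percent agreement vs. Cohen's kappa for two coders
# labeling the same ten units. The labels are hypothetical.
import numpy as np
from sklearn.metrics import cohen_kappa_score

coder_a = np.array(["pos", "pos", "neg", "pos", "pos",
                    "pos", "neg", "pos", "pos", "pos"])
coder_b = np.array(["pos", "pos", "pos", "pos", "pos",
                    "pos", "neg", "pos", "pos", "neg"])

# Simple percent agreement: share of units on which the coders match.
percent_agreement = np.mean(coder_a == coder_b)

# Cohen's kappa corrects that figure for chance:
# kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
# p_e is the agreement expected from each coder's marginal rates.
kappa = cohen_kappa_score(coder_a, coder_b)

print(f"Percent agreement: {percent_agreement:.2f}")  # 0.80
print(f"Cohen's kappa:     {kappa:.2f}")              # 0.38
```

With a skewed category distribution like this one (eight of ten units coded "pos" by each coder), 80% raw agreement shrinks to a kappa of about .38, which is exactly why simple percent agreement is often said to flatter reliability. Comments are open.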