I've been a regular reader of the blog since its inception. I don't know how much actual posting I can contribute, but I have found that some of the greatest value of the blog comes from our guest commentators, such as the current discussion of statistical significance.
While I have for some time agreed with the commentators' contentions, I would add one perspective from an institutional/legal approach. Any decision rule, be it a law, a regulation, or a standard for evaluating research, is going to be inaccurate. It cannot take into account the diversity of circumstances to which the rule would be applied. The reason for having such a rule anyway is that it will still be more accurate than the free-for-all of unrestricted case-by-case evaluation. While the statistical significance rule is substantially flawed, what would be better? Professor Henkel has provided valuable explanations of how the standard is misused inferentially, but I wonder whether there is not a better standard (rule) that could be deployed for research. I think the overemphasis on statistical significance will continue until a better rule is designed.
Gary King has been carrying on a long crusade along the lines Jeremy suggests, with some success. The line of thought stretches all the way back to John Tukey's "Badmandments" in the early '60s.
I wish we could switch over more to resampling stats myself, as I have outlined in comments below. But we might as well admit that this really isn't a scientific controversy; it's a matter of professional boundary maintenance. Knowing and understanding the basics of Fisher-style statistical inference (it does, after all, take at least two tries) is a major rite of passage in the social sciences, and the "standards" that accompany it are unlikely to be overturned by any controversy over their meaning any time soon. There is simply too much invested by too many people in maintaining the status quo.
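For anyone unfamiliar with what I mean by resampling stats, here is a minimal sketch of a permutation test of a difference in two group means, written in Python with NumPy; the data and group sizes are invented purely for illustration.

    # Toy permutation (resampling) test of a difference in two group means.
    # The samples below are simulated; in practice they would be real data.
    import numpy as np

    rng = np.random.default_rng(0)
    group_a = rng.normal(loc=0.0, scale=1.0, size=30)
    group_b = rng.normal(loc=0.5, scale=1.0, size=30)

    observed = group_b.mean() - group_a.mean()
    pooled = np.concatenate([group_a, group_b])

    n_resamples = 10_000
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                            # reshuffle the group labels
        diff = pooled[30:].mean() - pooled[:30].mean()
        if abs(diff) >= abs(observed):                 # two-sided comparison
            extreme += 1

    print(f"observed difference = {observed:.3f}")
    print(f"resampling p-value  = {extreme / n_resamples:.3f}")

The point is that the reference distribution comes from reshuffling the data themselves rather than from an assumed sampling distribution.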
Still, it would be intellectually dishonest to avoid questions about our professional practices, even if we are unlikely to change them that way. The increasing sophistication of methodological education will put paid to some of the bad practices as time progresses and, as Kuhn points out, as the adherents of the older ideas die out.
Posted by: Tracy Lightcap | 31 January 2007 at 11:00 PM
In psychology at least, critics of p-value significance testing have long focused on two "alternatives." The first, confidence intervals, might be thought of as belonging to the same family; the second, effect size (r, d, phi, etc.), is less dependent on sample size and gives, in my opinion, a better sense of the strength of a finding. It also, arguably, can give a better sense of a finding's "importance," though I recognize the loaded and variable nature of that term.
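To make the sample-size point concrete, a small simulated sketch (Python with NumPy and SciPy; the data and the 0.3 SD "true" effect are invented for illustration) shows the t-test p-value shrinking as n grows while Cohen's d stays roughly constant:

    # Simulated illustration: fixed true effect, growing sample size.
    # The p-value shrinks with n; the effect size (Cohen's d) does not.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def cohens_d(a, b):
        # pooled-standard-deviation version of Cohen's d
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        return (b.mean() - a.mean()) / pooled_sd

    for n in (20, 200, 2000):
        a = rng.normal(0.0, 1.0, size=n)
        b = rng.normal(0.3, 1.0, size=n)          # true effect of 0.3 SD
        t, p = stats.ttest_ind(a, b)
        print(f"n = {n:5d}   p = {p:.4f}   d = {cohens_d(a, b):.2f}")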
Posted by: Jeremy A. Blumenthal | 31 January 2007 at 02:33 PM