
June 17, 2008

What Not To Do, Part II: Stars...

Comments

Tracy Lightcap

It's not bad, but I'm old school: I want to see some measures of fit, especially when the N is below 200. If you have survey data with 1,000 or so cases, fit will always be pretty poor, and you should go directly to the coefficients to see whether they have a substantive effect (imho, significance is important, but secondary). But here I'd want something to show whether there is any juice in the equation.
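For readers who want a concrete sense of "juice in the equation," here is a minimal sketch of one common fit measure, R-squared for a bivariate regression, computed by hand. The data are invented for illustration and stand in for any small-N sample:

```python
import statistics

# Invented data (not from any real study) standing in for a small sample.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)

# OLS slope and intercept for a one-predictor model.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R-squared: share of variance in y explained by the fitted line.
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

A value near 1 says the equation has some juice; a value near 0 says the coefficients, however starred, are explaining very little.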

I also don't like the footnote for the indicator descriptions. Straightforward short descriptions would be much better, even if you had to (horrors!) use two lines to do it. This is all SPSS's fault; we've gotten lazy and just reuse the cryptic indicator names we plug into it.

A final complaint: we have to assume here that the usual frequentist tests are legit. When I do this kind of thing, I usually report the standard tests; everybody at the journals seems to want them. I always check those readings by bootstrapping the model, however. We really need to get reviewers to start accepting bootstrapped errors more often; given the kinds of "samples" social scientists work with these days, it's the only legit route.
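The bootstrap check described above can be sketched in a few lines. This is a minimal illustration with made-up data, and a sample mean stands in for the model's estimator; the same resampling loop applies to any coefficient:

```python
import random
import statistics

random.seed(42)
# Mock sample of N = 150 (invented; below the 200 cases mentioned above).
data = [random.gauss(10, 3) for _ in range(150)]

def estimator(sample):
    # Placeholder estimator; in practice, refit the model and
    # return the coefficient of interest.
    return statistics.mean(sample)

# Nonparametric bootstrap: resample with replacement, re-estimate, repeat.
B = 2000
boot_estimates = []
for _ in range(B):
    resample = [random.choice(data) for _ in range(len(data))]
    boot_estimates.append(estimator(resample))

# The spread of the bootstrap estimates is the bootstrapped standard error.
boot_se = statistics.stdev(boot_estimates)
print(round(boot_se, 3))
```

If the bootstrapped standard error disagrees badly with the textbook one, that is a warning that the usual frequentist assumptions may not hold for the "sample" at hand.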

Which would lead us into a discussion of significance testing tout court, I suppose.

William Ford

We had a blog forum on significance testing in early 2007. Unfortunately, there is no category tag that pulls up the old posts on this topic. Maybe we need one.

Here are the posts I could find:

http://www.elsblog.org/the_empirical_legal_studi/2007/01/blog_forum_sign.html
http://www.elsblog.org/the_empirical_legal_studi/2007/01/a_brief_overvie.html
http://www.elsblog.org/the_empirical_legal_studi/2007/01/the_uses_of_sig.html
http://www.elsblog.org/the_empirical_legal_studi/2007/01/the_social_scie.html

It would be worth revisiting the topic with another blog forum.

C. Zorn

Christopher: True as well. I'm very picky with my own Ph.D. students when it comes to making them state explicitly whether their hypotheses are directional, and (so) the "tailedness" of their tests. But I also agree that confidence/credible intervals are generally better.

As for the general subject, perhaps we can/should hold a little blog forum on that topic in the future...

C. Griffin

Given that most (if not all) researchers use pre-programmed estimators to generate their empirical results, the reader should probably assume that the significance test is two-tailed. It seems to me, along the lines of Jeremy's comment, that what we should discuss more is the relative significance of statistical significance. Even if a point estimate passes the p < .05 test, it may still be a poor estimate if the standard error is large enough. Although it requires more table space, I would like to see more use of confidence intervals rather than standard errors and asterisks, so that the reader can easily determine the precision of reported estimates.
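To illustrate the precision point, here is a minimal sketch, with an invented estimate and standard error, of how a 95% confidence interval conveys what an asterisk hides: an estimate can clear p < .05 while its interval nearly touches zero.

```python
# Invented numbers for illustration only.
estimate = 0.42
se = 0.19
z = 1.96  # normal critical value for a two-sided 95% interval

lower = estimate - z * se
upper = estimate + z * se
print(f"{estimate:.2f} [{lower:.2f}, {upper:.2f}]")
# prints: 0.42 [0.05, 0.79]
```

The star next to 0.42 would look the same either way; the interval [0.05, 0.79] tells the reader at a glance that the estimate is barely distinguishable from zero and quite imprecise.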

C. Zorn

Reasonable points, Jeremy. Actually, this table is a bit unusual in going the P=0.10, P=0.20 route. The more common means of proliferation for stars is along the lines of "one asterisk indicates P<.05, two indicate P<.01, three indicate P<.001, four indicate P<.0001, five indicate P<.00001, etc."

As for "tails," it seems easy enough to note e.g. "(one-tailed)" after the description of the P-values. And, that's consistent with the general rule that a table should "stand on its own" whenever possible.
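To see why stating "tailedness" matters, here is a small sketch with a hypothetical z statistic. The one-tailed p-value is half the two-tailed one (when the effect runs in the hypothesized direction), which can flip a result across the .05 line:

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 1.80  # hypothetical test statistic, for illustration only
p_one_tailed = 1 - normal_cdf(z)      # directional hypothesis
p_two_tailed = 2 * p_one_tailed       # non-directional hypothesis
print(round(p_one_tailed, 4), round(p_two_tailed, 4))
```

Here z = 1.80 is "significant" one-tailed (p ≈ .036) but not two-tailed (p ≈ .072), so a bare asterisk without a "(one-tailed)" note genuinely misleads.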

Jeremy A. Blumenthal

A couple of quibbles and a request . . . The request: please don't wait too long before holding a discussion of the problems of over-emphasis on statistical significance! The more (and more often) we note the potential shortcomings of focusing ONLY on p-values, I think, the better.

Which relates to one of the quibbles—it’s certainly possible that an effect can be significant at p-values higher than .05, but for that reason be reflexively dismissed despite being potentially interesting. Thus, I don’t know that it’s necessarily a problem to report p-values of <.10, though perhaps .20 is stretching it a little. (Whether it’s necessary to do it in the table is another question.) The second quibble is on the tailed-ness point: no question an author should make clear whether a 1- or 2-tailed test is being reported, but that seems appropriate to put in the text, rather than a table, for reasons of (1) clutter and (2) the opportunity to explain why the particular test is being used.

