
30 January 2007


Lawrence Mohr

Certainly, Professor Henkel is right. When I said that a causal connection can "legitimately" be inferred on the basis of a significant result, I meant legitimate in terms of accepted practice, not mathematics. The word "legitimately" was meant to contrast randomized designs with other, nonrandom designs.

Ramon Henkel

This is a very belated comment on Professor Mohr's post, one that occurred to me after my initial comment. I disagree with a statement Professor Mohr makes about the interpretation of a "significant" result. The statement is the following: "If a test is carried out with a significant result, it tells us that there is less than, say, a 5% chance that a randomization vagary alone would yield a difference as large as the one we observed. In that case, we can legitimately infer that the cause of at least some of the observed difference was the experimental treatment, i.e., that the causal connection is greater than zero." I disagree because, if the null hypothesis is true, 100% of the sampling distribution is the result of chance ALONE. It is only our sense of credibility that then makes the leap to the conclusion that such a "significant result", in other words a rare event, must be the result of something other than chance. Put another way: "Why should the unlikely happen to me?" There is nothing in the mathematics of significance testing that says a significant result is anything other than the result of chance.
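
Henkel's point can be made concrete with a small simulation. This Python sketch is an illustration added here, not anything from the thread: it generates many hypothetical experiments in which the null hypothesis is true by construction (both groups are drawn from the same normal distribution) and tests each with a two-sample z-statistic, a normal approximation assumed for simplicity. The function names and parameter values are hypothetical. Roughly 5% of these experiments still come out "significant" at the 5% level, and every one of those results is, by construction, produced by chance alone.

```python
import math
import random
import statistics

def two_sided_p(a, b):
    """Approximate two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def null_rejection_rate(n_experiments=2000, n_per_group=30, alpha=0.05, seed=1):
    """Share of simulated experiments with no true effect that still test 'significant'."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution, so the null hypothesis is true
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p(a, b) < alpha:
            significant += 1
    return significant / n_experiments

rate = null_rejection_rate()
print(f"share of null experiments significant at 5%: {rate:.3f}")
```

Because the z-statistic uses estimated variances with normal critical values, the long-run rate with 30 cases per group sits slightly above 5%; a t reference distribution would bring it closer to 5%.
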
In response to Professor Lightcap's comments regarding bootstrap sampling: my knowledge of bootstrap sampling extends to knowing that the technique exists and how it is implemented, not much more. It may have considerable utility in situations where decision theory is applicable, but what it contributes, or may contribute, to investigations of the validity of theory is something I have only vague ideas about, and they are mostly negative. One website illustrates at least one of the concerns that underlie my uneasiness: it states that an assumption underlying the use of the technique is that "Your sample is a valid representative of the population", whatever that may mean. If it means a miniature version of the population, or something close to it, then one still has the problem of basing the results on a very tenuous assumption when employing the technique in theory validation. The URL is long, so it will probably require some pasting: http://people.revoledu.com/
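
Henkel's concern about the representativeness assumption can be sketched in Python. The population, the sampling rule, and all numbers below are hypothetical: a sample that only ever observes units above 600 is bootstrapped. Because every resample is drawn from that sample, the bootstrap interval for the mean is centered on the sample's own mean and cannot recover the population mean of 500.5.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=2000, seed=0):
    """Means of n_boot resamples drawn with replacement from the sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]

population = list(range(1, 1001))        # hypothetical population, mean 500.5
pop_mean = statistics.fmean(population)

# Hypothetical non-representative sampling: only units above 600 are ever observed
rng = random.Random(42)
biased_sample = rng.sample([v for v in population if v > 600], 50)

boot = sorted(bootstrap_means(biased_sample))
lo = boot[int(0.025 * len(boot))]        # approximate 2.5th percentile
hi = boot[int(0.975 * len(boot)) - 1]    # approximate 97.5th percentile
print(f"population mean {pop_mean}, bootstrap 95% interval [{lo:.1f}, {hi:.1f}]")
```

The resampling itself is carried out correctly; the interval is simply an honest description of the sample it was given.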

Lawrence Mohr

This is a response both to Henkel and Lightcap. I must apologize for not making myself clear in the post. I was supporting the position that a significance test can legitimately be used as a measure of strength. What I didn't emphasize strongly enough is that this use has nothing whatever to do with statistical inference based on the test.

When a test is run, you get a number -- a significance or probability level -- and in most cases you then go on to make an inference. If the result is significant, the inference would usually be that the population or causal parameter is not zero. In this use as a strength measure we don't take that second step. We stop with just the information that the result observed is "significant."

In context, this means that if the observation were based on random sampling or randomization, which it is not, it would fall into, say, the 5% tail of the appropriate sampling distribution. This information is then used simply and only as a basis of comparison, not for any inference. Shifting our sights now to that sampling distribution, we know that the relationship in a hypothetical observation just like ours but based on randomization has a certain strength; call it "strong". Everything else being equal (mainly sample sizes and variances), it is necessarily stronger than a relationship not significant at the 10% level and weaker than one that is significant at the 1% level.

By comparison, we can now say that our observed relation in the non-randomized study is "strong". This means that if it had been a randomized study it would have been "strong", and that, everything else being equal, it is stronger in this sense than a similar non-randomized result that was not significant at the 10% level and weaker than one significant at the 1% level. Here the term "significant", again, does not carry the usual implication of proceeding to some stage of inference about a population or about causality. We have only a measure of strength, comparable to a correlation coefficient or a standardized beta coefficient.
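
Mohr's "everything else being equal" caveat can be illustrated with a short Python sketch. It attaches a two-sided p-value to a correlation r computed from n cases using the Fisher z-transformation and a normal approximation, a method chosen here for brevity rather than one the post specifies; the function name is hypothetical. Holding n fixed, smaller p-values go with larger correlations, so the significance level orders relationships by strength. Holding r fixed, the p-value moves with n alone, which is why sample sizes must be equal, or the difference discounted, before such comparisons are made.

```python
import math

def corr_p_value(r, n):
    """Two-sided p-value for correlation r from n cases, via Fisher's z and a normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Same n, different r: smaller p-values go with stronger correlations
for r in (0.1, 0.3, 0.5):
    print(f"r={r}, n=50: p={corr_p_value(r, 50):.4f}")

# Same r, different n: the p-value moves with sample size alone
for n in (20, 50, 500):
    print(f"r=0.3, n={n}: p={corr_p_value(0.3, n):.4f}")
```

With n = 50, for example, r = 0.3 gives a p-value of roughly 0.034 under this approximation: significant at the 5% level but not at the 1% level. The same r with n = 500 is far beyond the 1% level.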

My claim, however, is that its near universality makes it exceptionally useful, and legitimate as long as the "everything else equal" caveat is observed or managed with proper discounting of differences. I gave one example in the post: comparing regression coefficients with Kendall's tau-b values. Another example is deciding which of a large number of chi-squares (without randomization) are worth looking into further, where the chi-square values themselves tell us little but the significance levels provide just the sort of exploratory handle we need. I further claimed that the test is, wittingly or unwittingly, used in this way overwhelmingly often in the analysis of data, and that this use is justified if it is exploratory and made with a clear awareness that sample sizes and variances make a difference.
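
Mohr's chi-square example can be sketched as follows. The three 2x2 tables are hypothetical, and all have 100 cases, so the "everything else equal" condition on sample size holds. For one degree of freedom, the upper-tail probability of a chi-square value x equals erfc(sqrt(x/2)), which lets the sketch rank tables by significance level using only the standard library. The Pearson statistic is computed without a continuity correction.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square and its p-value (1 df) for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    chi2 = 0.0
    for i, obs_row in enumerate(table):
        for j, obs in enumerate(obs_row):
            expected = row[i] * col[j] / n   # expected count under independence
            chi2 += (obs - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))       # upper tail of chi-square with 1 df
    return chi2, p

# Hypothetical cross-tabulations, each with n = 100
tables = {
    "A": [[30, 20], [20, 30]],
    "B": [[26, 24], [24, 26]],
    "C": [[40, 10], [15, 35]],
}
# Most promising (smallest p) first
ranked = sorted(tables, key=lambda name: chi_square_2x2(tables[name])[1])
print(ranked)  # ['C', 'A', 'B']
```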

Tracy Lightcap

Professor Henkel's response here tracks his response to my inquiry of a few days ago. I quite agree, but I do think there is a way out: resampling statistics. No doubt Professor Henkel will correct me if I misstep, but, if I'm not mistaken, resampling estimates don't depend in any way on the character of the samples themselves; they are drawn entirely from the characteristics of the data being analyzed and have no further inferential extension. IOW, when we see a bootstrapped estimate of a standard error, it is telling us that, given the dataset we are working with, the coefficient we are working with has less than, say, 5 chances in 100 of occurring by chance. There is no inference here to anything but a series of "resamples" of the original data; i.e., the estimates are nonparametric and assume nothing about the data collection methods or the structure of the data themselves.
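
A bootstrapped standard error of the kind Lightcap describes can be sketched in Python with the case-resampling (pairs) bootstrap for an ordinary least squares slope. The data and function names are hypothetical. Each replicate resamples the observed (x, y) pairs with replacement, so the resulting standard error describes variability across resamples of this one dataset; whether it says anything beyond that dataset is the question Henkel raises about representativeness.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_slope_se(xs, ys, n_boot=2000, seed=0):
    """Standard deviation of the slope across case-resampled replicates."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    for _ in range(n_boot):
        resample = rng.choices(pairs, k=len(pairs))  # draw cases with replacement
        bx, by = zip(*resample)
        if len(set(bx)) < 2:
            continue  # all x values identical: slope undefined, skip replicate
        slopes.append(ols_slope(bx, by))
    return statistics.stdev(slopes)

# Hypothetical data: y = 2x + noise for x = 0..19
rng = random.Random(7)
xs = list(range(20))
ys = [2 * x + rng.gauss(0, 3) for x in xs]

print(f"slope {ols_slope(xs, ys):.3f}, bootstrap SE {bootstrap_slope_se(xs, ys):.3f}")
```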

Such estimates would be limited in application (no doubt one reason why so few of us use them; I avoid them myself), but they avoid many of the controversial aspects of significance testing. Or so people tell me: finding out more about Professor Henkel's views would be instructive.

Ramon Henkel

I have been puzzling over Professor Mohr's statement that there is a legitimate interpretation of a significance test on a non-probability sample or non-randomized experiment. Finally, I have concluded that what he is referring to is what might be called a random process model of a phenomenon of interest. In other words, the substantive (research) hypothesis is that in the population the variables are randomly paired. Thus a "significant" result would imply that the variables were not randomly paired, or, put another way, that the variables are related in some nonrandom fashion. If my interpretation of Professor Mohr's position is correct, his thinking parallels that of others. David Gold presents this perspective in an article titled "Statistical Tests and Substantive Significance" in The American Sociologist, February 1969. Though I cannot recover a source, my recollection is that H. Blalock has made a similar suggestion, and I think it likely that others have done the same.

I am of the opinion, however, that such a perspective has a minefield of caveats and cautions to cross (as, seemingly, does Professor Mohr, since he suggests using this approach intelligently and explicitly) if one is to use it "properly". One of my concerns is that, say, in sociological research, variables of interest are unlikely to be entirely unrelated, so one ends up detecting the obvious by rejecting a random process model; and, of course, whether such a model is rejected will often be a function of the size of the non-probability sample or experimental groups. There is in addition the question of the "representativeness" of the sample or pool of experimental subjects relative to any possible population of interest, and thus of whether or not there is something to be followed up in future research. But I'm also confident that cautions such as those I've just expressed are part of what Professor Mohr refers to as "intelligent" use of the approach.
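
The random process model Henkel describes, in which the null hypothesis is that the two variables are randomly paired, corresponds to a permutation test that repeatedly re-pairs the observed y values with the x values at random. The Python sketch below uses hypothetical data (a modest linear association plus noise) and hypothetical function names. It also reflects Henkel's caution that rejecting the random process model depends heavily on sample size, since the same underlying association is tested at n = 20 and at n = 500.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def random_pairing_p_value(xs, ys, n_perm=1000, seed=0):
    """Permutation p-value for the null that xs and ys are randomly paired."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # one random pairing drawn from the random process model
        if abs(pearson_r(xs, shuffled)) >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)  # count the observed pairing itself

def simulate(n, slope=0.3, seed=1):
    """Hypothetical data: y = slope * x + noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

p_small = random_pairing_p_value(*simulate(20))
p_large = random_pairing_p_value(*simulate(500))
print(f"n=20: p={p_small:.3f}   n=500: p={p_large:.4f}")
```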
