The social science controversy over significance testing arises from two sources that often overlap, so attributing a given criticism to one source or the other is somewhat arbitrary. The first source concerns the technical requirements of significance tests. The second is essentially a philosophy of science question: whether significance tests provide information of value in developing and validating scientific theories (call this process scientific inference). On the technical side, some feel that social science research rarely meets the requirements for appropriate use of significance tests, and that such tests should therefore not be used; others hold that significance tests can legitimately be used. On the philosophical side, some feel that significance testing in social science, even when the technical requirements are met, contributes nothing of value to scientific inference; others hold that significance testing is a necessary and important aspect of scientific inference.
From the perspective of technical requirements, one of the most basic conditions for a significance test to be meaningful is that the sample be an appropriate probability sample. As noted in earlier material, this essentially means that the sample has to be a simple random sample (SRS). The reason is that the significance level tables and the calculation formulas for the various test statistics (such as chi-square, normal, F, t) are created assuming SRS. As a result, if one employs a probability sampling procedure other than SRS, the probability calculated will be wrong, and generally one does not know by how much or in which direction (too large or too small). If one has not used a probability sampling procedure, there is no basis for the probability calculation other than assuming random factors besides those introduced by probability sampling. Such assumptions, though one may make them, cannot be demonstrated to be valid or invalid, and using them as a basis for statistical inference leads to confusion, since classical statistical inference considers random factors only in the sampling process.
Regarding the issue of sampling, it is difficult to obtain an SRS for human populations studied by social scientists, since one rarely, if ever, has a list of the population. Even when a list is available, such as a list of eligible voters, it is rarely current and complete.
Procedurally, significance tests are often misapplied and misinterpreted, and for critics these errors are the basis for rejecting the use of significance testing.
Misapplications of inferential techniques include using independent-sample techniques on correlated samples, using the wrong sampling distribution, applying techniques to data at an inappropriate level of measurement, and using one-tailed tests when non-directional tests are appropriate.
There are usually population distribution assumptions underlying the use of a particular significance test, such as a normally distributed population. These assumptions are infrequently met in social science research with usually unknown effects on the sampling distribution of the statistic and thus unknown effects on the probability calculation.
Statistically one may make inferences to only the population from which the proper probability sample has been drawn. Attempts to infer beyond the sampled population or from non‑probability samples are obviously incorrect, but occur.
There are often errors in the interpretation of the significance level. A common error is to interpret significance in terms of importance (significance at the .01 level is evidence of a more important result than significance at the .05 level) or as an indication of the strength of a relationship or the probability of the replicability of results.
The choice of level of significance is not a result of mathematical theory or substantive theory. The level chosen is totally arbitrary and at best reflects a researcher's sense that the improbable will not happen to him/her.
The power of the test is essentially never known in social science because social science theory is not sufficiently developed to provide the requisite parameters for the null and alternative hypotheses, thus one does not have the requisite sampling distributions to calculate power. Without an estimation of the power of the test one might as well use a box containing red and green beads in the proportion reflecting the level of significance chosen, for example 5 red and 95 green for a test at the .05 level, with the null hypothesis rejected on the random selection of a red bead in a single draw from the box (Robert Chandler, The Statistical Concepts of Confidence and Significance, Psychological Bulletin, 54, 5, 1957, 429-430).
Finally, it is not uncommon for researchers to run all possible combinations of variables and select those which show significance. The practice is usually called data dredging and changes the actual level of significance used to a less stringent level.
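Two of the points above, the bead box comparison and data dredging, can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: it uses a one-sided z-test with known standard deviation as a stand-in for "a significance test", the effect size (0.5) and sample size (50) are hypothetical, and the data dredging calculation assumes the tests are independent and every null hypothesis is true.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mean = 0 (known sd = 1)
    when the true mean lies effect_size standard deviations above 0."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Under the alternative the z statistic is centered at effect_size * sqrt(n).
    return 1 - std_normal.cdf(critical_z - effect_size * n ** 0.5)


def bead_box_power(alpha=0.05):
    """The bead box rejects with probability alpha whether or not the
    null hypothesis is true, so its 'power' is simply alpha."""
    return alpha


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false rejection across num_tests
    independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    # Hypothetical alternative: true mean 0.5 sd above the null, n = 50.
    print("z-test power:", round(z_test_power(0.5, 50), 3))
    print("bead box power:", bead_box_power())
    # Hypothetical dredging run: 20 tests at the .05 level.
    print("familywise rate:", round(familywise_error_rate(20), 3))
```

Note that the power calculation needs a specific alternative (the effect size), which is exactly the quantity the text says social science theory rarely supplies. Without it, nothing shows that the test does better than the bead box, whose rejection rate never moves from the chosen level.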
From the perspective of philosophy of science concerns there are a number of issues. In any science, developing and validating theories (scientific inference) is generally considered a primary goal and the concern here is for how significance tests contribute or fail to contribute to this goal. Loosely, scientific inference is the process of modifying our degree of belief in the validity of scientific theories. It is important to note that this is a cognitive process of varying our degree of belief in theories and is an incremental process, not an either‑or process.
From a positivist perspective [Karl Popper, The Logic of Scientific Discovery] we validate a theory (or invalidate it) by using the theory to make predictions about the real world, then obtain data from the real world to compare with the predictions. If the real world data is close to what theory predicts, we tend to view the data as supporting, or consistent with the theory. In other words, we have at least not invalidated the theory, though we may be some way from considering the theory to be valid. If it is not close to what the theory predicts, the lack of support is consistent with the theory being invalid. In other words, science is concerned with the change in (or resulting) degree of belief in the theory as a result of looking at the result of an empirical analysis of data bearing on the theory.
The slipperiest aspect of this oversimplified characterization of how theory is validated is the question of how close is close enough? How many units of whatever is being measured (fertility, crime rate, prejudice ...) may we be from what theory predicts before we begin to doubt the validity of the theory? How big a difference is a difference? Unfortunately, there is no set of universal criteria which can determine what constitutes "close enough" in terms of units of whatever is being measured. If we were to stick to comparisons of closeness in terms of units of measurement of our variables, judgments of "close enough" would probably always have a very large "subjective judgment" component.
It is in this context, the context of how we judge "close enough" that statistical inference (in terms of significance testing) offers one possible criterion. Closeness is evaluated in terms of the probability of differences of a given or greater magnitude between the real world data and the predictions of the theory, not in terms of absolute difference in whatever units of measure we are using.
Returning to the idea of scientific inference, we may characterize scientific inference as involving the question "What is the probability of the theory, given the data?" (P[T|D]), since we are considering the validity of the theory in terms of the correspondence between theory and empirical data. However, what significance testing gives is the probability of the data given the theory (P[D|T]); in other words, how likely the results are, assuming the theory tested is true. Thus significance tests provide information not directly relevant to the process of scientific inference, and would seem to contribute little toward it.
To compound the disjunction between statistical inference and scientific inference, there are several additional problems:
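The gap between P[D|T] and P[T|D] can be made concrete with Bayes' theorem. The sketch below is an illustration only: the function name and all of the numbers (a prior of 0.5 and the two values for the probability of the data if the theory is false) are hypothetical, chosen to show that the same P[D|T] is compatible with very different values of P[T|D].

```python
def posterior_probability(prior, p_data_given_theory, p_data_given_not_theory):
    """P[T|D] from Bayes' theorem, given P[T], P[D|T] and P[D|not T]."""
    numerator = p_data_given_theory * prior
    evidence = numerator + p_data_given_not_theory * (1 - prior)
    return numerator / evidence


if __name__ == "__main__":
    # Same P[D|T] = 0.05 in both cases; only P[D|not T] differs (hypothetical values).
    print(posterior_probability(0.5, 0.05, 0.5))
    print(posterior_probability(0.5, 0.05, 0.01))
```

With the data ten times more likely if the theory is false, the posterior is low. With the data five times less likely if the theory is false, the posterior is high, even though P[D|T] is 0.05 in both cases. A significance test reports only the first ingredient.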
Theories apply to hypothetical populations, in other words, to populations that are neither time nor space bound. Statistical inference applies only to currently existing populations as these are the only ones available for sampling. Thus inferential techniques do not inform us about populations of theoretical importance without making assumptions about the extent to which existent populations can be considered probability samples of hypothetical populations.
The significance test model does not intrinsically allow for the accumulation of knowledge. No chi-square test, t‑test, or other standard test includes information on prior tests of the same hypothesis, so the model does not intrinsically reflect the actual scientific process of cumulating knowledge. Were Bayesian approaches sufficiently accepted, the application of Bayes' theorem would allow the calculation of the posterior probability of the theory from its prior probability and the data.
The null hypothesis tested in social science is scientifically uninformative as it is known beforehand to be essentially false, and with large enough samples one can usually reject any null hypothesis.
The purpose of a significance test is to reach a decision to reject or fail to reject the tested hypothesis, while scientific inference involves a cognitive change, a change in degree of belief in a theory.
Finally, placing one's faith in a mechanical process (significance testing) instead of careful thought, particularly when the fate of a theory hangs on the second, third or subsequent decimal place in a probability calculation, is absurd.
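The claim in the list above that a large enough sample will reject almost any null hypothesis can be illustrated numerically. The sketch below is a simplification under stated assumptions: it uses a two-sided z-test with known standard deviation, supposes the observed sample mean equals the true difference exactly, and takes a hypothetical true difference of one hundredth of a standard deviation, a departure from the null that would rarely matter substantively.

```python
from statistics import NormalDist


def two_sided_p_value(observed_diff, sd, n):
    """p-value of a two-sided z-test of H0: mean = 0, known sd,
    for a sample of size n whose mean equals observed_diff."""
    z = observed_diff / (sd / n ** 0.5)
    return 2 * NormalDist().cdf(-abs(z))


if __name__ == "__main__":
    # Hypothetical trivial difference: 0.01 standard deviations.
    for n in (100, 10_000, 40_000, 1_000_000):
        print(n, two_sided_p_value(0.01, 1.0, n))
```

The difference itself never changes. Only the sample size does, and the p-value moves from clearly non-significant to below the .05 level and then toward zero.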
Many of the technical issues raised by those who consider significance testing not to contribute to the goals of science might be considered nitpicking, but the philosophical issues are less easily dismissed. Put another way, proper training might eliminate most of the technical issues (except for problems in obtaining proper probability samples), but the philosophical issues are valid concerns even if no technical problems exist.
Prof. P.K.Pattnaik:
The general rule is that the nature of the problem to be researched determines the range of methodologies which could be used. It is not unusual for methodologies used in one discipline to be used in other disciplines as well. I am not familiar with legal research and thus am ignorant of the types of problems researched and thus the appropriate methodologies. However, to determine if any social science research methodology is appropriate for a particular legal research problem, the researcher, such as yourself, need only consult an introductory level social science research methods text (there are many, many such) and see whether any of the methodologies presented are appropriate for the problem at hand.
Posted by: Ramon Henkel | 18 March 2008 at 07:43 PM
Dear Sir,
I like your issues. Is it safe to use social science research methods in legal research? Please send some literature on empirical research methods for legal studies.
Prof. P.K.Pattnaik
Posted by: Prof. P.K.Pattnaik | 18 March 2008 at 02:25 AM
Professor Mohr's long comment makes the claim that in his book he believes he has formally demonstrated that it is impossible to develop social science theories, and this makes concerns about the criticisms of statistical inference from a philosophy of science perspective irrelevant. Since I have not read his book I cannot comment on the validity of his claim. As a substitute for my own reading, I did a Google search for critical reviews of the book, hopefully by philosophers of social science or others more versed in the issues of the possibility of social science theory than I. I found none so I have no sense of anyone's evaluation of the validity of his claim.
I do, however, agree with his position that social scientists (at least sociologists) have not produced viable theory of the kind that exists in the physical sciences, in spite of the attempts of many in the area of mathematical sociology and theory formalization during, roughly, the period from the 1960's through the 1980's, not to forget the work by Stuart Dodd (Dimensions of Society) in the late 1930's--early 1940's.
In such a short commentary, Professor Mohr could not develop his perspective that the task of social science is to "establish what actually happened, and why, in the past in order that the knowledge gained might be helpful in the future," which seems to reduce social scientists to a subcategory of historians, though perhaps more statistically oriented than most historians. I do not know if this is his intent, as such brief statements are open to misinterpretation and misunderstanding. However, if all one is doing is describing a historical situation and teasing out what causal relations existed at that time, I see little value in applying statistical inference at all (this may also be Professor Mohr's position). A variety of descriptive techniques would certainly be useful in many such investigations, but, if I understand his position, there is no population to which a statistical inference might be made. Other interpretations of the probability model underlying significance testing might be made, but as noted previously, these interpretations have their own problems.
Posted by: Ramon Henkel | 01 February 2007 at 11:08 AM
Longish comment: Henkel provides two types of reasons for why significance testing might be of little or no value in social science -- technical reasons and philosophical reasons connected to scientific inference. In the end, he concedes that the technical reasons are not fatal because they reflect misuses that can be overcome with proper training. I agree with this position and feel that I have observed a great many applications of testing that have not committed any of the technical errors he catalogs. He argues, however, that the scientific objections are serious and remain valid concerns relating to the utility of significance testing in the social disciplines.
I would urge that the philosophical objections are even less germane than the technical ones. The reason is that all of the philosophical objections depend on the premise that social scientists are engaged in developing theories and that testing is used, or misused, as a critical part of this process. In reality, although it is only too true that many social scientists are trying to develop valid theories of human behavior, they are not succeeding in doing that and never will succeed because it is impossible to develop such theories (unless they be strictly biological). Therefore, it is irrelevant whether or not statistical testing contributes to theory development. The question is whether it can help in the pursuit of other, more valid purposes of social research.
By "theories" I mean valid, stable, causal explanations of particular human behaviors, such as turning out to vote, adopting an innovation, being less conservative as a group than as a collection of individuals, and so forth. Typically, when we run regression models, for example, this is what we are trying to do. In my experience, leading social scientists will agree with the negative position taken above, no matter what they are doing with their research time. Moreover, if you propose to an individual, intelligent human being that you have a theory that will accurately predict whether he or she will adopt a particular innovation, even after knowing the theory, the person can laugh at you -- justifiably -- and proceed to defy you by violating the theory. Most important, however, I feel that I have proved this negative position formally in a recent book (The Causes of Human Behavior) under light assumptions to which most of us will subscribe. In the process, I also sought to demonstrate that the development of so-called probabilistic theories is just as futile an endeavor as deterministic ones.
As I see it, the purpose of social research is to establish what actually happened, and why, in the past in order that the knowledge gained might be helpful in the future. The emphasis in steering research plans, it seems to me, should be on this "helpful". It can be (but will not necessarily be) helpful, for example, to know what the electorate of a certain jurisdiction has been thinking and why. It might also be (but will not necessarily be) helpful to establish the causal explanation for the adoption or non-adoption of an innovation by a group of farmers in their real world or a group of freshman psychology students in the laboratory. It's up to us to design our projects with this kind of function in mind, and not in the attempt to develop theories, whether probabilistic or deterministic, such as are common in physics.
In such endeavors -- in trying to establish what happened in a particular past and why -- significance testing can definitely be of assistance, under the conditions that I offered in my post, above.
Posted by: Lawrence Mohr | 31 January 2007 at 02:13 PM
A recent article by Andrew Gelman and Hal Stern in The American Statistician points out a fascinating irony: The difference between "significant" and "not significant" is itself not statistically significant.
http://www.stat.columbia.edu/%7Egelman/research/published/signif4.pdf
Posted by: Damon Cann | 31 January 2007 at 09:47 AM