I would like to thank the readers of ELS for the excellent comments thus far on my previous posts. In this post, I want to raise an issue that I have been thinking about for quite some time. Although not every project or issue can be appropriately analyzed using descriptive or basic inferential statistics, it seems to me that many empirical scholars (both within the legal academy and outside of it) underestimate the value of using basic statistics to explain phenomena or relationships between variables. Instead, empirical scholars often reach for the most complex models, which results in papers that are impenetrable to non-quantitative scholars, rather than searching for ways to present the data in a fashion that can be understood by a wider audience.
As someone who sits on dissertation committees in other disciplines on occasion and tries to attend workshops in political science as much as possible, I notice this trend among graduate students as well: in their zeal to conform to the norms of their discipline, graduate students will sometimes forget that there can be enormous value (and sometimes even more than complex models provide!) in presenting the data using basic statistical techniques. If we, as empirical scholars, want to reach a wider audience for our work, is it worthwhile to simplify the empirical papers that we write when possible? Is reaching a wider audience even a worthwhile goal? Would courts and policymakers pay more attention to our scholarship if they could understand it? Again, I realize that not every project lends itself to simple statistical techniques, but I thought these questions worth asking nonetheless. As always, your comments are most welcome.
I read Lee and her associates' papers. Very good indeed, in most respects. We part company on two issues. First, I am very fond of "super tables" where readings for a response variable are laid out by categorical carriers. I can't help it; I like the exact figures, and I don't think a graphic representation of them is much, if any, improvement over the tabular form. Second, while I agree that summary model descriptions in terms of the effects of sets of variables are handy and easy to understand, I still want the summary statistics themselves as well. Sure, that leads to a little table clutter. It also means, however, that I don't have to contact the author for the actual figures later and that I have a ready index for comparisons across different studies. Until the Revolution, we'll need that.
Posted by: Tracy Lightcap | 17 April 2007 at 10:16 PM
Two papers relevant to this discussion:
http://epstein.law.northwestern.edu/research/communicating.html
We would all do well to follow the authors' prescriptions.
Posted by: Sara Benesh | 17 April 2007 at 09:28 AM
This resonates with me. All statistics should be presented with Tukey's Inter-Ocular Trauma Test (i.e., the results hit you between the eyes) in mind. It is the only test that matters, and it is usually best conveyed by simple graphic techniques and tables. The best-practice example is, of course, the Coleman Report. The conclusions are based on a set of relatively sophisticated two-stage regression models, but the actual presentation in the report is in the form of bar graphs (simple TWO-dimensional bar graphs) and 2x2 and 2x3 percentage tables. It's a marvel of how to present sophisticated work so that almost anyone can understand it.
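In that spirit, here is a minimal sketch of the kind of simple percentage table being described (entirely made-up data, assuming Python with pandas; the numbers have nothing to do with the Coleman Report itself):

```python
import pandas as pd

# Made-up example: case outcome by whether the litigant had counsel.
df = pd.DataFrame({
    "counsel": ["yes", "yes", "no", "no", "yes", "no", "yes", "no"],
    "won":     ["yes", "no",  "no", "no", "yes", "yes", "yes", "no"],
})

# Row percentages: the kind of 2x2 table almost any reader can follow.
table = pd.crosstab(df["counsel"], df["won"], normalize="index") * 100
print(table.round(1))
```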
The problem: much of social science research isn't based on data sets like those Coleman used. In fact, the sophistication of the statistical techniques used today seems tied, in many instances, to the lack of adequate samples and to research designs that preclude easily interpreted results. Technical means have come to substitute for adequate data. This leads to a subsidiary difficulty: scholars, especially in the early stages of their careers, begin to identify new, unusual techniques as the way into the better journals and elevate those techniques above the data as the main selling point for their work. This is very unfortunate and has had bad effects across disciplines. (I once got a review on a paper that chided me for being "too terse". That is another bit of blowback from this regrettable situation: my reviewer actually wanted me to use more complicated language!)
I'm not sure there's a way out of this. The best thing to do, I think, is to use the techniques that do the best job of describing your data and let the chips fall where they may. Trying to sell papers with techniques is usually a losing proposition anyway. I just think of the Coleman Report, hold my breath, and write the clearest descriptions I can of what I'm about. Following Jeff's advice is a good way to start.
Posted by: Tracy Lightcap | 16 April 2007 at 04:26 PM
All good points, to which I would add: descriptive statistics are potentially more valuable than advanced models, including regression. As noted on the blog, regression models are overly consumed with statistical significance and often fail even to demonstrate the substantive significance of their findings. Descriptives typically permit the reader to get at least a rough estimate of substantive effects.
However, I still see benefit in examining sophisticated models once the necessary groundwork has been performed. Some apparent descriptive relationships simply fall apart once selection effects or control variables are considered.
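A minimal sketch of that last point (a made-up simulation, assuming Python with numpy and statsmodels; nothing here comes from an actual study):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
confounder = rng.normal(size=n)      # e.g., underlying case complexity
x = confounder + rng.normal(size=n)  # the variable we describe
y = confounder + rng.normal(size=n)  # the outcome; x has no direct effect on y

# Descriptive (bivariate) look: x and y appear clearly related.
print(np.corrcoef(x, y)[0, 1])       # roughly 0.5

# Add the control variable: the apparent effect of x falls apart.
X = sm.add_constant(np.column_stack([x, confounder]))
print(sm.OLS(y, X).fit().params)     # coefficient on x is near zero
```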
Posted by: frank cross | 13 April 2007 at 11:14 AM
I agree with Jeff too. I have a theory about why clarity is so difficult for ELS types (although admittedly it is not much of a theory, because it is based only on myself and conversations with other graduate students). I find that people who can communicate their thoughts both clearly and simply are those who (a) are good writers and (b) have a firm understanding of what they are writing about. I think most social scientists, as well as law types, meet the first criterion. It is the second element that is missing. Without a clear understanding of what quantitative analysis means--and what it does not mean--the writer (me) strives merely to ensure that the analysis is performed correctly and that the results are interpreted correctly. The easiest way to do this? Make the paper read like a statistics textbook. There can be no clear analysis where there is no depth of understanding. Or, put another way, you can't provide clarity to others when you aren't clear yourself.
Ok, now that I'm on the soapbox...I am going into deep (and unfamiliar) water, so please correct me if I am wrong about any of this. A couple more reasons for the stilted (and less than clear) writing: first, page limitations on journal submissions. You can't provide flowery descriptive prose when the page limit confines you to hypotheses and conclusions. Another reason (noted, and argued against, in Becker's great little book Writing for Social Scientists) is that journal editors expect to see stilted writing (or discount simpler explanations).
Posted by: D Campbell | 13 April 2007 at 08:01 AM
I think Jeff is right, and I didn't mean to say that multivariate regression and other modeling cannot be explained in an intuitive and understandable way. I just think that there is often a rush (and even an expectation) to use complex modeling when simpler statistics, used in conjunction with or instead of such models, can be equally or more enlightening. At the least, as Jeff's comment suggests, we need to do a better job of explaining our empirical work to nonquantitative scholars.
Posted by: David Stras | 12 April 2007 at 01:43 PM
I think that David and the comment writers would agree that multiple regression and similar estimation methods are not necessarily at odds with accessibility or clarity. I think that the real problem lies not in what technique we use, but rather in how we use it and explain it.
I agree with D. Campbell (and David) that there often seems to be some unnecessary complexity and lack of transparency in many articles. I don't think that this is usually because of the techniques used, but rather is due to poorly articulated research designs. In my humble opinion, multiple regression is fairly intuitive if one sets up the research design in a transparent and understandable way.
Some basic points are: 1) Tell us what your dependent variable is and how it is measured; the same goes for your primary independent variable(s) and controls. An appendix outlining these matters and providing descriptive stats goes a long way toward accomplishing this. 2) State your hypotheses clearly. They don't have to be stated in a formal, stilted manner, but the reader shouldn't have to search for them or infer what your theory expects. 3) Use the simplest appropriate method possible to estimate your model. 4) Provide some intuitive substantive interpretation of your results.
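A rough sketch of how points 1, 3, and 4 might look in practice (hypothetical, simulated data, assuming Python with numpy, pandas, and statsmodels; the variable names are invented purely for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
# Invented example: does a court cite an empirical study (DV),
# given the study's readability score and the judge's experience (IVs)?
df = pd.DataFrame({
    "readability": rng.normal(5, 2, n),
    "experience": rng.normal(10, 3, n),
})
true_logit = -3 + 0.4 * df["readability"] + 0.1 * df["experience"]
df["cited"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Point 1: descriptive statistics for the DV and IVs (appendix material).
print(df.describe())

# Point 3: the simplest appropriate estimator -- here a plain logit.
fit = smf.logit("cited ~ readability + experience", data=df).fit()

# Point 4: substantive interpretation -- average marginal effects,
# i.e., how much the probability of citation moves per unit change.
print(fit.get_margeff().summary())
```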
Ok, I'm done - sorry to digress. However, probably 75% of the manuscripts that I review fall short on these steps. I certainly don't mean to imply that I am perfect in this regard, but I do always attempt to be accessible and understandable in my own writing.
Certainly, descriptive stats can be very helpful in explaining phenomena; however, certain research questions do lend themselves to more advanced methods. The devil is in the details.
Posted by: Jeff Yates | 12 April 2007 at 12:45 PM
Thanks for bringing this up. Another question is: why do graduate students feel the need to use the most complicated and advanced methodological techniques possible? I believe that those directing the student (the dissertation committee chair) have a lot to do with it. The bottom line is that "conform[ing] to the norms of the discipline" is precisely what poli sci graduate students must do to get a job teaching poli sci. I do not agree with it, and frankly I think it makes political science practically irrelevant, but it is what it is.
To follow up on your second question--should empirical work be accessible? Before I entered graduate school, I never thought THIS would be a central question in the discipline--but it is. My take is that the answer is definitely yes. If you pull a recent copy of a political science journal and do not have an empirical background, you will be lost by page 3 (the introduction and literature review may be understandable). Show these articles to policy makers [of course, this assumes that the research is on an issue of interest to policy makers, which is another can of worms] and I doubt you would find many who could understand them.
Posted by: D Campbell | 12 April 2007 at 06:45 AM
David,
I really agree with this post. I might go a bit further. Basic descriptive statistics can often contradict or qualify assumptions in our theoretical and normative discourse. Moreover, in this digital age, there is a lot of data available from relatively reliable public or published sources. The days of the armchair empiricist should be coming to a close. bh.
Posted by: William Henderson | 11 April 2007 at 09:36 PM