Over the last few weeks, Andy Morriss and I have been working with U.S. News rankings, focusing primarily on post-graduation outcomes such as employment upon graduation and bar passage. Much of the raw data does not satisfy the assumptions of ordinary least squares regression (a linear relationship between X and Y, constant error variance, etc.). Thus, to create a proper linear model, I have been experimenting with various transformation methods.
One of the lessons this experience drove home was the importance of examining your data BEFORE you rely upon the various OLS parameter estimates. In our case, the estimates changed in important ways after we transformed the relevant variables. Transforming also reduced the amount of heteroskedasticity (i.e., non-constant error variance) in our various specifications, which will save us some additional corrective steps.
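To make the heteroskedasticity point concrete, here is a minimal Python sketch (simulated data, not our U.S. News variables): when errors are multiplicative, the residual spread from a straight-line fit grows with X, and logging Y largely stabilizes it.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 500)
# Multiplicative (lognormal) errors: noise scales with the level of y
y = np.exp(1 + 0.3 * x) * rng.lognormal(0, 0.2, x.size)

def residual_spread_ratio(xv, yv):
    """Fit an OLS line, then compare residual SD in the
    top half of x to the bottom half (1 means homoskedastic)."""
    b, a = np.polyfit(xv, yv, 1)
    r = yv - (a + b * xv)
    half = xv.size // 2
    return r[half:].std() / r[:half].std()

print(residual_spread_ratio(x, y))          # well above 1: spread grows with x
print(residual_spread_ratio(x, np.log(y)))  # near 1: logging y stabilized it
```

This is a crude eyeball check, not a formal test (Breusch-Pagan and friends exist for that), but it shows why a transformation can spare you robust-standard-error patches later.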
Here are some graphics that I thought readers might find useful. This graph [Click to enlarge, excerpted from Fox, Applied Regression Analysis, Linear Models, and Related Methods 70 (1997)] shows a monotone curvilinear relationship, which can plausibly be corrected by a power transformation of the dependent (Y) or independent (X) variable:
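A toy numerical version of Fox's picture (my own Python sketch with made-up data, not from the book): a relationship that is monotone but curved becomes nearly linear after a log transformation of Y, which you can see in the jump in the (linear) Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
# Monotone curvilinear relationship: y grows multiplicatively in x
y = np.exp(0.5 * x + rng.normal(0, 0.1, x.size))

# Pearson correlation measures *linear* association only
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(x, np.log(y))[0, 1]

print(f"r(x, y)     = {r_raw:.3f}")  # noticeably below 1: the curve hurts
print(f"r(x, log y) = {r_log:.3f}")  # close to 1: the log made it linear
```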
Here is a nice graphic [Click to enlarge] that shows the Mosteller & Tukey "bulging rule", excerpted from Fox, supra, at 71. The "bulge" reflects the direction of the monotone curve.
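The bulging rule can even be mimicked by brute force. Here is a Python sketch (my own illustration, not Mosteller & Tukey's procedure): walk Tukey's "ladder of powers" and keep whichever power of Y makes the scatter most nearly linear.

```python
import numpy as np

def ladder(v, p):
    """Tukey's ladder of powers: p == 0 means log; negative
    powers are negated so the transformation stays increasing."""
    if p == 0:
        return np.log(v)
    return v**p if p > 0 else -(v**p)

x = np.linspace(1, 10, 100)
y = np.sqrt(x)  # bulge points up and to the left: move y up the ladder

powers = [-1, -0.5, 0, 0.5, 1, 2, 3]
best_p = max(powers, key=lambda p: abs(np.corrcoef(ladder(y, p), x)[0, 1]))
print(best_p)  # → 2: squaring y undoes the square root, as the rule suggests
```

In practice you would pick the power by looking at the plot (and at interpretability), not by maximizing a correlation, but the mechanics are the same.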
Finally, here is a nice graphic [Click to enlarge, excerpted from Pallant, SPSS Survival Manual 79 (2001), citing Tabachnick & Fidell, Using Multivariate Statistics (3rd ed. 1996)], that suggests various useful transformations depending upon the frequency distribution of the variable. (Note that these charts reflect histograms that deviate from a normally distributed curve, not curves fit to an X-Y scatterplot, as above):
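As a rough Python illustration of the Tabachnick & Fidell idea (simulated data and a naive skewness formula, not SPSS's output): the stronger the positive skew, the stronger the transformation needed, with square root for moderate skew and log for substantial skew.

```python
import numpy as np

def skewness(v):
    """Naive sample skewness: the mean cubed z-score."""
    z = (v - v.mean()) / v.std()
    return (z**3).mean()

rng = np.random.default_rng(1)
v = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # strongly positively skewed

print(round(skewness(v), 2))           # large and positive
print(round(skewness(np.sqrt(v)), 2))  # square root helps, but skew remains
print(round(skewness(np.log(v)), 2))   # log matches the severity: near zero
```

For negative skew, the usual recipe is to reflect the variable (subtract it from its maximum plus one) and then apply the same ladder.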
Especially for those readers publishing in student-edited law reviews, where editors are unlikely to catch statistical problems, you could easily make huge Type I and Type II errors if you fail to carefully check your data before estimating your parameters. More importantly, you could provide a completely erroneous version of the world that some non-statistics-savvy readers will be too intimidated to question. For ELS to flourish as a movement, we need to avoid these basic mistakes.
Good god, Joe, that's not your problem. This is the stuff of elementary algebra. And why would anyone try to "confuse the uninitiated"? Don't we have better things to do? Seriously, this is really silly.
True, it may be best for us to work with our colleagues so that they understand what we're doing. But there's only so far one needs to take this. At some point, and we've reached it here, we have to say we did the best we could and move on.
Posted by: anon | 04 August 2006 at 02:17 PM
Excellent point, Bill, and great examples (and a nice hat-tip to the recently departed Mosteller). There is an interesting wrinkle to this story. Some people think that transforming data is a trick that empirical types do to confuse the uninitiated; if the data don't fit our theory, we change the data! How do we convince them otherwise? I've had discussions with faculty who think it's cheating to use logarithms in OLS, and no amount of math, examples, or cites to Greene will convince them otherwise.
Posted by: Joe Doherty | 03 August 2006 at 10:57 PM