Over the last few weeks, Andy Morriss and I have been working with U.S. News rankings, focusing primarily on post-graduation outcomes such as employment upon graduation and bar passage. Much of the raw data does not satisfy the assumptions of ordinary least squares regression (a linear relationship between X and Y, constant error variance, etc.). Thus, to create a proper linear model, I have been experimenting with various transformation methods.
One of the lessons this experience drove home was the importance of examining your data BEFORE you rely upon the various OLS parameter estimates. In our case, the estimates changed in important ways after transforming the relevant variables. The transformations also reduced the amount of heteroskedasticity (i.e., non-constant variance of the errors) in our various specifications, which will save us some additional corrective steps.
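To make the point concrete, here is a minimal sketch on simulated data (not our U.S. News data). It fits a simple OLS line before and after log-transforming Y and compares a crude diagnostic: the spread of the residuals in the upper half of X relative to the lower half. The data-generating process, the helper names `ols` and `spread_ratio`, and the choice of a log transformation are all assumptions made for illustration.

```python
import math
import random

def ols(x, y):
    """Return (intercept, slope) for a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def spread_ratio(x, y):
    """Residual std. dev. in the upper half of x divided by the lower half.

    Values far from 1 hint at non-constant spread (or a badly fitting line).
    """
    a, b = ols(x, y)
    resid = sorted((xi, yi - (a + b * xi)) for xi, yi in zip(x, y))
    half = len(resid) // 2

    def sd(rs):
        m = sum(rs) / len(rs)
        return math.sqrt(sum((r - m) ** 2 for r in rs) / len(rs))

    low = [r for _, r in resid[:half]]
    high = [r for _, r in resid[half:]]
    return sd(high) / sd(low)

# Simulated data (an assumption): log(Y) is linear in X with constant noise,
# so raw Y is curved and its spread grows with X.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(400)]
y = [math.exp(0.5 + 0.3 * xi + random.gauss(0, 0.2)) for xi in x]
log_y = [math.log(yi) for yi in y]

raw_fit = ols(x, y)
log_fit = ols(x, log_y)
raw_ratio = spread_ratio(x, y)
log_ratio = spread_ratio(x, log_y)

print("raw Y:  slope = %.3f, spread ratio = %.2f" % (raw_fit[1], raw_ratio))
print("log Y:  slope = %.3f, spread ratio = %.2f" % (log_fit[1], log_ratio))
```

The spread ratio is only a rough screen, not a formal test; a residual plot or a standard heteroskedasticity test would be the next step with real data.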
Here are some graphics that I thought readers might find useful. This graph [excerpted from Fox, Applied Regression Analysis, Linear Models, and Related Methods 70 (1997)] shows a monotone curvilinear relationship, which can plausibly be corrected by a power transformation of the dependent (Y) or independent (X) variable:
Here is a nice graphic that shows the Mosteller & Tukey "bulging rule", excerpted from Fox, supra, at 71. The direction in which the monotone curve "bulges" indicates whether to move X or Y up or down the ladder of powers.
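For readers who prefer to see the idea in code, here is a rough sketch of searching the ladder of powers for X. The simulated curve bulges up and to the left, the case where the bulging rule points toward moving X down the ladder (square root, log). The ladder values, the helper names, and the use of the absolute Pearson correlation as a measure of "straightness" are assumptions for illustration, not a procedure taken from Fox.

```python
import math
import random

def power(v, p):
    """Ladder-of-powers transform; p = 0 is taken to mean the natural log."""
    return math.log(v) if p == 0 else v ** p

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return sab / math.sqrt(saa * sbb)

# Hypothetical ladder: below 1 is "down" the ladder, above 1 is "up".
LADDER = [-1, -0.5, 0, 0.5, 1, 2, 3]

def straightest_power(x, y, ladder=LADDER):
    """Return the power of x whose transform is most linearly related to y."""
    return max(ladder, key=lambda p: abs(pearson([power(v, p) for v in x], y)))

# Simulated data (an assumption): Y is linear in log(X), so the X-Y curve
# rises quickly and then flattens. X is kept strictly positive.
random.seed(1)
x = [random.uniform(1, 50) for _ in range(300)]
y = [2 + 3 * math.log(v) + random.gauss(0, 0.2) for v in x]

best = straightest_power(x, y)
print("power of X giving the straightest relationship:", best)
```

Because Y was generated from log(X), the search should settle on p = 0 here. With real data the correlation is only a guide, and the transformed scatterplot should still be inspected by eye.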
Finally, here is a nice graphic [excerpted from Pallant, SPSS Survival Manual 79 (2001), citing Tabachnick & Fidell, Using Multivariate Statistics (3rd ed. 1996)] that suggests various useful transformations depending upon the frequency distribution of the variable. (Note that these charts reflect histograms that deviate from a normally distributed curve, not curves that fit an X-Y scatterplot, as above):
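The same kind of check can be run numerically on a single variable. The sketch below computes the sample skewness of a simulated, positively skewed variable before and after two candidate transformations. The chart's specific recommendations are not reproduced here; the square root and log are simply two transformations commonly suggested for positive skew, and the simulated variable and helper name `skewness` are assumptions for illustration.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated variable (an assumption): strictly positive with a long right tail.
random.seed(2)
raw = [random.lognormvariate(0, 0.8) for _ in range(2000)]

candidates = {
    "none": raw,
    "sqrt": [math.sqrt(v) for v in raw],
    "log": [math.log(v) for v in raw],
}
skews = {name: skewness(vals) for name, vals in candidates.items()}

for name, s in skews.items():
    print("%-5s skewness = %.2f" % (name, s))
```

A skewness near zero suggests a roughly symmetric histogram. Which transformation works best depends on how severe the skew is, so it is worth looking at the histogram after each attempt rather than trusting one number.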
Especially for those readers publishing in student-edited law reviews, you could easily make serious Type I and Type II errors if you fail to carefully check your data before estimating your parameters. More importantly, you could provide a completely erroneous version of the world that some readers who are not statistics-savvy will be too intimidated to question. For ELS to flourish as a movement, we need to avoid these basic mistakes.