Thanks so much to Jeffrey for his extremely thoughtful response to my study. There is a lot there, so I'll respond in chunks, starting by conceding his conclusion: "the finding of no correlation does not eliminate the possibility of a complex causal connection." That is likely true of any correlation study, and it is certainly true here. A correlation study is only as good as its data, and even with perfect data it can only report a raw finding of positive, negative, or no correlation. As I note in the paper, it is dangerous to extrapolate too much from a correlation study, and I certainly agree that I cannot rule out some hidden complex causal connection.
Nevertheless, I do feel obligated to mount a defense of the much-maligned student evaluation. I have three basic lines of defense. The first is practical. For better or worse, student evaluations are the only viable way to measure teaching effectiveness for a study of this breadth. My other choices were exceedingly unpalatable: 1) attempt to gather peer evaluation data, which is rarely if ever expressed numerically and would almost certainly not be provided by the host institutions; or 2) use some personal, subjective measure of teaching effectiveness, potentially requiring me to visit classes myself and make my own determination. In sum, these data are not only the best data; they are the only usable data.
The second is empirical. While there are various studies showing biases in student teaching evaluations (although I am not aware of a study showing the biases Jeffrey mentions), there are other studies showing that student evaluations are reliable and valid, and that they correlate with other measures of teaching effectiveness, including peer evaluations and tests of student retention. My favorite overview is Herbert Marsh's 1987 publication "Students' Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research." Theall and Franklin have a more recent book, "The Student Ratings Debate," that provides a good overview of the controversy, including a chapter entitled "Looking for Bias in All the Wrong Places" that updates the state of the empirical studies and provides further support for the validity of student evaluations.
My last line of defense is, sadly, anecdotal. It was not that long ago that I sat in a law school classroom and evaluated my own law school professors. At that point I had experienced roughly 17 to 19 years of schooling, and I think I was a pretty decent judge of good teaching; I had certainly seen my share of both good and bad teaching over the years. I was also poised to take on the responsibility of becoming a lawyer and representing actual clients in court. Nevertheless, many members of the legal academy assume that I was unqualified to determine whether the teaching I received was effective, and even suppose that I filled out my forms based on popularity or the ease of the course. This strikes me as somewhat paternalistic, and as an underestimation of the abilities, sophistication, and discernment of our students. I will also note, for the record, that my evaluations place me solidly in the middle of the pack at Tennessee, and I accept that as a fair assessment.