Last month, I blogged about the importance of visually inspecting data to ensure that ordinary least squares (OLS) is an appropriate method for estimating a relationship between two variables. As I noted, one pattern that you will occasionally observe is a monotone curvilinear relationship--i.e., an increasing or decreasing function in which the best-fitting line has an observable bend. Here is the example I posted:
Over at Moneylaw, Al Brophy (Alabama Law) posted some interesting data on the strong correlation between the various U.S. News input variables and bar passage rates of the California law schools. In a nutshell, almost every U.S. News input variable correlates with bar passage between .84 and .90. To illustrate his point, Al posted this scatterplot:
A visual inspection of Al's chart (here, bar passage and the calculated LSAT mid-point) suggests a curvilinear relationship. (Note that a correlation coefficient reflects the best-fitting straight line.) Since I don't have access to Al's data, I superimposed a rough fit line, which approximates the first graph.
Fortunately, Gary Rosin (South Texas Law) recently posted a paper, "Unpacking the Bar: Academic Qualifications, Cut Scores, MBE Scaling and Law School Bar Passage Rates," which provides a very nice chart showing the interaction between MBE cut scores and a law school's mean LSAT. It provides some insight on why the best fitting line in Al's scatterplot is probably curved. [See chart after the jump.]
In the above chart, the line for each hypothetical cut score is curved because a larger proportion of graduates from schools with lower mean LSATs score at or near the cut score. Hence, a 5 point increase in California's cut score might produce a 1% decline in bar passage at an elite school and a 10% decline at schools with more modest entering credentials. In addition, the higher the cut score (and California's is quite high), the steeper the downward slope after the curve.
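To make the asymmetry concrete, here is a minimal Python sketch (my own illustration, not drawn from Gary's paper) that treats each school's graduates as having normally distributed scaled scores and asks what a hypothetical five-point increase in the cut score does to the pass rate; the means, standard deviation, and cut values are all invented.

```python
# Illustrative only: hypothetical score distributions and cut scores.
from scipy.stats import norm

schools = {"elite school": 170, "modest-credential school": 155}
sd = 10                        # assumed spread of graduates' scaled scores
old_cut, new_cut = 145, 150    # a hypothetical 5-point increase in the cut

for name, mean in schools.items():
    old_pass = 1 - norm.cdf(old_cut, loc=mean, scale=sd)
    new_pass = 1 - norm.cdf(new_cut, loc=mean, scale=sd)
    print(f"{name}: {old_pass:.1%} -> {new_pass:.1%} "
          f"(a drop of {old_pass - new_pass:.1%})")
```

The same five-point move barely dents the elite school's pass rate but produces a much larger drop at the school whose score distribution sits closer to the cut.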
It is noteworthy that deans from regional law schools essentially intuit this relationship without the benefit of fancy charts. This group is always among the most vociferous critics of higher cut scores; the impact on lower ranked schools--already strapped for resources--can be enormous.
By the way, Gary's paper also makes a terrific argument that varying MBE cut scores and scaling practices from state to state are an example of federalism run amok; there is no coherent thread of lawyer competency that ties these standards together.
Very excellent narration and explanation, very well done.
Posted by: mike | 25 May 2007 at 02:14 AM
The way I understand it, even at the individual level, we're still working with a binary (pass/fail) dependent variable, so the best we can do is to use logistic regression to model the probability of passing based on the observed results. If we were talking about predicting Bar *scores,* then both individual scores and grouped mean scores should be linearly related to most explanatory variables.
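For readers who want to see what that individual-level setup looks like, here is a minimal sketch using simulated examinees; the LSAT distribution, the assumed coefficients, and the use of statsmodels are my own illustrative choices, not anything from the studies discussed in this thread.

```python
# Logistic regression of a simulated pass/fail outcome on LSAT.
# All data and coefficients below are synthetic placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
lsat = rng.normal(155, 8, n)                # hypothetical examinee LSATs
true_logit = -45 + 0.3 * lsat               # assumed true relationship
passed = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(lsat)                   # intercept + LSAT
model = sm.Logit(passed, X).fit(disp=False)
print(model.params)                         # intercept and slope on the logit scale
```

The fitted coefficients live on the logit scale, so the model predicts probabilities of passing rather than scores, which is the point made above.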
Unfortunately, we don't have access to law school mean scores, much less individual scores. In the Texas 2004 study by Klein & Bolus, it took a mandate from the legislature to get individual scores. It took a promise of strict confidentiality to get information out of the law schools.
Unless you're doing an internal study, you can't get at the data. If you're interested in law-school level information, you have to choose between (i) eternally slicing and dicing the BPS and (ii) using the only publicly available data--law-school admission statistics and Bar passage rates.
I suppose that we could all talk to our administrations, and see if we can establish a joint database that wouldn't violate students' (federally-protected) rights to privacy or let individual schools be identified. Care to take bets on that happening?
Posted by: Gary Rosin | 26 September 2006 at 04:27 PM
This thread is invoking some pretty high-end modeling concepts.
It seems to me, however, that the curved nature of the line in Gary's graph is relatively intuitive.
Specifically, the unit of analysis is law schools (passage rates and mean LSAT scores). At virtually all law schools, the LSAT scores are normally distributed, with the mode (the most frequently occurring score) roughly approximating the mean and median. The difference, of course, is that the mode/mean/median at Stanford is going to be quite a bit different--i.e., higher--than at a Tier 4 school. (To visualize, imagine two bell curves in which the right tail of one barely overlaps the left tail of the other.) Further, distributions of law school MBE scores largely mirror the distribution of law school LSAT scores--i.e., normal, with central tendencies fixed on different scores.
So, when the MBE cut score in Cal. is 145 (which is roughly correct, setting aside the combined written/MBE scaling), that affects the left tail of the higher ranked schools (i.e., not many students) and the broad middle of the distribution for lower ranked schools (i.e., lots of students).
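Here is a quick numerical sketch of that picture (the two score distributions are hypothetical stand-ins, with 145 used as the cut per the paragraph above):

```python
# Hypothetical school-level score distributions; 145 as the cut score.
from scipy.stats import norm

cut = 145
elite = norm(loc=158, scale=7)      # assumed higher-ranked school
regional = norm(loc=147, scale=7)   # assumed lower-ranked school

print(f"Share of elite graduates below the cut:    {elite.cdf(cut):.1%}")
print(f"Share of regional graduates below the cut: {regional.cdf(cut):.1%}")
# The cut sits nearly 2 SD below the elite mean (a thin left tail) but close
# to the middle of the regional school's distribution (lots of students).
```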
If the unit of analysis were students/bar examinees, then I suspect the relationship between LSAT and bar passage would, in fact, be linear and not curved at all.
If Chris, Eric, and Gary tell me I am all wet, then I'll drop the point. But this would explain the curvilinear pattern in Brophy's scatterplot. bh.
Posted by: William Henderson | 26 September 2006 at 12:48 PM
Thanks for the comments.
On Christopher Zorn's comment:
One of the benefits of the logit link function/transformation is that the transformed variable is not bounded, unlike a raw passage rate, which is confined to the interval between 0 and 1.
I tried an interaction variable between Cut Score and LSAT, and its effect was not statistically significant (p=0.812). One interesting result, though: when I did partial correlations taking LSAT into account, the partial correlation coefficients of Cut Score and Cut Score * LSAT with Logit(Bar) were almost identical--you had to go out 3 (states not scaling) or 4 (states scaling) decimal places for Cut Score to have a higher coefficient than that of the interaction variable.
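For anyone who wants to run the same kind of check on other data, here is a rough sketch of that regression--the logit of the passage rate on Cut Score, LSAT, and their product--using simulated placeholder data rather than the dataset from the paper.

```python
# Simulated placeholder data; the true data-generating process here has no
# interaction, so the Cut*LSAT term should come out insignificant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 150
lsat = rng.normal(153, 6, n)                # hypothetical school mean LSATs
cut = rng.choice([130, 135, 140, 145], n)   # hypothetical state cut scores
logit_bar = -38 + 0.30 * lsat - 0.05 * cut + rng.normal(0, 0.3, n)

X = sm.add_constant(np.column_stack([cut, lsat, cut * lsat]))
fit = sm.OLS(logit_bar, X).fit()            # columns: const, Cut, LSAT, Cut*LSAT
print(fit.params)
print(fit.pvalues[3])                       # p-value for the interaction term
```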
On Eric Rasmusen's comment:
A logit link function is consistent with a constant percent change in the odds of passing (or, in this case, in the ratio of those who pass to those who fail).
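A small numeric illustration of that point (the intercept and slope below are assumed, not estimates): under a logit link, the odds of passing equal exp(b0 + b1*x), so each unit change in x multiplies the odds by the same factor exp(b1), whatever the starting level.

```python
# Constant multiplicative change in the odds under a logit link.
import numpy as np

b0, b1 = -1.0, 0.2          # hypothetical intercept and slope
for x in (0, 1, 2):
    odds = np.exp(b0 + b1 * x)
    print(f"x = {x}: odds of passing = {odds:.3f} "
          f"(each step multiplies the odds by {np.exp(b1):.3f})")
```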
General:
Why a logit function (or transformation), rather than, say, a quadratic or other polynomial function? True, you can "fit" some polynomial to almost any data. By trying increasingly higher-order polynomials, you can even have your function "veer" to pick up outliers. You'll probably end up with a model that is both hard to interpret and bears little relation to the underlying dynamics of the relationship being modelled.
As to the underlying relationships, I note that logistic regressions are typically used to model *individual* pass/fail outcomes. In a logistic regression, the resulting model uses a logit transformation of the probability of passing.
The passage rate of a *group* is a function of how much of the distribution of the group's scores falls below the cut score. The chief determinants of a group's passage rate are (i) the proximity of the cut score to the mean (location) of its score distribution; and (ii) the shape of that distribution. Changes in that proximity should result in passage rates that follow the cumulative distribution function (CDF) of the underlying distribution. Assuming a unimodal distribution, the related CDF will be S-shaped. Both probit and logit models are S-shaped, but the logit model is easier to interpret.
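As a concrete illustration of that S-shape (my own sketch, not taken from the paper): as the gap between a group's mean score and the cut score varies, the implied passage rate traces out the CDF, and the probit (normal) and rescaled logistic CDFs are nearly indistinguishable.

```python
# Passage rate as the CDF of the score distribution, probit vs. logit.
import numpy as np
from scipy.stats import norm, logistic

gap = np.linspace(-3, 3, 7)     # (group mean - cut score), in SD units
probit_rate = norm.cdf(gap)
# Rescale the logistic so its standard deviation is also 1 (SD = scale*pi/sqrt(3)).
logit_rate = logistic.cdf(gap, scale=np.sqrt(3) / np.pi)

for g, p, l in zip(gap, probit_rate, logit_rate):
    print(f"gap {g:+.1f} SD: probit {p:.3f}, logit {l:.3f}")
```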
Posted by: Gary Rosin | 26 September 2006 at 12:17 PM
In the post's second diagram, there are two ways to look at the effect of making bar exams tougher. One is that it would raise the flunk rates of the lower-rank schools the most (say, from 50 to 60%). Another is that it would multiply the percentage flunked the most in the good schools (say, from 5% to 7%). There exists some shape of curve such that every school's flunk rate increases by the same multiplying percent.
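A tiny worked example of the two readings, with invented flunk rates:

```python
# Additive vs. multiplicative readings of a tougher exam (hypothetical rates).
old_flunk = {"elite": 0.05, "mid-tier": 0.25, "lower-rank": 0.50}

for name, f in old_flunk.items():
    additive = f + 0.10        # same +10 percentage points everywhere
    multiplicative = f * 1.4   # same +40% relative increase everywhere
    print(f"{name}: {f:.0%} -> additive {additive:.0%}, "
          f"multiplicative {multiplicative:.0%}")
# Under the multiplicative reading, 5% -> 7% at a good school pairs with
# 50% -> 70% at a lower-ranked one; that is the "same multiplying percent"
# curve described above.
```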
Posted by: Eric Rasmusen | 26 September 2006 at 09:03 AM
I'm afraid Gary's study may be a bit misleading with respect to the curvilinearity he discusses. In fact, the observed (or, in the chart, predicted) differential effects of cut scores at varying LSAT levels are *entirely* due to the functional form of the model; the logistic link function necessarily "bends" at probabilities near zero and one (that is, zero and 100 percent passage rates), which means that the marginal impact of a change in some covariate X on passage rates is *necessarily* smaller as the passage rates get closer to 100 (or zero).
In order to really test whether the cut scores differentially affect law students at varying LSAT levels, an interaction term is required:
Passage Rate = f[B0 + B1(Cut Score) + B2(LSAT) + B3(Cut Score * LSAT) + ... ]
Such a model explicitly allows the impact of cut scores to vary across schools with different LSATs. Contrary to popular belief (e.g., Epstein et al. 2006 Journal of Politics), such an interaction term is necessary to test for this kind of relationship, even in models where the functional form is itself nonlinear.
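As a sketch of what estimating that specification might look like on grouped, school-level data--with simulated placeholder counts and a binomial GLM standing in for the generic f[·]--consider:

```python
# Binomial GLM with a Cut Score * LSAT interaction on simulated school data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
lsat = rng.normal(153, 6, n)             # hypothetical school mean LSATs
cut = rng.choice([135, 140, 145], n)     # hypothetical cut scores
grads = rng.integers(100, 400, n)        # hypothetical class sizes

eta = -38 + 0.30 * lsat - 0.05 * cut     # assumed true logit (no interaction)
p = 1 / (1 + np.exp(-eta))
passers = rng.binomial(grads, p)

X = sm.add_constant(np.column_stack([cut, lsat, cut * lsat]))
y = np.column_stack([passers, grads - passers])     # successes, failures
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)        # B0, B1 (Cut), B2 (LSAT), B3 (Cut*LSAT)
print(fit.pvalues[3])    # test of the B3 interaction term
```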
Posted by: Christopher Zorn | 26 September 2006 at 08:21 AM
Now, I look at that chart and I think, wow, if I were a disfavored minority, I would really want to go to Stanford, where there is a 95 to 98% chance that they will teach me enough to pass the bar in any case. And not Whittier, where a shift in state bar association policy might cut a third of the class.
And then I'd think, this nonlinearity has significant implications for Professor Sander's weighing of grades vs. eliteness.
Posted by: Corey | 26 September 2006 at 12:58 AM