## 25 September 2006

### Comments

Very excellent narration and explanation, very well done.

The way I understand it, even at the individual level, we're still working with a binary (pass/fail) dependent variable, so the best we can do is to use logistic regression to model the probability of passing based on the observed results. If we were talking about predicting Bar *scores,* then both individual scores and grouped mean scores should be linearly related to most explanatory variables.

Unfortunately, we don't have access to law school mean scores, much less individual scores. In the Texas 2004 study by Klein & Bolus, it took a mandate from the legislature to get individual scores. It took a promise of strict confidentiality to get information out of the law schools.

Unless you're doing an internal study, you can't get at the data. If you're interested in law-school level information, you have to choose between (i) eternally slicing and dicing the BPS and (ii) using the only publicly available data--law-school admission statistics and Bar passage rates.

I suppose that we could all talk to our administrations, and see if we can establish a joint database that wouldn't violate students' (federally-protected) rights to privacy or let individual schools be identified. Care to take bets on that happening?

This thread is invoking some pretty high-end modeling concepts.

It seems to me, however, the curved nature of the line in Gary's graph is relatively intuitive.

Specifically, the unit of analysis is law schools (passage rates and mean LSAT scores). At virtually all law schools, the LSAT scores are normally distributed with the mode (the most frequently occurring score) roughly approximating the mean and median. The difference, of course, is that the mode/mean/median at Stanford is going to be quite a bit different--i.e., higher--than at a Tier 4 school. (To visualize, imagine two bell curves in which the right tail of one barely overlaps the left tail of the other.) Further, distributions of law school MBE scores largely mirror the distribution of law school LSAT scores--i.e., normal, with central tendencies fixed on different scores.

So, when the MBE cut score in Cal. is 145 (which is roughly correct, setting aside the combined written/MBE scaling), that affects the left tail of the higher ranked schools (i.e., not many students) and the broad middle of the distribution for lower ranked schools (i.e., lots of students).
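The two-bell-curve picture can be made concrete with a rough sketch. It assumes each school's MBE scores are normally distributed with a common standard deviation of 15, and it uses hypothetical school means of 165 and 145; none of these numbers come from the data discussed in this thread. The function name `pass_rate` is likewise just for illustration.

```python
from statistics import NormalDist


def pass_rate(mean_mbe, cut_score, sd=15.0):
    """Share of a school's examinees scoring above the cut score,
    assuming normally distributed MBE scores (an assumption, not data)."""
    return 1.0 - NormalDist(mu=mean_mbe, sigma=sd).cdf(cut_score)


# Hypothetical school means; not taken from any actual school.
schools = {"higher-ranked": 165.0, "lower-ranked": 145.0}

for name, mean in schools.items():
    before = pass_rate(mean, 140.0)  # pass rate with a cut score of 140
    after = pass_rate(mean, 145.0)   # pass rate with a cut score of 145
    drop_points = (before - after) * 100
    print(f"{name}: {before:.1%} -> {after:.1%} (drop of {drop_points:.1f} points)")
```

Under these assumptions the same five-point increase in the cut score removes only part of the left tail at the higher-ranked school but cuts into the middle of the distribution at the lower-ranked school, so the second drop is the larger one.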

If the unit of analysis were students/bar examinees, then I suspect the relationship between LSAT and bar passage would, in fact, be linear and not curved at all.

If Chris, Eric, and Gary tell me I am all wet, then I'll drop the point. But this would explain the curvilinear pattern in Brophy's scatterplot. bh.

Thanks for the comments.

On Christopher Zorn's comment:

One of the benefits of the logit link function/transformation is that it is not bounded.

I tried an interaction variable between Cut Score and LSAT, and its effect was not statistically significant (p=0.812). One interesting result, though. When I did partial correlations taking into account LSAT, the partial correlation coefficients of Cut Score and Cut Score * LSAT with Logit(Bar) were almost identical: you had to go out 3 (states not scaling) or 4 (states scaling) decimal places for Cut Score to have a higher coefficient than that of the interaction variable.

On Eric Rasmusen's comment:

A logit link function is consistent with a constant percent change in the odds of passing (or, in this case, the ratio of passers to failers).
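A small sketch of what "constant percent change in the odds" means in practice. The coefficients `B0` and `B1` are hypothetical and stand in for a one-variable logit model; they are not estimates from the study. Whatever the starting value of x, a one-unit step multiplies the odds by the same factor, exp(B1).

```python
import math

# Hypothetical coefficients, chosen only for illustration.
B0, B1 = -2.0, 0.5


def logistic(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of passing, i.e. the ratio of passers to failers."""
    return p / (1.0 - p)


def odds_ratio_for_unit_step(x):
    """Factor by which the odds change when x increases by one unit."""
    p_before = logistic(B0 + B1 * x)
    p_after = logistic(B0 + B1 * (x + 1))
    return odds(p_after) / odds(p_before)


for x in [0, 2, 4, 6, 8]:
    print(x, round(odds_ratio_for_unit_step(x), 4), round(math.exp(B1), 4))
```

The two printed columns agree at every x, even though the change in the pass probability itself differs from one x to the next.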

General:

Why a logit function (or transformation), rather than, say, a quadratic or other polynomial function? True, you can "fit" some polynomial to almost any data. By trying increasingly higher-order polynomials, you can even have your function "veer" to pick up outliers. You'll probably end up with a model that is both hard to interpret and bears little relation to the underlying dynamics of the relationship being modelled.

As to the underlying relationships, I note that logistic regressions are typically used to model *individual* pass/fail outcomes. In a logistic regression, the resulting model uses a logit transformation of the probability of passing.

The passage rate of a *group* is a function of how much of the distribution of the group's scores falls below the cut score. The chief determinants of a group's passage rate are (i) the proximity of the cut score to the mean (location) of its score distribution; and (ii) the shape of that distribution. Changes in that proximity should result in passage rates that follow the cumulative distribution function (CDF) of the underlying distribution. Assuming a unimodal distribution, the related CDF will be S-shaped. Both probit and logit models are S-shaped, but the logit model is easier to interpret.

In the post's second diagram, there are two ways to look at the effect of making bar exams tougher. One is that it would raise the flunk rates of the lower-rank schools the most (say, from 50 to 60%). Another is that it would multiply the percentage flunked the most in the good schools (say, from 5% to 7%). There exists some shape of curve such that every school's flunk rate increases by the same multiplying percent.

I'm afraid Gary's study may be a bit misleading with respect to the curvilinearity he discusses. In fact, the observed (or, in the chart, predicted) differential effects of cut scores at varying LSAT levels are *entirely* due to the functional form of the model; the logistic link function necessarily "bends" at probabilities near zero and one (that is, zero and 100 percent passage rates), which means that the marginal impact of a change in some covariate X on passage rates is *necessarily* smaller as the passage rates get closer to 100 (or zero).
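The "bend" in the logistic link can be seen directly from its derivative: for a logit model with coefficient β on X, the marginal effect of X on the pass probability p is β·p·(1 − p). The sketch below uses a hypothetical coefficient `BETA` (not an estimate from the study) and evaluates that expression at several baseline passage rates.

```python
import math

# Hypothetical logit coefficient on the covariate X; illustration only.
BETA = -0.1


def logit(p):
    """Log-odds corresponding to a probability p."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse logit: probability corresponding to log-odds z."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(z, beta):
    """Derivative of the pass probability with respect to X at log-odds z."""
    p = logistic(z)
    return beta * p * (1.0 - p)


for baseline in [0.50, 0.80, 0.95, 0.99]:
    effect = marginal_effect(logit(baseline), BETA)
    print(f"baseline pass rate {baseline:.0%}: marginal effect {effect:.4f}")
```

The same coefficient produces its largest change in the pass probability at a 50 percent baseline and a progressively smaller one as the baseline approaches 100 percent, with no interaction term anywhere in the model.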

In order to really test whether the cut scores differentially affect law students at varying LSAT levels, an interaction term is required:

Passage Rate = f[B0 + B1(Cut Score) + B2(LSAT) + B3(Cut Score * LSAT) + ... ]
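As a rough illustration of what the B3 term does, the sketch below treats f as the inverse logit (an assumption; the formula above leaves f unspecified) and uses made-up coefficients. On the log-odds scale, the effect of a one-point change in the cut score is B1 + B3 × LSAT, so it varies with LSAT only when B3 is nonzero.

```python
# Hypothetical coefficients, chosen only for illustration (not estimates).
B0, B1, B2, B3 = 5.0, -0.1, 0.08, 0.0004


def linear_predictor(cut_score, lsat, b3=B3):
    """B0 + B1(Cut Score) + B2(LSAT) + B3(Cut Score * LSAT)."""
    return B0 + B1 * cut_score + B2 * lsat + b3 * cut_score * lsat


def cut_effect_on_log_odds(lsat, b3=B3, cut_score=140.0):
    """Change in the log-odds of passing when the cut score rises by one point."""
    return linear_predictor(cut_score + 1.0, lsat, b3) - linear_predictor(cut_score, lsat, b3)


for lsat in [150.0, 160.0, 170.0]:
    without = cut_effect_on_log_odds(lsat, b3=0.0)  # no interaction term
    with_b3 = cut_effect_on_log_odds(lsat)          # interaction term included
    print(f"LSAT {lsat:.0f}: no interaction {without:.4f}, with interaction {with_b3:.4f}")
```

With B3 set to zero, the log-odds effect is the same at every LSAT level; any difference in the effect on passage rates then comes from the link function alone.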

Such a model explicitly allows the impact of cut scores to vary across schools with different LSATs. Contrary to popular belief (e.g., Epstein et al. 2006 Journal of Politics), such an interaction term is necessary to test for this kind of relationship, even in models where the functional form is itself nonlinear.

Now, I look at that chart and I think, wow, if I were a disfavored minority, I would really want to go to Stanford, where there is a 95 to 98% chance that they will teach me enough to pass the bar in any case. And not Whittier, where a shift in state bar association policy might cut a third of the class.

And then I'd think, this nonlinearity has significant implications for Professor Sander's weighing of grades vs. eliteness.

The comments to this entry are closed.
