Few people, scholars included, approach a topic like affirmative action in higher education without prior views about the political morality and general desirability of the practice. Affirmative action is not unique in this respect. Targets of study, ranging from the death penalty, to the environment, to free-market as opposed to regulatory approaches to corporate governance, may be topics which the scholar, as citizen, cares deeply about or is even ideologically or psychologically committed to. This is not necessarily unfortunate. Outstanding research often grows out of a researcher’s passions. Prior commitment to a position, especially passionate commitment, may nonetheless pose problems as it may affect how research is done and the reception accorded the results of research. For legal scholars, however, strong commitments often pose few difficulties. Because much legal research is normative, advocacy of a position, even passionate advocacy, is often appropriate to the genre. If a person has been blinded by his or her strong preferences, weaknesses in an argument or analysis are usually evident to the critical reader, and knowledgeable readers can recognize failures to adequately review a literature or to fairly summarize it. Moreover, much normative writing does not turn on the “truth” of disputed propositions, but rather invokes values that are neither true nor false or depends on readings of cases or statutes that are similarly neither true nor false.
Empirical research too may benefit from the passionate involvement of the researcher in a topic, even when involvement reflects not only value commitments but also strong a priori views about the state of the empirical world. One who opposes the death penalty, for example, may oppose it in part for religious reasons and in part because she believes that it is imposed in a racially biased manner. To demonstrate the latter proposition, she may undertake a study of the role race plays in the administration of the death penalty. In doing so, however, she may be using data that is, for the time being at least, exclusively hers, and only those aspects of the data that she discloses will be known to readers. While a report of this research may be criticized for failing to use appropriate or best methods, a detailed critique will usually take considerable effort even for those familiar with the topic, and without the author’s data, aspects of reported analyses must be taken on faith.
Empirical research differs from doctrinal or normative research in that empirical studies seek to reveal a truth “out there,” even if the truth is a narrow and highly conditioned one. In ethical empirical research, getting at this truth must always trump researcher preferences. Models and assumptions should be deployed so as to put one’s preferences to hard tests. Even then preferences and preconceptions may inescapably color research. Models are generated based on views of what matters, and if a person does not believe something matters, she may not think to account for it in a model, particularly if collecting the data needed to include a variable will be a difficult task. This is no sin, for life is short, reliable data are often hard to come by and building any model requires choices on what to include. These choices are typically based on hypotheses about what matters. The danger is that strong prior hypotheses may lead to poor model choices. So long as these hypotheses are based on “best guess” views of how the world works uninfluenced by preferred outcomes, error in model building choices should on average have no systematic effects on findings. If, however, value preferences rather than sound social science judgments determine model building choices, results will be systematically distorted.
Strong value preferences also threaten sound empirical research because of what might be called the IRS problem. In reviewing taxpayer-prepared returns, the IRS reputedly finds that arithmetical errors that one might expect to be random, like errors in addition, disproportionately favor the taxpayer. One explanation for this is that taxpayers intentionally err to lower their taxes and boost their refunds. But cheating need not be hypothesized to explain the skew. Honest taxpayers often have an idea of what they owe. If their tax liability is at the high end or higher than what they expected to pay, they will double check their returns and, most likely, catch addition errors. If, on the other hand, their calculated total is less than expected, they are likely, without more, to celebrate and send their returns in. Similarly an honest empirical researcher may nonetheless scrutinize results that support a favored hypothesis less closely than results that run counter to one because favorable results will seem correct and be welcome. The stronger a researcher’s value commitment, the more likely such behavior may be. Like so-called publication bias, which reflects the tendency to publish significant findings more often than insignificant ones, the natural human tendency to disproportionately double check unwelcome results can skew the portrait gained from empirical work.
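The mechanism behind the IRS problem can be illustrated with a small simulation. What follows is only a sketch under invented assumptions: every return has the same chance of an arithmetic slip, slips are equally likely to raise or lower the computed tax, and a taxpayer rechecks (and usually catches the slip) only when the computed tax comes out higher than expected. The function name, error size, and probabilities are hypothetical and are not drawn from any IRS data.

```python
import random

def simulate_filed_errors(n_returns=10_000, error_rate=0.2,
                          catch_prob=0.9, seed=0):
    """Return the arithmetic error left on each filed return.

    Assumptions (all hypothetical): slips are symmetric (+100 or -100),
    and only an unwelcome result (tax too high) triggers a recheck.
    """
    rng = random.Random(seed)
    filed_errors = []
    for _ in range(n_returns):
        # An honest, symmetric slip: equally likely to raise or lower the tax.
        error = rng.choice([-100, 100]) if rng.random() < error_rate else 0
        # A higher-than-expected bill prompts a recheck that usually
        # catches the slip; a lower-than-expected bill is filed as is.
        if error > 0 and rng.random() < catch_prob:
            error = 0
        filed_errors.append(error)
    return filed_errors

errors = simulate_filed_errors()
favoring_taxpayer = sum(1 for e in errors if e < 0)
favoring_irs = sum(1 for e in errors if e > 0)
print("errors favoring taxpayer:", favoring_taxpayer)
print("errors favoring IRS:", favoring_irs)
```

Although no one in the simulation cheats, the errors that survive to filing overwhelmingly favor the taxpayer, because only one direction of error gets a second look. The same asymmetry arises when a researcher rechecks only unwelcome results.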
I have looked empirically at a number of topics during my career. For the most part, I had the researcher’s usual investment in seeing a favored hypothesis confirmed, but had no relevant strong value preferences. However, in two areas I have investigated I began with a research-relevant position rooted not just in social science theory but also in deeply held values. One area is the death penalty. I have been an opponent of the death penalty since I was a teenager and wondered how the intentional taking of a human life could be justified and what the State of New York would gain from an execution. The other area is affirmative action. I have a deep commitment to equality, especially racial equality. I see affirmative action as a route to this end, justified for blacks by a history of slavery and segregation and by the legacy of this history, a legacy manifested in inferior starting points and continuing discrimination.
It is scary doing empirical research on a topic one truly cares about. Suppose the results are unwelcome? I recall one study I did exploring the deterrent effect of capital punishment. Using a crude difference of differences method, I was testing Isaac Ehrlich’s prediction that each execution saved 8 lives. The first test I conducted was, if I recall correctly, an Ohio versus Michigan comparison. My results indicated that each execution seemed to save 8 lives! Fortunately for me, results of other comparisons varied greatly, and the analysis as a whole provided no reason to think executions deterred homicides. Thus I did not have to confront the question of what to do with results that undermined a strong value commitment. I know what I would not have done. I would not have fudged the data nor published anything that was likely to mislead. But I also might not have published my results at all. My objections to capital punishment were at root not empirical, at least not in ways that data on deterrence were likely to resolve. If supporters of capital punishment were going to be able to make a strong case for this institution based on deterrence data, I might have decided to let them make that case based on someone else’s research. As a social scientist I am not proud of the fact that I would have considered withholding empirical results I trusted because I feared their political consequences. But I am not just a social scientist. As a person with strong feelings about the fairness of capital punishment, the possible execution of the innocent and the general sanctity of human life, I am not ashamed about my hesitation to give empirical ammunition to people whose arguments would not address my concerns but might exacerbate them.
Research on affirmative action does not pose quite the same dilemma. My support for affirmative action does not rest on values intrinsic to affirmative action. Apart from the equality it aims to foster, I do not view affirmative action as good or bad. Rather I see it as a means to the end of a racially less divided and more equal society. If sound empirical research were to show that affirmative action backfired – if Richard Sander’s Stanford Law Review claim that without affirmative action we would have more rather than fewer successful black lawyers held up to close scrutiny – I would rethink my support. Nevertheless, based on what I have seen while teaching at Michigan and on what common sense tells me, it would take considerable evidence to persuade me that affirmative action at schools like Michigan harmed the students who were its intended beneficiaries or retarded equality more than it advanced it. This perspective is not for me peculiar to affirmative action. It is a Bayesian perspective that I take to hypothesis testing in general.
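For readers unfamiliar with the term, a Bayesian perspective means that belief in a hypothesis after seeing evidence depends both on the strength of the evidence and on how plausible the hypothesis seemed beforehand. The sketch below applies Bayes’ rule to purely hypothetical numbers; they are illustrative only and are not estimates of anyone’s actual beliefs or of any study’s evidential weight.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: probability a hypothesis is true after seeing evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator

# Hypothetical inputs: a skeptic starts at 10% belief in a hypothesis and
# sees a study three times more likely if the hypothesis is true
# (0.75) than if it is false (0.25).
belief = 0.10
for study in range(1, 4):
    belief = posterior(belief, 0.75, 0.25)
    print(f"belief after study {study}: {belief:.2f}")
```

With these made-up inputs, one supportive study moves a skeptical prior only part of the way, and it takes repeated, independent confirmation before the hypothesis becomes more likely than not. That is the sense in which a strong prior demands considerable evidence.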
In the major study Terry Adams, David Chambers and I did of career outcomes among Michigan Law School's minority graduates, we never had to confront results that called our support for affirmative action into question. Indeed, our results indicated that Michigan's affirmative action admittees did better after graduating than we had expected. However, in doing this study we knew that as supporters of affirmative action our analysis might be affected by our values and, even if it was not, others might think it had been. Here having co-authors, even like-minded co-authors, helped, for we continually challenged each other’s assumptions, and together we challenged the data. At the outset, for example, we recognized the possibility that our sample of respondents was biased in favor of finding that minorities were successful since minorities responded at lower rates than whites, and those successful in and after law school, whether white or black, might be more likely to respond to our questionnaire than their less successful peers. We each thought of ways to test this possibility using data we had on how both respondents and non-respondents had performed in law school as well as data we could glean from Martindale-Hubbell, alumni records independent of our survey, and other published information about the careers of our non-respondents. Eventually we felt we could conclude with some confidence that while there might be a slight tendency among less successful whites and minorities to respond with a lower probability than more successful whites and minorities, the tendency appeared slight enough that the different white and minority response rates should not affect any of our conclusions.
Had we been writing in a less contentious area or been less conscious of the need to ensure that our values were not leading us to overlook flaws in our data, we might have devoted less attention to the possibility of non-response bias, perhaps, for example, not making the time-consuming effort to sort through legal directories to track the careers of non-respondents. There was, indeed, some tension in searching for response bias, for if we had found substantial bias it would have made our work more difficult and might have required us to discount evidence of minority success. Yet these tests had to be done, both in the interests of good social science and to forestall obvious and fair criticisms that readers might otherwise have of our results.
Another caution is that one must be careful not to overgeneralize from results. In the Michigan study, for example, within the limits of our available measures, it appeared that Michigan’s minority graduates did about as well as the school’s white graduates in career earnings and satisfaction and that they did somewhat more service. Given our views of affirmative action, it would have been nice to have been able to argue that these outcomes were likely to hold for all law schools that engaged in affirmative action. We could not. Strictly speaking we could only say that our findings held for Michigan’s graduates, though we could reasonably argue that one would expect similar outcomes among the graduates of the handful of law schools similar to Michigan in terms of prestige, student quality and employment opportunities open to their graduates. We emphasized this limitation at several points in our discussion. Although cautions against overgeneralizing apply to all empirical research, restrained generalization is particularly important in dealing with sensitive, hot-button issues like affirmative action. As a rule of thumb, the more sensitive a research topic is, the more deeply held one’s views on it are, and the more one would like findings to be true universally, the more important it is to clearly identify the limits of the population to which results may be fairly generalized. The press and advocates will want to push findings to an extreme. It is the researcher’s obligation to do what can be done to prevent research results from being oversold, even, and indeed especially, when one believes good might result from this happening.
Much of the empirical research in law reviews is, in my view, weak in this respect. Lawyers, even those who do empirical work, are often not trained in social science ethics. Rather social science is for them, as for an advocate in litigation, a tool that can be used to muster evidence in support of a position. Thus empirical results are too often pushed beyond the narrow confines of what a study’s results allow one to say, strictly speaking, and presented as a reason to maintain or change policies. Single studies are, however, almost always inadequate justifications for any policy recommendation, even when they are directly on point. Indeed, I would go so far as to say that scholars should refrain from presenting original empirical studies and strong policy advocacy in the same article. While the policy implications of research may and sometimes should be pointed out, it is better to leave determined advocacy to other pieces that can both review the range of what we know empirically and assess the values that should inform policy making. Moving from the presentation of a study’s results to policy invariably privileges that study’s results over the results of other research and may lead readers to question whether the policy is derived from the study or whether the study was somehow influenced by a desire to advocate policy.
Large data sets, with many variables and different possible variable codings, can allow an unscrupulous researcher to generate support for favored hypotheses by capitalizing on chance associations. Suppose, for example, that a survey researcher has asked lawyer respondents to report their career satisfaction on a seven-point scale. She then might compare the satisfaction of white and black respondents by comparing mean satisfaction scores, median satisfaction scores, the proportion giving high scores, defined in various ways, the proportion giving low scores, etc. Her choice of what measure to use should then turn on an a priori judgment as to which will best capture the phenomenon she seeks to describe. Medians and means, for example, are different ways to capture central tendencies, while proportions giving scores of 1 or 7 would indicate concentration at the extremes. If a measure showed a statistically significant difference across races, the researcher might conclude that law graduates of one race tended to be more satisfied (dissatisfied) than members of the other race, depending on the measure used. However, if the researcher had explored various ways of parsing the satisfaction variable and found that only one of these ways indicated that black and white lawyers differed significantly, it would be misleading to report only the coding that yielded significant differences. Most likely one would not have found a genuine difference at all, for the more different ways one looks at the same data, the more likely one is to find a relationship that appears significant by chance alone.
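The arithmetic behind this warning is easy to sketch. If each test uses the conventional 0.05 significance threshold, and if the tests were independent, the chance that at least one of k tests looks significant when no real difference exists is 1 - (1 - 0.05)^k. Independence is a simplifying assumption: alternative codings of the same satisfaction variable are correlated, so the true figure for that example would differ, though it would still exceed 0.05. The function name below is hypothetical.

```python
def chance_of_spurious_finding(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests is
    'significant' at level alpha when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10, 20):
    print(f"{k:>2} tests: {chance_of_spurious_finding(k):.3f}")
```

Under these assumptions, five looks at the data give roughly a 23 percent chance of at least one spurious "finding", and twenty looks give about 64 percent. A common, conservative remedy is to divide the significance threshold by the number of tests performed (the Bonferroni correction), which is one more reason to report every coding that was tried.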
Ransacking data looking for relationships that will support favored hypotheses is not an appropriate way to proceed in any study, but it is particularly inappropriate the more invested one is in reaching a particular conclusion. To avoid unfair (or worse yet, fair) suspicion, researchers, particularly when examining issues they care deeply about, should specify, before looking closely at the data, the variables and relationships that will support their hypotheses, or they should employ techniques, like factor analysis, to statistically derive appropriate measures. Exploratory data analysis may be necessary to determine appropriate modeling techniques and whether particular assumptions are met, and it can also be used to generate hypotheses for independent testing, but it should not be used to identify approaches to data likely to support a favored hypothesis.
This does not mean that exploring alternative approaches to data analysis is a mistake. What is crucial is how it is done. One often wants to know how sensitive an analysis is to model specification, variable definition, decisions whether to impute missing data, and the like. If, for example, groups of black and white lawyers did not differ on median career satisfaction but differed significantly if one looked at the proportions deeply dissatisfied, this would be potentially an important observation and readers should know that considering only median satisfaction might hide something important that was going on. The converse would be true if extreme dissatisfaction were the original variable of interest. But the finding on extreme dissatisfaction might be close to meaningless if it did not emerge in alternative definitions of extreme dissatisfaction and there were no differences between the races at other satisfaction levels. Hence findings of sensitivity tests should generally all be disclosed, even if further exploration of some is justified.
While I have used an example of variable definition for simplicity’s sake, more often sensitivity testing is used to test the robustness of models to different specifications and possible violation of the assumptions they incorporate. For an excellent example of how sensitivity tests are conducted and what they add to analysis, one might see the recent ASR article, “Best Practices or Best Guesses? Assessing the Efficacy of Corporate Affirmative Action and Diversity Policies” by Alexandra Kalev, Frank Dobbin and Erin Kelly. http://www.wjh.harvard.edu/~dobbin/cv/working_papers/aapracticesFinalProof.pdf
Let me conclude by calling attention to a special problem that confronts empirical researchers writing on affirmative action and other issues relating to the competence of minorities, especially blacks. Whether a person wishes it or not, if her published results call the competence of blacks into question, either absolutely or relative to whites, the study is likely to feed racism. This can be seen in on-line responses to the New York Times story reporting the data from Professor Sander’s NCLR article that I questioned in my first blog in this series. Some comments on the story provided anecdotal evidence or other support for Professor Sander’s conclusions and hypotheses. Other comments questioned Professor Sander’s conclusions, suggesting that the reportedly high rates of large law firm departure by blacks could be explained entirely by the hostile atmospheres that blacks experienced in large firms, choices to move to more attractive positions, etc. (Interestingly, I recall no commentator who seemed to have read the NCLR article, but many were willing to make assertions about the study’s validity, one way or the other, based on the brief and partial description in the Times.) Also in the mix, however, were several comments that I thought were genuinely racist, suggesting, for example, that Professor Sander’s findings were evidence for or resulted from the genetic inferiority of blacks.
What is one to do about this? The answer is not to refrain from publishing. Short of plans for an atom bomb or a doomsday machine, science should not be censored simply because it is likely to be misused or misunderstood. Nor, unfortunately, is the answer simply to make clear that certain interpretations should not be put on the data, although this often should be done. Professor Sander, for example, makes clear in his writing that he rejects all theories that posit a genetic basis for differences in how black and white law students fare, but these caveats were not picked up in most popular reports of his research, and even if they were, those who wanted to ignore them no doubt would.
Still I think the danger of fueling racism places a special burden on scholars whose work might be misread to do this. Before publishing and going on to publicize their findings, scholars whose work appears vulnerable to racist (or similar) misuse and misinterpretation should have greater confidence in their data and the conclusions they draw from their analyses than one would demand of research without these negative potentials. But this advice only takes us so far. The element of Professor Sander’s Stanford Law Review article most likely to fuel racism was not the mismatch hypothesis for which he was responsible, but the Bar Passage Study data on the high failure rates of blacks relative to whites, both in law school and on the bar exam. Not only did Professor Sander bear no responsibility for what these data showed, but they deserve the attention they received, for they are important to the policy debate on affirmative action and to thinking about how best to improve the production of black attorneys. Sometimes we must live with the reality that we cannot control all the uses to which our research is put, even when we do the best we can.
