I'm seeing increased discussion of the comparative advantages (and disadvantages) of Bayesian versus frequentist approaches to inference. A recent post by Andrew Gelman (here), and the comments it triggered, illustrate the debate.
A recent Statalist discussion (here) includes a helpful illustration of how to run the margins command and interpret its output (as well as common mistakes). Richard Williams (Notre Dame--Sociology) also hosts a webpage with a rich array of resources, including an extremely helpful PowerPoint discussion of the margins command (see "Selected other highlights" at the bottom of the page).
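For readers who want a concrete starting point, here is a minimal sketch of the margins workflow using Stata's bundled auto data set (the specification itself is purely illustrative):

    sysuse auto, clear
    logit foreign c.weight c.mpg       // illustrative logit specification
    margins, dydx(weight mpg)          // average marginal effects on Pr(foreign)
    margins, at(mpg=(15(5)40))         // predicted probabilities across mpg values
    marginsplot                        // graph the preceding margins results

The dydx() option reports average marginal effects, while at() evaluates predictions at chosen covariate values, which marginsplot then graphs.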
A fascinating paper exploring the effect of partition on violence does a wonderful job illustrating how creative efforts (and rich data) can potentially dampen (though not eliminate) endogeneity concerns. In Which Side Are You On? Political Violence and Partition in Ireland 1920-1921, Elissa Berwick (MIT--Poli Sci) analyzes a quasi-natural experiment produced by the partition of Ireland in 1921. Exploiting a truly unique data set, Berwick finds that "although partition decreased violence against civilians on Northern Ireland's side of the border as compared to the Irish Free State side, violence against civilians in the border areas as a whole significantly increased."
Along with her interesting findings, Berwick's research design also warrants note, especially for how it addresses endogeneity concerns. As Berwick concedes, "Partition is usually provoked by conflict, and yet its effect on conflict is the outcome of interest." Notwithstanding this tension, however, Berwick goes on to observe:
"As for endogeneity, the initial proposal of partition came from a backbench Liberal in June 1914, years before the start of any violent civil conflict. Although the partition itself occurred in a context of civil war, its original justification was not to end conflict between northern Protestants and southern Catholics. Instead, British legislators were concerned by their inability to coerce Northern Unionists, a point driven home by the Curragh Mutiny of March 1914, in which the British Army refused to disarm the Ulster Volunteer militia. Thus the partition of Ireland was not intended to separate two sides, but instead to forestall action by a minority."
Whether Berwick's methodological optimism is warranted will, of course, require further study; regardless, her transparent and helpful discussion of the issue deserves praise.
Andrew Gelman (Columbia--Statistics) notes (here) that among statistics' three essential elements, "measurement, comparison, and variation," measurement receives short shrift. Why?
"Part of it is surely that measurement takes effort, and we have other demands on our time. But it’s more than that. I think a large part is that we don’t carefully think about evaluation as a measurement issue and we’re not clear on what we want students to learn and how we can measure this." To this I would add one additional practical aspect. For those conducting secondary analyses of data sets put together by others, most typically defer to measurement decisions already baked into data sets.
Regardless, when measurement goes awry, measurement error emerges and bad things happen. In the classical case, for example, random noise in an explanatory variable biases its estimated regression coefficient toward zero (attenuation).
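A quick simulation makes the attenuation point concrete. This sketch is purely illustrative (the variable names and parameter values are invented):

    clear
    set seed 12345
    set obs 1000
    generate x_true  = rnormal()
    generate y       = 1 + 2*x_true + rnormal()   // true slope is 2
    generate x_noisy = x_true + rnormal(0, 1)     // add classical measurement error
    regress y x_true                              // slope estimate near 2
    regress y x_noisy                             // slope attenuated (near 1 here)

With equal signal and noise variances, the expected attenuation factor is 1/2, so the second regression's slope should land near 1.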
Over at Concurring Opinions Dave Hoffman (Temple) wonders whether legal empiricists need to broaden their traditional approach when it comes to significance testing. Hoffman's take is that "... given what’s happening in cognate disciplines, it might be time for law professors to get comfortable with a new way of evaluating empirical work."
The presence of "too many zeros" is a common challenge in empirical legal research. For example, most litigants do not pursue appeals, many civil trials end in outcomes (e.g., a finding of no liability) that generate zero damages, and so on. Thus, the distributions of outcome variables of interest are often heavily skewed, and this skew warrants attention.
"Tobit models are often applied to deal with the excess number of zeros, but these are more appropriate in cases of true censoring (e.g., when all negative values are recorded as zeros) and less appropriate when zeros are in fact often observed as the amount awarded. Heckman selection models are another methodology that is applied in this setting, yet they were developed for potential outcomes rather than actual ones. Two‐part models account for actual outcomes and avoid the collinearity problems that often attend selection models. A two‐part hierarchical model is developed here that accounts for both the skewed, zero‐inflated nature of damages data and the fact that punitive damage awards may be correlated within case type, jurisdiction, or time. Inference is conducted using a Markov chain Monte Carlo sampling scheme. Tobit models, selection models, and two‐part models are fit to two punitive damage awards data sets and the results are compared. We illustrate that the nonsignificance of coefficients in a selection model can be a consequence of collinearity, whereas that does not occur with two‐part models."
Not infrequently, particularly in psychology and sociology where survey instruments are common, a question arises about whether a 5- or 7-point Likert-scale variable can serve as a dependent variable in a regression specification. As this listserv discussion illustrates (here), one response is that the question largely dissolves with the ordered probit/logit commands ('oprobit' and 'ologit'), provided certain assumptions hold (namely, that the response categories rank order coherently).
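A minimal sketch of that approach, with hypothetical variable names:

    ologit likert7 age income i.group        // ordered logit on a 7-point Likert item
    oprobit likert7 age income i.group       // ordered probit alternative
    margins, dydx(age) predict(outcome(7))   // effect of age on Pr(top category)

Because the cutpoints are estimated rather than assumed equally spaced, the ordered models sidestep the objection that Likert categories lack interval-scale meaning.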
Just came across a neat user-created Stata command, mcp, that can transform marginal-effects results into intuitive graphs. An extremely helpful description (and brief tutorial) is found here. A more technical description from the command's documentation follows.
"marginscontplot [mcp] provides a graph of the marginal effect of a continuous predictor on the response variable in the most recently fit regression model. When only xvar1 is provided, the plot of marginal effects is univariate at values of xvar1 specified by the at1() or var1() option. When both xvar1 and xvar2 are provided, the plot of marginal effects is against values of xvar1 specified by the at1() or var1() option for fixed values of xvar2 specified by the at2() or var2() option. A line is plotted for each specified value of xvar2.
marginscontplot has the distinctive ability to plot marginal effects on the original scale of xvar1 or xvar2, even when the model includes transformed values of xvar1 or xvar2 but does not include xvar1 or xvar2 themselves. Such a situation arises in models involving simple transformations such as logs and more complicated transformations such as fractional polynomials or splines, for example, where nonlinear relationships with continuous predictors are to be approximated. Transformed covariates are included in the model to achieve this."
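Based on the excerpt above, a hedged sketch of typical usage follows; marginscontplot is user-written (published in the Stata Journal), so the parenthesized-transform syntax shown here should be verified against the command's help file:

    sysuse auto, clear
    generate lweight = ln(weight)      // model uses the log of weight
    regress mpg lweight
    mcp weight (lweight)               // plot against weight on its original scale

The model contains only the transformed covariate lweight, yet the plot is drawn against untransformed weight, which is the "distinctive ability" the excerpt describes.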
While not necessarily squarely on point for all ELS Blog readers, a general discussion of social science ethics, re-ignited recently by an incident involving a study by political scientists at Stanford and Dartmouth (for a description, click here), remains germane to many ELS scholars as well. In a recent post, Andrew Gelman (Columbia--Statistics) discusses suggestions by Macartan Humphreys (Columbia--Poli Sci) on how to think through the ethical dimensions incident to social science research in the field.
Yanna Krupnikov (Stony Brook--Poli Sci) and Adam Seth Levine’s (Cornell--Govt) article, “Cross-Sample Comparisons and External Validity,” offers an interesting (and empirically welcome) contribution to the ongoing debate regarding the use of convenience samples drawn from MTurk. The takeaway, at least in part, is as the authors note in their conclusion: “Our results do serve to sound a note of caution when using MTurk to produce generalizable results for all but the simplest experimental designs.”
In an interesting review essay, Andrew Gelman (Columbia) discusses the all-important move from describing what to assessing what if or why. Obviously, inference is more complex than description. Gelman considers two broad classes of inferential questions. One involves "forward causal inference." That is, what might happen if we do X? A second is "reverse causal inference," or what causes Y? Equally interesting are Gelman's thoughts on what characterizes persuasive inferential claims.
“The most compelling causal studies have (i) a simple structure that you can see through to the data and the phenomenon under study, (ii) no obvious plausible source of major bias, (iii) serious efforts to detect plausible biases, efforts that have come to naught, and (iv) insensitivity to small and moderate biases (see, e.g., Greenland 2005). Two large unresolved problems are, first, how to best achieve these four steps in practice and, second, what sorts of causal claims to make in settings where we are not able to satisfy these conditions.”
A recent post on the Stata list involving law-related data (here, patent litigation) struggles with selection-effect questions. Those familiar with law- and litigation-related data understand why selection issues are almost always present in litigation data. (And those unfamiliar with law too often do not.)
Stata graphics can be complex and frustrating. One small point, common in legal research, involves graphics options for categorical variables. A recent listserv discussion isolates the question and includes a solution with helpful instructions (here). (Please note that the preferred solution requires a user-installed package ("catplot").)
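A minimal sketch using catplot with Stata's auto data (the percent option and grouping choice are merely illustrative):

    ssc install catplot                // Nick Cox's user-written package
    sysuse auto, clear
    catplot rep78, percent             // distribution of a categorical variable
    catplot rep78, over(foreign) percent

catplot wraps graph bar/hbar, which is why it handles categorical variables more gracefully than the stock graph commands.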
An interesting thread was stimulated by a manuscript reviewer's confusion over a paper's conflating "correlation" with the slope from a regression estimation. The posted question prompted a very helpful response that, among other things, deftly explains ANOVA's presence in standard regression output.
"Analysis of variance (ANOVA) is just a technique comparing the variance explained by the model versus the variance not explained by the model. Since regression models have both the explained and unexplained component, it's natural that ANOVA can be applied to them. In many software packages, ANOVA results are routinely reported with linear regression."