
30 May 2007



Indeed, there's more that we agree about than disagree about. One small note: you mention that it's better to use effect sizes instead of p-values -- but wouldn't you agree that it's exceedingly rare for original research to report effect sizes (irrespective of the supposed requirement by many professional organizations, such as the APA, that effect sizes be reported)?

Jeremy A. Blumenthal

I think we agree about more than is easily conveyed over blog posting. A few more thoughts:

First, although it’s true that more data can more easily lead to spurious findings of statistical significance, that’s exactly the reason MA focuses (should focus) on effect sizes rather than p-values. I’d go so far as to say I’m not sure combining p-values in a MA is useful, because, as you say, it’s so easy to come up with overall significant results across a large pool of studies. Rather, we should work with the effect sizes from those studies.
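A minimal sketch of why pooled significance and pooled effect size can tell different stories. It assumes a fixed-effect, inverse-variance average of standardized effect sizes, which is one common pooling choice and not something specified in this thread. The function names and all numbers are hypothetical, made up purely for illustration.

```python
import math


def pooled_effect(effects, variances):
    """Inverse-variance (fixed-effect) average of per-study effect sizes.

    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    return mean, se


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical pool: 100 studies, each reporting a very small effect
# (d = 0.05) with sampling variance 0.01.
effects = [0.05] * 100
variances = [0.01] * 100

mean, se = pooled_effect(effects, variances)
p = two_sided_p(mean / se)
print(f"pooled effect = {mean:.3f}, p = {p:.2g}")
```

With these made-up inputs the pooled p-value is far below conventional thresholds even though the pooled effect stays at about 0.05, which is the reason to report the effect size and not just the significance level.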

Second, what I meant about weighting is allowing certain studies to “count” more when, for instance, evaluating the average effect--based on features identified a priori. So, as a facile instance, if a meta-analyst is working with experimental studies, random assignment to condition is crucial. A study that used it would be rated high on that feature of study quality; one that didn’t would be rated low. Those studies that did use random assignment would be weighted more heavily in subsequent analyses—calculating a summary statistic in particular. The criteria for evaluating such quality should be made explicit, and, I think, it’s up to the “consumer” to assess whether the meta-analyst was fair. I actually think both weighted and unweighted analyses should be provided, and, ideally, not only should the studies a meta-analyst used be identified, but an Appendix or other source could be provided that includes the data used.
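A minimal sketch of reporting both the unweighted and the quality-weighted average, assuming a single a priori quality feature (random assignment). The weights of 1.0 and 0.5, the function names, and the three studies are hypothetical choices for illustration, not values proposed in this thread.

```python
def quality_weight(randomized, high=1.0, low=0.5):
    """Hypothetical rule fixed in advance: randomized studies count more."""
    return high if randomized else low


def summary_effects(effects, randomized_flags):
    """Return (unweighted mean, quality-weighted mean) of the effect sizes."""
    weights = [quality_weight(flag) for flag in randomized_flags]
    unweighted = sum(effects) / len(effects)
    weighted = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return unweighted, weighted


# Hypothetical studies: the largest effect comes from the one study
# that did not use random assignment.
effects = [0.2, 0.5, 0.8]
randomized = [True, True, False]

unweighted, weighted = summary_effects(effects, randomized)
print(f"unweighted = {unweighted:.2f}, quality-weighted = {weighted:.2f}")
```

Printing both figures side by side lets the reader see how much the quality rule moves the summary, and judge whether the rule was fair.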

And I think you weren’t suggesting that the MA process is unethical, just the potential for it to be so used. This is, of course, a problem in all empirical (and even non-empirical) research. The transparency of providing objective inclusion and quality criteria, as well as even providing the data, IMHO, does as much as possible to address this.

Last, I do think a MA gives a better sense of what is known than individual studies. More important, I think, is my “common language” and “cherry-picking” point—if researchers (courts, agencies, practitioners) work from the same set of studies, it can be easier to avoid focusing on only the studies that help one side or the other.

I appreciate your helping me clarify my thoughts here.


Great post, Jeremy. Before I respond, I just want to make it clear that I'm not trashing MA -- I think they can be great studies, but there are limitations.

First, I'm not sure more data is always better. As you know, the more data one has, the more likely it is that a significant relationship will be found BY CHANCE. In the world of empirical research, quality is better than quantity. I've never heard of MA using weighted variables to let "better" studies count for more, which would allow quantification of quality. Perhaps I've misunderstood your point. Nonetheless, that still gives the author a lot of leeway in determining what is good v. bad quality.

And in that vein, I wasn't suggesting that authors are unethical when using MA to get results (although clearly the NEJM article was); rather, I was pointing out that a crucial step in conducting MA is that the authors must make a decision about what the inclusion criteria are -- and this is always problematic because no criteria are perfect. Of course, no empirical study is perfect either, but since few people actually read and critically think about the inclusion criteria, MA has the risk of presenting an issue as settled when it may not be. Just my 2 cents.

Jeremy A. Blumenthal

These are good points, and certainly a couple of the more oft-levied concerns about MA. One facile response--though, I think, nevertheless accurate--is that these problems are even worse with traditional narrative reviews. More narrowly, though:

On the GIGO concern: first, I think that as a general proposition when reviewing the state of research, having more data is better than having less, for several reasons: increasing power in the MA's subsequent analyses; reporting most accurately what is known in an area; not "wasting" data; avoiding (in part) criticisms of selection bias. . . . Second, I'm not sure I agree that a MA can't control for poor methodology, in this sense: again, an advantage of MA is the opportunity to look at what variables of the primary studies are correlated with their outcomes. A meta-analyst can quantify study quality as one factor that might affect the results. Finding that there is a relationship allows quantification of that influence, and also gives justification for weighting higher-quality studies more heavily in subsequent analyses. (Of course, finding that there is NO relationship between study quality and observed effect size is helpful as well; there is some evidence that in the typical MA, there is little such relationship.)

That approach gets at the second issue you raise, a selection concern. First, of course, although I understand your point about the authors' ethics--and that will apply to primary researchers and meta-analysts--again, dealing with such behavior is not limited to MA; any secondary user will have to decide how far to go in evaluating whether primary research (empirical or not) was conducted appropriately. I would respond to your substantive point about variation in study quality and approaches, though, as above, by encouraging a meta-analyst to use study quality as a moderator variable.

How? There are at least two ways. One traditional approach is for coders to evaluate a study's methodology and weight each study by that rating. This has been criticized as introducing overly subjective factors, though I think that having multiple coders and reliability checks addresses that. A more objective approach is to develop an a priori checklist of what makes a good study (e.g., random assignment in an experiment), then rate the studies on each checklist feature. In both cases we can see whether the features correlate with the study's outcome.
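A minimal sketch of the checklist approach, assuming quality is scored as the number of a priori checklist items a study satisfies and then correlated with the study's effect size. The checklist items, the study records, and the function names are all hypothetical, and a plain Pearson correlation is only one simple way to look for such a relationship.

```python
import math

# Hypothetical checklist, fixed before looking at any outcomes.
CHECKLIST = ("random_assignment", "blinded_outcome", "preregistered")


def quality_score(study):
    """Count how many checklist features the study satisfies."""
    return sum(1 for item in CHECKLIST if study.get(item, False))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical coded studies: checklist features plus a reported effect size.
studies = [
    {"effect": 0.8},
    {"effect": 0.6, "random_assignment": True},
    {"effect": 0.4, "random_assignment": True, "blinded_outcome": True},
    {"effect": 0.2, "random_assignment": True, "blinded_outcome": True,
     "preregistered": True},
]

scores = [quality_score(s) for s in studies]
effects = [s["effect"] for s in studies]
print("quality vs. effect size: r =", round(pearson_r(scores, effects), 3))
```

In this made-up data the better-rated studies report smaller effects, so the correlation is strongly negative. A correlation near zero would instead suggest that study quality is not driving the observed effects.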

Of course, in any such approach--and, when reporting what studies were included or excluded or any such decision or judgment a meta-analyst makes--the author should report the basis for those ratings and decisions.

Thanks for raising these issues--I don't mean that MA is a panacea, but I do think it's a very valuable but under-used methodological tool.


Prof. Blumenthal,

I'd be curious about your opinions of the limitations of meta-analysis. As you probably know, the prevailing criticisms of this method center around two issues.

First, the results of meta-analysis are only as good as the quality of the original research. The notion of "garbage in, garbage out" is a noted weakness of this method. Since meta-analysis takes reported outcome values from original research to arrive at an effect size, it's quite easy for a meta-analysis to present inflated or invalid estimates. An original research article may do a poor job with its own methodology but arrive at impressive results. A subsequent meta-analysis has no way of controlling for the poor methodology from the original study.

Second, another major criticism revolves around inclusion criteria. How does one decide which studies to include? Most studies have substantial disparities in terms of sample characteristics, analysis, and variable construct validity. Authors of meta-analysis articles have wide discretion in determining which studies to include and which to exclude, and such decisions can have profound effects on the subsequent effect sizes. Just look at the Vioxx article in the New England J. of Medicine. It later came out that the authors of that study removed a study in order for the effect size to reach statistical significance (discovered through embedded data in the Word document). These seem like major limitations for meta-analysis.


