While questions surrounding the potential for systematic measurement error in MTurk samples cannot be easily resolved in a single paper, the popularity of MTurk samples in many fields, notably psychology, means careful scholarly attention is warranted. The initial "warning" about MTurk samples came in 2013 from Dan Kahan (Yale), particularly for studies seeking to test “hypotheses about cognition and political conflict over societal risks and other policy-relevant facts.” Since then, a literature has developed that tests the potential limitations of MTurk samples. A recent paper by Douglas Ahler (FSU, Political Science) et al., "The Micro-Task Market for Lemons: Data Quality on Amazon’s Mechanical Turk," focuses on the potential for measurement error, especially Type II errors, injected by, among other factors, the "presence of 'non-respondents' (bots) or non-serious respondents on the platform." The paper's core findings include:
"While we find no evidence of a 'bot epidemic,' we do find that a significant portion of survey respondents engaged in suspicious behavior. About 20% of respondents either circumvented location requirements or took the survey multiple times. In addition, at least 5-7% of participants likely engaged in 'trolling' or satisficing. Altogether, we find about a quarter of data collected on MTurk is potentially untrustworthy. Expectedly, we find response quality impacts experimental treatments. On average, low quality responses attenuate treatment effects by approximately 9%. We conclude by providing recommendations for collecting data on MTurk."