A surprising number of researchers fail to develop and execute a strategy for handling "duplicate" entries in a data set. For example, in assessing criminal outcomes, researchers need to decide ex ante whether their dependent variable of interest is the "crime" or the "criminal." If it's the latter, researchers then need to think through recidivism: a single individual may commit multiple, independent crimes, appear more than once in the data set and, in so doing, raise potential "double count" issues. (Obviously, one may plausibly treat multiple crimes committed by the same individual as separate events and intentionally "double count.") In the event that the researcher wants to avoid "double counting," however, identifying and rooting out duplicate cases in large data sets can prove difficult. To this end, a nice (albeit abbreviated) discussion of how to use Stata's "duplicates" command is found here.
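For readers who want to see the underlying logic outside of Stata, here is a minimal Python sketch. It assumes each row is one crime and that a reliable offender identifier exists. The records, the `offender_id` field, and the two helper functions are all hypothetical and made up for illustration. The first helper counts how many other rows share an identifier, and the second keeps only the first row per identifier, which is roughly what Stata's `duplicates tag` and `duplicates drop` (with `force`) do.

```python
from collections import Counter

# Hypothetical records: one row per crime, keyed by an offender identifier.
records = [
    {"offender_id": "A01", "offense": "burglary", "year": 2004},
    {"offender_id": "B02", "offense": "fraud", "year": 2005},
    {"offender_id": "A01", "offense": "assault", "year": 2007},
    {"offender_id": "C03", "offense": "theft", "year": 2006},
    {"offender_id": "B02", "offense": "fraud", "year": 2009},
    {"offender_id": "A01", "offense": "theft", "year": 2010},
]


def tag_duplicates(rows, key):
    """Return, for each row, how many OTHER rows share its key value."""
    counts = Counter(row[key] for row in rows)
    return [counts[row[key]] - 1 for row in rows]


def drop_duplicates(rows, key):
    """Keep only the first row seen for each key value."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


# Crime-level view: every row counts, but each is tagged with its surplus copies.
dup_tags = tag_duplicates(records, "offender_id")

# Criminal-level view: one row per offender, so nobody is double counted.
offenders = drop_duplicates(records, "offender_id")

print(dup_tags)
print(len(records), "crimes,", len(offenders), "offenders")
```

Note that keeping the first row per offender is itself a research decision: it silently discards later offenses, so whether to keep the first, the last, or a summary of all of them should follow from the question being asked.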