For many research projects, scholars have to manipulate existing raw data into more useful derived variables. One obvious approach is "hand-coding," or some related form of Excel file manipulation. That may be fine for folks experienced with Excel, but for others (and even for experienced Excel users) such a task typically invites some degree of human error. Another approach is to rely on Stata code for variable creation. A recent (and, I hope, continuing) conversation on Statalist illustrates a not-atypical example.
The questioner begins with Current Population Survey data (from the Census Bureau) that include individual-level information on, e.g., year, state, and a categorical variable for employment status (0 = employed, 1 = unemployed, 2 = not in labor force). What the CPS data set does not include, however, is a variable giving the unemployment rate for each state in each year. So the question is how to most efficiently derive what is desired (a new state-by-year unemployment rate variable) from what exists in raw data form. The discussion thus far has elicited at least one very helpful step-by-step example of how to accomplish this goal with three lines of code.
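The Stata code from the thread is not reproduced here, but the underlying logic can be sketched in a language-neutral way. What follows is a minimal Python sketch, assuming each person is a record with hypothetical keys `state`, `year`, and `empstat` coded as in the post. It groups people by state and year, counts the labor force (employed plus unemployed, leaving out those not in the labor force), and divides the unemployed count by that total. A second helper copies each group's rate back onto every individual record, which is what a by-group variable in a long data set looks like. The function names are illustrative only, and the counts are unweighted; published CPS estimates rely on survey weights.

```python
from collections import defaultdict

# Coding follows the post: 0 = employed, 1 = unemployed, 2 = not in labor force.
EMPLOYED, UNEMPLOYED, NOT_IN_LF = 0, 1, 2


def state_year_unemployment(records):
    """Return {(state, year): unemployment rate} from individual records.

    Each record is a dict with hypothetical keys "state", "year", "empstat".
    Rate = unemployed / (employed + unemployed); people not in the labor
    force are excluded from the denominator. Counts are unweighted.
    """
    unemployed = defaultdict(int)
    labor_force = defaultdict(int)
    for r in records:
        key = (r["state"], r["year"])
        if r["empstat"] in (EMPLOYED, UNEMPLOYED):
            labor_force[key] += 1
            if r["empstat"] == UNEMPLOYED:
                unemployed[key] += 1
    # Only groups with at least one labor-force member appear as keys,
    # so the division below never divides by zero.
    return {key: unemployed[key] / lf for key, lf in labor_force.items()}


def attach_rate(records, rates):
    """Copy each (state, year) rate onto every individual record.

    Records in a group with no labor-force members get None.
    """
    return [dict(r, unemp_rate=rates.get((r["state"], r["year"]))) for r in records]


if __name__ == "__main__":
    people = [
        {"state": "OH", "year": 2010, "empstat": 0},
        {"state": "OH", "year": 2010, "empstat": 1},
        {"state": "OH", "year": 2010, "empstat": 2},
        {"state": "TX", "year": 2010, "empstat": 1},
        {"state": "TX", "year": 2010, "empstat": 0},
    ]
    rates = state_year_unemployment(people)
    print(rates)  # {('OH', 2010): 0.5, ('TX', 2010): 0.5}
    for row in attach_rate(people, rates):
        print(row)
```

Keeping the aggregation separate from the step that attaches the rate mirrors the two moves in the question: first compute one number per state and year, then give every respondent in that state and year the same value.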
To be sure, other options also exist. That said, clear explanations of how to move from the data one has to the new variable one wants are sometimes hard to find.