Wednesday, March 15, 2017

dummy variable in regression

A dummy variable is a variable on which all cases falling into a specific category take the value 1 and all cases not falling into that category take the value 0.

Coding convention: code 0 for cases that do not have the characteristic and 1 for cases that have it. For example, if my study focuses on females, then male = 0 and female = 1.
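A minimal sketch of this convention in Python, assuming the raw data stores sex as the strings "male" and "female". The helper `make_dummy` and the sample list `sexes` are hypothetical names used only for illustration.

```python
def make_dummy(values, category):
    """Return 1 for cases that fall into `category` and 0 for all other cases."""
    return [1 if v == category else 0 for v in values]

# Hypothetical raw data; the study focuses on females, so female = 1 and male = 0.
sexes = ["male", "female", "female", "male"]
female = make_dummy(sexes, "female")
print(female)  # [0, 1, 1, 0]
```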

For example, suppose the Income variable has four categories.

You will end up with 4 dummy variables, income100dollar (code A), income200dollar (code B), income300dollar (code C) and income400dollar (code D), coded in the data as follows:
Coding book
Category          A B C D
income100dollar   1 0 0 0
income200dollar   0 1 0 0
income300dollar   0 0 1 0
income400dollar   0 0 0 1
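The coding book above can be produced mechanically from the raw category labels. This is a minimal sketch: the category names follow the post, while the sample list `incomes` and the helper `one_hot` are hypothetical.

```python
# Category names in the order of the coding book columns A, B, C, D.
CATEGORIES = ["income100dollar", "income200dollar", "income300dollar", "income400dollar"]

def one_hot(value, categories):
    """Return one 0/1 indicator per category; exactly one of them is 1."""
    return [1 if value == c else 0 for c in categories]

# Hypothetical raw data: one income category per case.
incomes = ["income200dollar", "income100dollar", "income400dollar", "income300dollar"]
rows = [one_hot(v, CATEGORIES) for v in incomes]
for value, row in zip(incomes, rows):
    print(value, row)
```

In practice, `pandas.get_dummies` builds the same kind of 0/1 columns from a categorical column.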

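When you use these variables in a regression analysis, the dummy variable for the reference category is omitted, so a variable with k categories enters the model as k - 1 dummies (some software packages do this automatically for you). Including all k dummies together with the intercept would cause perfect multicollinearity, because every case belongs to exactly one category and the dummies therefore always sum to 1.
For example, if I want to study females (coded 1), then males are coded 0 and serve as the reference category.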
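The reason for omitting one dummy can be checked directly: with an intercept column of 1s and all four income dummies, the dummy columns add up to the intercept column in every row, so one column is an exact linear combination of the others and ordinary least squares has no unique solution. A minimal sketch, using hypothetical data coded with the letters A to D from the coding book and treating A as the reference category:

```python
# Hypothetical data: each case's income category, using codes A-D.
incomes = ["A", "B", "B", "C", "D", "A"]
categories = ["A", "B", "C", "D"]

# Design matrix: an intercept column (all 1s) plus ALL four dummies.
full = [[1] + [1 if v == c else 0 for c in categories] for v in incomes]

# In every row the four dummies sum to the intercept value, i.e. exact collinearity.
collinear = all(row[0] == sum(row[1:]) for row in full)
print(collinear)  # True

# Dropping the reference category's dummy (A) removes that dependence.
reduced = [[1] + [1 if v == c else 0 for c in categories if c != "A"] for v in incomes]
print(reduced[0])  # [1, 0, 0, 0]: a reference-category case has only the intercept set
```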

One category, usually the one that contains the most respondents, is designated as the 'reference' category (coded 0 on every dummy) and does not get a dummy variable of its own. Every other category gets a dummy variable. Participants are coded '1' on a dummy variable if they belong to that dummy's category and '0' if not, so participants who belong to the reference category are coded '0' on all dummy variables.
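The rule in the previous paragraph (most frequent category as reference, one dummy for each remaining category) can be sketched as follows. The function `code_dummies` and the sample data are hypothetical; ties in frequency are broken by first appearance, since `Counter.most_common` orders equal counts by insertion order.

```python
from collections import Counter

def code_dummies(values):
    """Pick the most frequent category as reference and build k - 1 dummy columns.

    Returns (reference, dummy_names, rows), where each row holds one 0/1 value
    per non-reference category.
    """
    counts = Counter(values)
    reference = counts.most_common(1)[0][0]
    dummy_names = [c for c in counts if c != reference]  # every other category
    rows = [[1 if v == c else 0 for c in dummy_names] for v in values]
    return reference, dummy_names, rows

values = ["B", "A", "B", "C", "B", "D", "A"]  # hypothetical income codes
reference, dummy_names, rows = code_dummies(values)
print(reference)    # B
print(dummy_names)  # ['A', 'C', 'D']
```

Cases in the reference category B get the row [0, 0, 0], matching the rule that reference-category participants are coded 0 on all dummies.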

The coefficient (B) for each of these dummy variables tells us how much the predicted outcome for a member of that category differs from the predicted outcome for a member of the reference group, holding any other predictors in the model constant.
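When the dummies are the only predictors, this interpretation becomes concrete: the OLS intercept equals the mean outcome of the reference group, and each B equals that category's mean minus the reference-group mean. The sketch below checks this on made-up numbers (groups A, B, C with A as reference, group means 11, 16 and 12). The small `ols` helper, which solves the normal equations by Gauss-Jordan elimination, is a hypothetical illustration, not a substitute for a statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)

groups = ["A", "A", "B", "B", "C", "C"]   # hypothetical categories; A is the reference
y = [10.0, 12.0, 15.0, 17.0, 11.0, 13.0]  # hypothetical outcome values
x = [[1, int(g == "B"), int(g == "C")] for g in groups]  # intercept + dummies for B, C

intercept, b_B, b_C = ols(x, y)
print(round(intercept, 6), round(b_B, 6), round(b_C, 6))  # 11.0 5.0 1.0
```

Here the intercept 11 is the mean of group A, b_B = 16 - 11 = 5 and b_C = 12 - 11 = 1.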
