imputation of ordered categorical level-2 variables
Posted: Thu Oct 06, 2011 2:59 pm
Hi, I seem to be having a problem when trying to impute ordinal level-2 variables. Within each imputation, either all missing observations are assigned the lowest value (1) or (less commonly) only the highest value is imputed. This does not seem to depend on the explanatory variables or on correlation with other responses. Also, the issue does not arise when imputing the same variable as if it were an unordered categorical level-2 variable.
Thanks in advance!
Edit: I thought it would help to provide an example for others to replicate. Perhaps this is just a silly mistake I made. I am using the example dataset for the realcomImpute command in Stata ("prac2full.dta"), which can be downloaded here (http://missingdata.lshtm.ac.uk/examplea ... eStata.zip). I generate a random level-2 variable, divide it into quartiles and delete 10% completely at random. I used the default imputation settings for the example below, but from what I recall, changing them does not make a difference.
Code:
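* draw a pupil-level normal deviate and average it within school to get a level-2 variable
gen r=invnormal(uniform())
bys school: egen randomlevel2variable=mean(r)
* cut the level-2 variable into quartiles (percentiles are taken over all pupil rows)
gen quartile=.
quietly sum randomlevel2variable, d
replace quartile=1 if randomlevel2variable<r(p25)
replace quartile=2 if randomlevel2variable<r(p50) & quartile==.
replace quartile=3 if randomlevel2variable<r(p75) & quartile==.
replace quartile=4 if quartile==.
* set quartile to missing for roughly 10% of pupil rows, whole schools at a time
gen s=uniform()
bys school: egen mmissing=mean(s)
quietly sum mmissing, d
replace quartile=. if mmissing<r(p10)
* tidy up and sort by the level-2 identifier before calling realcomImpute
drop r s randomlevel2variable mmissing
sort school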
realcomImpute nlitpre o.quartile nlitpost fsmn gend using prac2fullMIInput.dat, replace numresponses(2) level2id(school) cons(cons)
*** after imputation ****
realcomImputeLoad
mi convert flong, clear
tab quartile _mi_m
         |                        _mi_m
quartile |     0      1      2      3      4      5      6  ...
---------+------------------------------------------------------
       1 | 1,100  1,566  1,566  1,566  1,566  1,566  1,566  ...
       2 | 1,147  1,147  1,147  1,147  1,147  1,147  1,147  ...
       3 | 1,114  1,114  1,114  1,114  1,114  1,114  1,114  ...
       4 | 1,046  1,046  1,046  1,046  1,046  1,046  1,046  ...
---------+------------------------------------------------------
   Total | 4,407  4,873  4,873  4,873  4,873  4,873  4,873  ...
(The rows correspond to the quartiles and the columns are the imputations; column 0 is the original data with the missing values excluded.)
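In case it helps anyone checking this, the pattern is easier to see if the tabulation is restricted to the observations that were missing before imputation. The lines below are only a rough sketch: they assume that the data are in mi flong style after the conversion above, so that the standard _mi_id and _mi_m variables exist, and the names was_missing and school_tag are made up for illustration.

```stata
* flag every row whose original (m = 0) value of quartile was missing
* (was_missing is a hypothetical helper variable, not created by realcomImpute)
bysort _mi_id (_mi_m): gen byte was_missing = missing(quartile[1])

* imputed values only, one column per imputation, counted over pupil rows
tab quartile _mi_m if was_missing & _mi_m > 0

* same check counted once per school, since quartile is a level-2 variable
* (school_tag is also a hypothetical helper variable)
egen byte school_tag = tag(school _mi_m)
tab quartile _mi_m if was_missing & _mi_m > 0 & school_tag
```

If the problem is as described, every imputation column in these tables should have all of its counts in a single row (quartile 1 in the example above).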