How to impute a two-level model with repeated measures
Posted: Fri Feb 17, 2017 3:51 am
Dear all,
My question is how to organize data in order to impute a two-level model with 5 repeated measures.
To sum up: We have been using Stata and RealComImpute. Level 1 = patient id, Level 2 = month.
Month is a repeated measure taken from the same patient at month 0 (baseline), 1, ..., 4.
Letting i index the month (i = 0, ..., 4) and j the participant (j = 1, ..., n), our model is:
y_ij = β_0 + β_1 month_ij + β_2 gender_j + ... + u_0j + u_1j month_ij + e_ij
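For reference, here is a minimal sketch of how this analysis model might be fitted in Stata. It assumes the data are already in long format (one row per patient-month) with the variable names used in the example further down (y, month, gender, id). The `mixed` command is available from Stata 13; earlier versions call it `xtmixed`. This covers the substantive model only, not the imputation step.

```stata
* Sketch only: assumes a long-format dataset with y, month, gender and id in memory
* Fixed part: month and gender
* Random part: intercept (u_0j) and month slope (u_1j) varying by patient
mixed y month gender || id: month, covariance(unstructured)
```

The random effects are grouped by id here because u_0j and u_1j are indexed by participant j.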
We have missing data for i = 3 and i = 4 only (~10% and 90% respectively). Data for the remaining months are complete.
y from month 2 is highly correlated with y from month 3 (r > 0.90) and, apparently, also with y from month 4 (r > 0.70).
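These figures can be checked before reshaping, as in this minimal sketch. It assumes the data are in wide format, with one row per patient and the outcomes stored as y0 to y4 (the names used in the example further down).

```stata
* Sketch only: assumes a wide-format dataset with y0-y4 in memory
* Report missing-value counts for the outcomes that have any missing data
misstable summarize y0-y4
* Pairwise correlations, with the number of complete pairs behind each one
pwcorr y2 y3 y4, obs
```

With about 90% of month 4 missing, the month 2 to month 4 correlation rests on few complete pairs, which the `obs` option makes visible.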
So, we would like to use y from month 2 as one of the predictors for y at months 3 and 4.
To use y from month 2 as a predictor, I created (in Stata) an additional column called pred_2, which contains the month-2 value of y for every subject.
The following Stata code simulates a dataset with the same structure as ours and shows the imputation call:
* ------------------- start -----------------------------
clear
set seed 12345
set obs 1000
* One row per patient (wide format)
gene id = _n
gene age = round(runiform()*30)+20
gene gender = round(runiform())
gene covariate1 = runiform()
gene covariate2 = runiform()
* Outcome at months 0 to 4
forvalues i = 0/4 {
    gene y`i' = round(rnormal(100,20))
}
* Set about 10% of month 3 and about 90% of month 4 to missing
replace y3 = . if runiform()<0.10
replace y4 = . if runiform()<0.90
* Copy the month-2 outcome so it is carried onto every row after reshaping
gene pred_2 = y2
* Reshape to long format: one row per patient-month
reshape long y, i(id) j(month)
gene cons = 1
sort month id
order id month y pred_2
realcomImpute y age gender covariate1 covariate2 pred_2 using mydata , numresponses(1) cons(cons) level2id(month)
* ------------------- end -----------------------------
Is this the correct setup for the two-level imputation?
I look forward to hearing from you.
Tiago