Dear All,
What a nice site-useful and has a lot of immediate references (code) that one can use.
A few years ago I did a fairly uncomplicated multilevel multivariate model where the 3 outcomes were measured at the same level (individual student level subject scores). Modelling these responses presented no problem. However, I am now trying to model school resources (a level 2 variable) jointly with academic outcome but I am having problems.
Now, I understand that there is no level 1 variation for a level 2 variable and I was trying to use the "elements" option to tell runmlwin to set the level 1 variance and co-variances to zero. In spite of my best efforts, I still get a message that states the log likelihood could not be calculated and there is something wrong with my model. I am sure I am doing something wrong but can't figure what to do about it.
Has anyone done this and has any help or further suggestions for me?
Many thanks,
Russell Wildeman
Multivariate multilevel models
-
- Site Admin
- Posts: 432
- Joined: Fri Apr 01, 2011 2:14 pm
Re: Multivariate multilevel models
Great question,
Yes I think you can do this (at least when both response variables are continuous responses).
Below is some example code
Best wishes
George
Commands...
Output...
Yes I think you can do this (at least when both response variables are continuous responses).
Below is some example code
Best wishes
George
Commands...
Code: Select all
* Clear data and set seed
clear
set seed 131235
* Set number of schools to 100
set obs 100
generate school = _n
* Generate school-level random intercept-effects for the two response variables
matrix C = (1,0.5 \ 0.5, 2)
drawnorm u1 u2, cov(C)
* Set number of students per school to 10
expand 10
sort school
generate student = _n
* Generate the pupil-level residuals for the pupil-level response varible
generate e1 = rnormal()
* Generate the pupil-level response variable
generate y1 = 5 + u1 + e1
* Generate the school-level response variable
generate y2 = 0 + u2
* Examine data for the first school
list if school==1, sepby(school)
* Reshape the data from wide to long form so that both responses in one variable
bysort school (student): replace y2 = . if _n!=1
reshape long y u, i(school student) j(response)
drop if y==.
order school response student
sort school response student
* Set student ID to 0 for school-level response
replace student = 0 if response==2
* Examine data for the first school
list if school==1, sepby(school)
* Generate dummy variables to indicate the two different response equations
tabulate response, gen(r)
* Fit model to pupil-level response variable
runmlwin y r1 if r1==1, level2(school: r1) level1(student: r1) nopause
* Fit model to school-level response variable
runmlwin y r2 if r2==1, level1(school: r2) nopause
* Fit model to both responses allowing for a school-level correlation
runmlwin y r1 r2, level2(school: r1 r2) level1(student: r1 r2) nopause
Code: Select all
. * Clear data and set seed
. clear
. set seed 131235
.
. * Set number of schools to 100
. set obs 100
obs was 0, now 100
. generate school = _n
.
. * Generate school-level random intercept-effects for the two response variables
. matrix C = (1,0.5 \ 0.5, 2)
. drawnorm u1 u2, cov(C)
.
. * Set number of students per school to 10
. expand 10
(900 observations created)
. sort school
. generate student = _n
.
. * Generate the pupil-level residuals for the pupil-level response varible
. generate e1 = rnormal()
.
. * Generate the pupil-level response variable
. generate y1 = 5 + u1 + e1
.
. * Generate the school-level response variable
. generate y2 = 0 + u2
.
. * Examine data for the first school
. list if school==1, sepby(school)
+-----------------------------------------------------------------------------+
| school u1 u2 student e1 y1 y2 |
|-----------------------------------------------------------------------------|
1. | 1 -.8199924 -1.223277 1 -1.226586 2.953421 -1.223277 |
2. | 1 -.8199924 -1.223277 2 .039657 4.219665 -1.223277 |
3. | 1 -.8199924 -1.223277 3 -.2504696 3.929538 -1.223277 |
4. | 1 -.8199924 -1.223277 4 -.5030173 3.67699 -1.223277 |
5. | 1 -.8199924 -1.223277 5 1.739387 5.919394 -1.223277 |
6. | 1 -.8199924 -1.223277 6 -.0061893 4.173818 -1.223277 |
7. | 1 -.8199924 -1.223277 7 -.2211102 3.958897 -1.223277 |
8. | 1 -.8199924 -1.223277 8 2.776873 6.956881 -1.223277 |
9. | 1 -.8199924 -1.223277 9 1.336618 5.516625 -1.223277 |
10. | 1 -.8199924 -1.223277 10 3.091241 7.271249 -1.223277 |
+-----------------------------------------------------------------------------+
.
. * Reshape the data from wide to long form so that both responses in one variable
. bysort school (student): replace y2 = . if _n!=1
(900 real changes made, 900 to missing)
. reshape long y u, i(school student) j(response)
(note: j = 1 2)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 1000 -> 2000
Number of variables 7 -> 6
j variable (2 values) -> response
xij variables:
y1 y2 -> y
u1 u2 -> u
-----------------------------------------------------------------------------
. drop if y==.
(900 observations deleted)
. order school response student
. sort school response student
.
. * Set student ID to 0 for school-level response
. replace student = 0 if response==2
(100 real changes made)
.
. * Examine data for the first school
. list if school==1, sepby(school)
+-----------------------------------------------------------------+
| school response student u e1 y |
|-----------------------------------------------------------------|
1. | 1 1 1 -.8199924 -1.226586 2.953421 |
2. | 1 1 2 -.8199924 .039657 4.219665 |
3. | 1 1 3 -.8199924 -.2504696 3.929538 |
4. | 1 1 4 -.8199924 -.5030173 3.67699 |
5. | 1 1 5 -.8199924 1.739387 5.919394 |
6. | 1 1 6 -.8199924 -.0061893 4.173818 |
7. | 1 1 7 -.8199924 -.2211102 3.958897 |
8. | 1 1 8 -.8199924 2.776873 6.956881 |
9. | 1 1 9 -.8199924 1.336618 5.516625 |
10. | 1 1 10 -.8199924 3.091241 7.271249 |
11. | 1 2 0 -1.223277 -1.226586 -1.223277 |
+-----------------------------------------------------------------+
.
. * Generate dummy variables to indicate the two different response equations
. tabulate response, gen(r)
response | Freq. Percent Cum.
------------+-----------------------------------
1 | 1,000 90.91 90.91
2 | 100 9.09 100.00
------------+-----------------------------------
Total | 1,100 100.00
.
. * Fit model to pupil-level response variable
. runmlwin y r1 if r1==1, level2(school: r1) level1(student: r1) nopause
MLwiN 2.30 multilevel model Number of obs = 1000
Normal response model
Estimation algorithm: IGLS
-----------------------------------------------------------
| No. of Observations per Group
Level Variable | Groups Minimum Average Maximum
----------------+------------------------------------------
school | 100 10 10.0 10
-----------------------------------------------------------
Run time (seconds) = 1.29
Number of iterations = 2
Log likelihood = -1528.6796
Deviance = 3057.3591
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
r1 | 4.924213 .1052099 46.80 0.000 4.718005 5.130421
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: school |
var(r1) | 1.009213 .1566091 .7022644 1.316161
-----------------------------+------------------------------------------------
Level 1: student |
var(r1) | .9770083 .0460566 .886739 1.067278
------------------------------------------------------------------------------
.
. * Fit model to school-level response variable
. runmlwin y r2 if r2==1, level1(school: r2) nopause
MLwiN 2.30 multilevel model Number of obs = 100
Normal response model
Estimation algorithm: IGLS
Run time (seconds) = 1.07
Number of iterations = 2
Log likelihood = -183.48674
Deviance = 366.97348
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
r2 | -.4249741 .1515778 -2.80 0.005 -.7220612 -.1278871
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 1: school |
var(r2) | 2.297583 .3249273 1.660737 2.934429
------------------------------------------------------------------------------
.
. * Fit model to both responses allowing for a school-level correlation
. runmlwin y r1 r2, level2(school: r1 r2) level1(student: r1 r2) nopause
MLwiN 2.30 multilevel model Number of obs = 1100
Normal response model
Estimation algorithm: IGLS
-----------------------------------------------------------
| No. of Observations per Group
Level Variable | Groups Minimum Average Maximum
----------------+------------------------------------------
school | 100 11 11.0 11
-----------------------------------------------------------
Run time (seconds) = 1.32
Number of iterations = 2
Log likelihood = -1704.0128
Deviance = 3408.0256
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
r1 | 4.924213 .1052099 46.80 0.000 4.718006 5.130421
r2 | -.4249741 .1515778 -2.80 0.005 -.7220611 -.127887
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: school |
var(r1) | 1.009213 .1566089 .7022652 1.316161
cov(r1,r2) | .6186087 .1710527 .2833517 .9538658
var(r2) | 2.297583 .3249273 1.660738 2.934429
-----------------------------+------------------------------------------------
Level 1: student |
var(r1) | .9770084 .0460566 .8867391 1.067278
cov(r1,r2) | 0 0 0 0
var(r2) | 0 0 0 0
------------------------------------------------------------------------------
-
- Posts: 6
- Joined: Tue Apr 22, 2014 2:54 pm
Re: Multivariate multilevel models
Dear George,
This is immensely useful-thank you very much! I will be running this much later tonight.
Gosh, I tried today but I am so glad for this.
With gratitude,
Russell
This is immensely useful-thank you very much! I will be running this much later tonight.
Gosh, I tried today but I am so glad for this.
With gratitude,
Russell
-
- Posts: 6
- Joined: Tue Apr 22, 2014 2:54 pm
Re: Multivariate multilevel models
Last comment George
While the code is useful and does allow me to consider the correlation at the school level between the 2 outcomes, I am not sure if this approach is general enough for me to model the joint relationship between the level 2 and level 1 outcome. For example, the code estimates the means of the response dummies and their corresponding variances at the student and school levels but if I want to constrain the covariance at level 2 between the 2 responses or adjust for other variables, then I am not quite sure how to do that. Part of the reason why I am lost is that the code you provided is different from the multivariate runmlwin codes with its eq 1 and eq 2 formulation etc.
I was under the impression that runmlwin and its equivalent MLwiN does allow one to model multivariate relationships more flexibly or it could be that I just do not have the right knowledge. As I said, when you jointly model 2 outcomes at the same level of the data hierarchy(say student outcomes in Maths and English), then it is plain sailing but having 2 at different levels of the hierarchy (and both are normal responses), runmlwin returns the message that there is a numerical error calculating it the likelihood.
I do admit poor skills in coding however so if I am missing something very obvious, please forgive my ignorance.
One final question then: is it absolutely necessary that the data be in the long as opposed to the wide format when modelling multivariate relationships?
With thanks
Russell
While the code is useful and does allow me to consider the correlation at the school level between the 2 outcomes, I am not sure if this approach is general enough for me to model the joint relationship between the level 2 and level 1 outcome. For example, the code estimates the means of the response dummies and their corresponding variances at the student and school levels but if I want to constrain the covariance at level 2 between the 2 responses or adjust for other variables, then I am not quite sure how to do that. Part of the reason why I am lost is that the code you provided is different from the multivariate runmlwin codes with its eq 1 and eq 2 formulation etc.
I was under the impression that runmlwin and its equivalent MLwiN does allow one to model multivariate relationships more flexibly or it could be that I just do not have the right knowledge. As I said, when you jointly model 2 outcomes at the same level of the data hierarchy(say student outcomes in Maths and English), then it is plain sailing but having 2 at different levels of the hierarchy (and both are normal responses), runmlwin returns the message that there is a numerical error calculating it the likelihood.
I do admit poor skills in coding however so if I am missing something very obvious, please forgive my ignorance.
One final question then: is it absolutely necessary that the data be in the long as opposed to the wide format when modelling multivariate relationships?
With thanks
Russell
-
- Site Admin
- Posts: 432
- Joined: Fri Apr 01, 2011 2:14 pm
Re: Multivariate multilevel models
Hi Russell,
The approach I described is in many ways more general than the standard approach of specifying a multivariate response model in MLwiN. Indeed, if you specify a multivariate response model in MLwiN in the standard way, it will reshape the data from wide form to long form for you anyway. The way I described while in some senses more fiddly does gives you much more control over what you are doing. It also avoids the problems which leads to the error messages which you describe.
In terms of your specific questions...
Use the constraints() option to constrain the covariance (or any other parameters) to any value you like.
To enter a covariate into a given equation, by first create the interaction between this variable and the relevant response dummy and then enter this interaction into the model.
Best wishes
George
The approach I described is in many ways more general than the standard approach of specifying a multivariate response model in MLwiN. Indeed, if you specify a multivariate response model in MLwiN in the standard way, it will reshape the data from wide form to long form for you anyway. The way I described while in some senses more fiddly does gives you much more control over what you are doing. It also avoids the problems which leads to the error messages which you describe.
In terms of your specific questions...
Use the constraints() option to constrain the covariance (or any other parameters) to any value you like.
To enter a covariate into a given equation, by first create the interaction between this variable and the relevant response dummy and then enter this interaction into the model.
Best wishes
George
-
- Posts: 6
- Joined: Tue Apr 22, 2014 2:54 pm
Re: Multivariate multilevel models
Dear George,
I appreciate you taking the time to think through and literally sort this one for me. You are clearly pushing me to learn (more) code and to take it as seriously as one would a theoretical treatment of any topic.
I am grateful-thank you
Russell
I appreciate you taking the time to think through and literally sort this one for me. You are clearly pushing me to learn (more) code and to take it as seriously as one would a theoretical treatment of any topic.
I am grateful-thank you
Russell