Page 1 of 1

Multivariate multilevel models

Posted: Tue Apr 22, 2014 3:05 pm
by RAWILDEMAN29
Dear All,

What a nice site-useful and has a lot of immediate references (code) that one can use.

A few years ago I did a fairly uncomplicated multilevel multivariate model where the 3 outcomes were measured at the same level (individual student level subject scores). Modelling these responses presented no problem. However, I am now trying to model school resources (a level 2 variable) jointly with academic outcome but I am having problems.

Now, I understand that there is no level 1 variation for a level 2 variable and I was trying to use the "elements" option to tell runmlwin to set the level 1 variance and co-variances to zero. In spite of my best efforts, I still get a message that states the log likelihood could not be calculated and there is something wrong with my model. I am sure I am doing something wrong but can't figure what to do about it.

Has anyone done this and has any help or further suggestions for me?

Many thanks,
Russell Wildeman

Re: Multivariate multilevel models

Posted: Tue Apr 22, 2014 3:53 pm
by GeorgeLeckie
Great question,

Yes I think you can do this (at least when both response variables are continuous responses).

Below is some example code

Best wishes

George

Commands...

Code: Select all

* Clear data and set seed
clear
set seed 131235

* Set number of schools to 100
set obs 100
generate school = _n

* Generate school-level random intercept-effects for the two response variables
matrix C = (1,0.5 \ 0.5, 2)
drawnorm u1 u2, cov(C)

* Set number of students per school to 10
expand 10
sort school
generate student = _n

* Generate the pupil-level residuals for the pupil-level response varible
generate e1 = rnormal()

* Generate the pupil-level response variable
generate y1 = 5 + u1 + e1

* Generate the school-level response variable
generate y2 = 0 + u2

* Examine data for the first school
list if school==1, sepby(school)

* Reshape the data from wide to long form so that both responses in one variable
bysort school (student): replace y2 = . if _n!=1
reshape long y u, i(school student) j(response)
drop if y==.
order school response student
sort school response student

* Set student ID to 0 for school-level response
replace student = 0 if response==2

* Examine data for the first school
list if school==1, sepby(school)

* Generate dummy variables to indicate the two different response equations
tabulate response, gen(r)

* Fit model to pupil-level response variable
runmlwin y r1 if r1==1, level2(school: r1) level1(student: r1) nopause

* Fit model to school-level response variable
runmlwin y r2 if r2==1, level1(school: r2) nopause

* Fit model to both responses allowing for a school-level correlation
runmlwin y r1 r2, level2(school: r1 r2) level1(student: r1 r2) nopause
Output...

Code: Select all

. * Clear data and set seed
. clear

. set seed 131235

. 
. * Set number of schools to 100
. set obs 100
obs was 0, now 100

. generate school = _n

. 
. * Generate school-level random intercept-effects for the two response variables
. matrix C = (1,0.5 \ 0.5, 2)

. drawnorm u1 u2, cov(C)

. 
. * Set number of students per school to 10
. expand 10
(900 observations created)

. sort school

. generate student = _n

. 
. * Generate the pupil-level residuals for the pupil-level response varible
. generate e1 = rnormal()

. 
. * Generate the pupil-level response variable
. generate y1 = 5 + u1 + e1

. 
. * Generate the school-level response variable
. generate y2 = 0 + u2

. 
. * Examine data for the first school
. list if school==1, sepby(school)

      +-----------------------------------------------------------------------------+
      | school          u1          u2   student          e1         y1          y2 |
      |-----------------------------------------------------------------------------|
   1. |      1   -.8199924   -1.223277         1   -1.226586   2.953421   -1.223277 |
   2. |      1   -.8199924   -1.223277         2     .039657   4.219665   -1.223277 |
   3. |      1   -.8199924   -1.223277         3   -.2504696   3.929538   -1.223277 |
   4. |      1   -.8199924   -1.223277         4   -.5030173    3.67699   -1.223277 |
   5. |      1   -.8199924   -1.223277         5    1.739387   5.919394   -1.223277 |
   6. |      1   -.8199924   -1.223277         6   -.0061893   4.173818   -1.223277 |
   7. |      1   -.8199924   -1.223277         7   -.2211102   3.958897   -1.223277 |
   8. |      1   -.8199924   -1.223277         8    2.776873   6.956881   -1.223277 |
   9. |      1   -.8199924   -1.223277         9    1.336618   5.516625   -1.223277 |
  10. |      1   -.8199924   -1.223277        10    3.091241   7.271249   -1.223277 |
      +-----------------------------------------------------------------------------+

. 
. * Reshape the data from wide to long form so that both responses in one variable
. bysort school (student): replace y2 = . if _n!=1
(900 real changes made, 900 to missing)

. reshape long y u, i(school student) j(response)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1000   ->    2000
Number of variables                   7   ->       6
j variable (2 values)                     ->   response
xij variables:
                                  y1 y2   ->   y
                                  u1 u2   ->   u
-----------------------------------------------------------------------------

. drop if y==.
(900 observations deleted)

. order school response student

. sort school response student

. 
. * Set student ID to 0 for school-level response
. replace student = 0 if response==2
(100 real changes made)

. 
. * Examine data for the first school
. list if school==1, sepby(school)

      +-----------------------------------------------------------------+
      | school   response   student           u          e1           y |
      |-----------------------------------------------------------------|
   1. |      1          1         1   -.8199924   -1.226586    2.953421 |
   2. |      1          1         2   -.8199924     .039657    4.219665 |
   3. |      1          1         3   -.8199924   -.2504696    3.929538 |
   4. |      1          1         4   -.8199924   -.5030173     3.67699 |
   5. |      1          1         5   -.8199924    1.739387    5.919394 |
   6. |      1          1         6   -.8199924   -.0061893    4.173818 |
   7. |      1          1         7   -.8199924   -.2211102    3.958897 |
   8. |      1          1         8   -.8199924    2.776873    6.956881 |
   9. |      1          1         9   -.8199924    1.336618    5.516625 |
  10. |      1          1        10   -.8199924    3.091241    7.271249 |
  11. |      1          2         0   -1.223277   -1.226586   -1.223277 |
      +-----------------------------------------------------------------+

. 
. * Generate dummy variables to indicate the two different response equations
. tabulate response, gen(r)

   response |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      1,000       90.91       90.91
          2 |        100        9.09      100.00
------------+-----------------------------------
      Total |      1,100      100.00

. 
. * Fit model to pupil-level response variable
. runmlwin y r1 if r1==1, level2(school: r1) level1(student: r1) nopause
 
MLwiN 2.30 multilevel model                     Number of obs      =      1000
Normal response model
Estimation algorithm: IGLS

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         school |      100         10       10.0         10
-----------------------------------------------------------

Run time (seconds)   =       1.29
Number of iterations =          2
Log likelihood       = -1528.6796
Deviance             =  3057.3591
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          r1 |   4.924213   .1052099    46.80   0.000     4.718005    5.130421
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: school              |
                     var(r1) |   1.009213   .1566091      .7022644    1.316161
-----------------------------+------------------------------------------------
Level 1: student             |
                     var(r1) |   .9770083   .0460566       .886739    1.067278
------------------------------------------------------------------------------

. 
. * Fit model to school-level response variable
. runmlwin y r2 if r2==1, level1(school: r2) nopause
 
MLwiN 2.30 multilevel model                     Number of obs      =       100
Normal response model
Estimation algorithm: IGLS

Run time (seconds)   =       1.07
Number of iterations =          2
Log likelihood       = -183.48674
Deviance             =  366.97348
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          r2 |  -.4249741   .1515778    -2.80   0.005    -.7220612   -.1278871
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 1: school              |
                     var(r2) |   2.297583   .3249273      1.660737    2.934429
------------------------------------------------------------------------------

. 
. * Fit model to both responses allowing for a school-level correlation
. runmlwin y r1 r2, level2(school: r1 r2) level1(student: r1 r2) nopause
 
MLwiN 2.30 multilevel model                     Number of obs      =      1100
Normal response model
Estimation algorithm: IGLS

-----------------------------------------------------------
                |   No. of       Observations per Group
 Level Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         school |      100         11       11.0         11
-----------------------------------------------------------

Run time (seconds)   =       1.32
Number of iterations =          2
Log likelihood       = -1704.0128
Deviance             =  3408.0256
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          r1 |   4.924213   .1052099    46.80   0.000     4.718006    5.130421
          r2 |  -.4249741   .1515778    -2.80   0.005    -.7220611    -.127887
------------------------------------------------------------------------------

------------------------------------------------------------------------------
   Random-effects Parameters |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
Level 2: school              |
                     var(r1) |   1.009213   .1566089      .7022652    1.316161
                  cov(r1,r2) |   .6186087   .1710527      .2833517    .9538658
                     var(r2) |   2.297583   .3249273      1.660738    2.934429
-----------------------------+------------------------------------------------
Level 1: student             |
                     var(r1) |   .9770084   .0460566      .8867391    1.067278
                  cov(r1,r2) |          0          0             0           0
                     var(r2) |          0          0             0           0
------------------------------------------------------------------------------

Re: Multivariate multilevel models

Posted: Tue Apr 22, 2014 4:24 pm
by RAWILDEMAN29
Dear George,

This is immensely useful-thank you very much! I will be running this much later tonight.
Gosh, I tried today but I am so glad for this.

With gratitude,
Russell

Re: Multivariate multilevel models

Posted: Wed Apr 23, 2014 10:06 am
by RAWILDEMAN29
Last comment George

While the code is useful and does allow me to consider the correlation at the school level between the 2 outcomes, I am not sure if this approach is general enough for me to model the joint relationship between the level 2 and level 1 outcome. For example, the code estimates the means of the response dummies and their corresponding variances at the student and school levels but if I want to constrain the covariance at level 2 between the 2 responses or adjust for other variables, then I am not quite sure how to do that. Part of the reason why I am lost is that the code you provided is different from the multivariate runmlwin codes with its eq 1 and eq 2 formulation etc.

I was under the impression that runmlwin and its equivalent MLwiN does allow one to model multivariate relationships more flexibly or it could be that I just do not have the right knowledge. As I said, when you jointly model 2 outcomes at the same level of the data hierarchy(say student outcomes in Maths and English), then it is plain sailing but having 2 at different levels of the hierarchy (and both are normal responses), runmlwin returns the message that there is a numerical error calculating it the likelihood.

I do admit poor skills in coding however so if I am missing something very obvious, please forgive my ignorance.

One final question then: is it absolutely necessary that the data be in the long as opposed to the wide format when modelling multivariate relationships?

With thanks
Russell

Re: Multivariate multilevel models

Posted: Wed Apr 23, 2014 6:00 pm
by GeorgeLeckie
Hi Russell,

The approach I described is in many ways more general than the standard approach of specifying a multivariate response model in MLwiN. Indeed, if you specify a multivariate response model in MLwiN in the standard way, it will reshape the data from wide form to long form for you anyway. The way I described while in some senses more fiddly does gives you much more control over what you are doing. It also avoids the problems which leads to the error messages which you describe.

In terms of your specific questions...

Use the constraints() option to constrain the covariance (or any other parameters) to any value you like.

To enter a covariate into a given equation, by first create the interaction between this variable and the relevant response dummy and then enter this interaction into the model.

Best wishes

George

Re: Multivariate multilevel models

Posted: Thu Apr 24, 2014 12:31 pm
by RAWILDEMAN29
Dear George,

I appreciate you taking the time to think through and literally sort this one for me. You are clearly pushing me to learn (more) code and to take it as seriously as one would a theoretical treatment of any topic.

I am grateful-thank you
Russell