cross-classified model with sparse data
Posted: Thu Mar 22, 2012 8:59 am
by AnjaScheiwe
Dear all,
I am new to MLwiN and have been trying to run a cross-classified model with 3 levels: children, schools and neighbourhoods. I have data from all UK countries, but the data are sparse, and in Stata the model would only run for one of them (Wales), where the number of observations per group is highest: there the sample size is 1714, the number of schools is 522 (average number of observations per school = 3.3) and the number of neighbourhoods is 268 (average = 6.4). For Wales the model seems to run fine via runmlwin, but the estimates are very different from what I got with Stata. For the other countries, the models do not run at all in Stata, and in MLwiN I get unrealistic results for the higher-level variances, i.e. they are too small.
I have three questions:
1. Is there a rule of thumb as to the minimum of observations per group necessary?
2. For a cross-classified model, does the order in which the higher levels are specified matter?
3. Can I trust that the estimates given by MLwiN for Wales are correct?
This is the code:
The outcome is reading, msoa4all is the neighbourhood identifier and s4schoolid the school identifier. The starting values I have used are close to what I got from the Stata model.
Stata:
Code:
xi: xtmixed s4read || _all: R.msoa4all || s4schoolid: ,mle var
MLwiN:
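Code:
* sort by the level identifiers, highest level first, before calling runmlwin
sort msoa4all s4schoolid mcsid
* starting values, in order: intercept, neighbourhood variance, school variance, child-level variance
matrix b= (100, 20, 20, 300)
runmlwin s4read cons , level3 (msoa4all:cons) level2(s4schoolid: cons) level1(mcsid: cons) mcmc (cc) initsb (b)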
Many thanks for your help!
Anja
Re: cross-classified model with sparse data
Posted: Thu Mar 22, 2012 12:22 pm
by GeorgeLeckie
Hi Anja,
Thanks for your post. In terms of your queries...
1. Is there a rule of thumb as to the minimum of observations per group necessary?
No, but clearly the more the better
2. For a cross-classified model, does the order in which the higher levels are specified matter?
No
3. Can I trust that the estimates given by MLwiN for Wales are correct?
The large discrepancy you describe sounds concerning and you need to get to the bottom of why this is.
Your syntax for both xtmixed and runmlwin looks correct.
Please will you paste the output from your xtmixed and runmlwin models?
Many thanks
George
Re: cross-classified model with sparse data
Posted: Thu Mar 22, 2012 12:56 pm
by AnjaScheiwe
Dear George,
many thanks for the quick reply!
This is the runmlwin output:
. runmlwin s4read cons , level3 (msoa4all:cons) level2(s4schoolid: cons) level1(mcsid: cons) mcmc (cc) initsb (b)
Code:
MLwiN 2.24 multilevel model          Number of obs = 1714
Normal response model
Estimation algorithm: MCMC

                   No. of       Observations per Group
Level Variable     Groups    Minimum    Average    Maximum
msoa4all              268          1        6.4         37
s4schoolid            522          1        3.3         20

Burnin                     =      500
Chain                      =     5000
Thinning                   =        1
Run time (seconds)         =     12.8
Deviance (dbar)            = 14777.13
Deviance (thetabar)        = 14676.10
Effective no. of pars (pd) =   101.03
Bayesian DIC               = 14878.15

s4read          Mean    Std. Dev.         z     ESS    [95% Cred. Interval]
cons        105.9433     .6327433    167.43    1624    104.6789    107.1946

Random-effects Parameters        Mean    Std. Dev.    ESS       [95% Cred. Int]
Level 3: msoa4all
  var(cons)                  27.44869     12.79693     13     .047128    48.76515
Level 2: s4schoolid
  var(cons)                  6.791408     10.91118      4    .0009977    33.64681
Level 1: mcsid
  var(cons)                  325.5512     17.60093     12     295.032    367.4328
And the Stata output:
Code:
Mixed-effects ML regression          Number of obs = 1714

                   No. of       Observations per Group
Group Variable     Groups    Minimum    Average    Maximum
_all                    1       1714     1714.0       1714
s4schoolid            522          1        3.3         20

                                     Wald chi2(0)  = .
Log likelihood = -7441.2822          Prob > chi2   = .

s4read         Coef.    Std. Err.         z    P>|z|    [95% Conf. Interval]
_cons       105.8989     .6426575    164.78    0.000    104.6393    107.1585

Random-effects Parameters    Estimate    Std. Err.    [95% Conf. Interval]
_all: Identity
  var(R.msoa4all)            22.83488      8.31861    11.18179    46.63225
s4schoolid: Identity
  var(_cons)                 24.55262      9.82302    11.20854    53.78323
var(Residual)                310.1152     12.25352    287.0052    335.0861

LR test vs. linear regression: chi2(2) = 64.31   Prob > chi2 = 0.0000
I do get a different result from runmlwin when I specify schools to be the third level though:
Code:
. sort s4schoolid msoa4all mcsid
. runmlwin s4read cons , level3 (s4schoolid:cons) level2(msoa4all: cons) level1(mcsid: cons) mcmc (cc) initsb (b) nopause

MLwiN 2.24 multilevel model          Number of obs = 1714
Normal response model
Estimation algorithm: MCMC

                   No. of       Observations per Group
Level Variable     Groups    Minimum    Average    Maximum
s4schoolid            522          1        3.3         20
msoa4all              268          1        6.4         37

Burnin                     =      500
Chain                      =     5000
Thinning                   =        1
Run time (seconds)         =     5.04
Deviance (dbar)            = 14788.13
Deviance (thetabar)        = 14705.59
Effective no. of pars (pd) =    82.54
Bayesian DIC               = 14870.67

s4read          Mean    Std. Dev.         z     ESS    [95% Cred. Interval]
cons        106.1357     .6393731    166.00    1540    104.8724    107.3723

Random-effects Parameters        Mean    Std. Dev.    ESS       [95% Cred. Int]
Level 3: s4schoolid
  var(cons)                   .014607     .0197304     16    .0010188    .0818259
Level 2: msoa4all
  var(cons)                  32.93592      10.3849     56    .8175938    52.04788
Level 1: mcsid
  var(cons)                  327.1889     13.14256    164    303.9248    355.6789
Many thanks!
Anja
Re: cross-classified model with sparse data
Posted: Thu Mar 22, 2012 3:49 pm
by GeorgeLeckie
Hi Anja,
We have had a few problems with MCMC cross-classified models in recent versions of runmlwin and MLwiN.
I see that you are not using the latest version of MLwiN. You are using MLwiN 2.24 and the latest version is 2.25.
So please first make sure that you are using the latest version of runmlwin by issuing the following command from within Stata:
. ssc install runmlwin, replace
Please then make sure that you are using MLwiN 2.25 by downloading it from
http://www.bristol.ac.uk/cmm/software/m ... rades.html
Then re-run your models and post the xtmixed command and output and the new runmlwin command and output so that we can take a look.
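A quick check of what is installed before re-running might look like the sketch below. It assumes the standard Stata `which` command, which reports the location and version line of an installed ado-file, and the `MLwiN_path` global that runmlwin reads to find the MLwiN executable. The install path shown is a hypothetical example, not taken from this thread. The MLwiN version itself is printed in the first line of the runmlwin output (e.g. "MLwiN 2.24 multilevel model").

```stata
* Report the path and version line of the installed runmlwin.ado
which runmlwin

* Point runmlwin at the upgraded MLwiN executable.
* The path below is a hypothetical example; adjust it to your own installation.
global MLwiN_path "C:\Program Files\MLwiN v2.25\i386\mlwin.exe"
```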
Best wishes
George
Re: cross-classified model with sparse data
Posted: Thu Mar 22, 2012 11:14 pm
by AnjaScheiwe
Dear George,
You were right - I did the update and now everything seems to be working fine. Apologies for not thinking of this myself! I can now run models for the whole of the UK with sensible results. The estimates for Wales are similar to what I get from xtmixed. For the single-country analysis the order of the higher levels still makes a slight difference; when I use the whole dataset, however, it doesn't.
I'm happy to post the output if you still want to see it.
Many thanks for your helpful advice,
Anja
Re: cross-classified model with sparse data
Posted: Fri Mar 23, 2012 2:40 pm
by GeorgeLeckie
Hi Anja,
You say "the order of the higher levels still makes a slight difference"
MLwiN will simulate starting values for your higher-level residuals (they are viewed as parameters when fitting the model by MCMC).
Flipping the levels around will result in different random starting values for the two sets of higher-level units, and this will lead to small differences in your results.
I would have thought that the magnitude of any discrepancies you find between the two ways of specifying the model would become trivially small as you increase the burnin and chain.
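As a rough sketch of what that could look like, the two specifications from earlier in the thread can be re-run with a longer burnin and chain via the `burnin()` and `chain()` suboptions of `mcmc()`. The lengths used here (5000 and 50000) are illustrative assumptions, not recommended values; the default run shown above used 500 and 5000.

```stata
* Sketch only: burnin and chain lengths are illustrative assumptions.
matrix b = (100, 20, 20, 300)

* Neighbourhoods specified as level 3, schools as level 2
sort msoa4all s4schoolid mcsid
runmlwin s4read cons, ///
    level3(msoa4all: cons) level2(s4schoolid: cons) level1(mcsid: cons) ///
    mcmc(cc burnin(5000) chain(50000)) initsb(b) nopause

* Same model with the two higher classifications swapped
sort s4schoolid msoa4all mcsid
runmlwin s4read cons, ///
    level3(s4schoolid: cons) level2(msoa4all: cons) level1(mcsid: cons) ///
    mcmc(cc burnin(5000) chain(50000)) initsb(b) nopause
```

Comparing the ESS column across the two runs gives a rough indication of whether the chains are long enough: very small ESS values for the variance parameters suggest the chain should be extended further.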
Best wishes
George
Re: cross-classified model with sparse data
Posted: Tue Mar 27, 2012 8:30 am
by AnjaScheiwe
Dear George,
I did what you suggested (increased burnin and chain) and yes, the estimates are now very similar.
Many thanks for your help,
Anja
Re: cross-classified model with sparse data
Posted: Tue Mar 27, 2012 12:48 pm
by GeorgeLeckie
That's great, I'm glad that we have cleared up the discrepancies.
Best wishes
George