cross-classified lots of groups
Posted: Mon Sep 23, 2013 2:13 pm
by thomasPFF
Dear forum,
I am trying to fit a model in Stata, but it seems that Stata can’t handle it. I thought maybe MLwiN could do it, but before getting into that (I have no experience with or knowledge of it at all) it seemed smart to ask here whether you think the model might run.
My case is the price of editions of books, written by authors and published by publishers. Neither editions nor authors are exclusively nested in publishers, so I went for a cross-classified model. That looks like this:
xtmixed price || _all: R.publisher || author:, variance mle
However, the dataset is very large with a lot of small groups. There are:
60435 editions
Written by 16202 authors
Published by 1083 publishers.
Stata (I use Stata 12, 64-bit) gives one of two errors: either “likelihood evaluates to missing r(430)” or “J(): 3900 unable to allocate real <tmp>[60435,16204]”. I am guessing both are due to the large number of groups.
Is there reason to believe MLwiN could do it?
Re: cross-classified lots of groups
Posted: Mon Sep 23, 2013 4:07 pm
by GeorgeLeckie
Hi Thomas,
Yes, MLwiN (and runmlwin) can handle problems of this size fairly easily.
You will need to fit the model by MCMC as opposed to IGLS, using the following syntax:
Code:
* Fit naive 3-level model by IGLS
runmlwin price cons, ///
level3(publisher: cons) ///
level2(author: cons) ///
level1(edition: cons) ///
nopause
estimates store m1igls
* Fit two-way cross-classified model by MCMC
runmlwin price cons, ///
level3(publisher: cons) ///
level2(author: cons) ///
level1(edition: cons) ///
mcmc(cc) initsmodel(m1igls) ///
nopause
Best wishes
George
Re: cross-classified lots of groups
Posted: Tue Sep 24, 2013 12:15 pm
by thomasPFF
Dear George,
Thanks, I will let you know if I succeed!
Update.
That was easy! I only had two small things that I wanted to check with you.
First, I didn't have a "cons" variable. I generated a variable with a value of 1 for each case; that is correct, right?
Second, I'm a little confused about the number of levels. When I tell Stata to use three levels in the first runmlwin command, it complains that the data are not sorted correctly, and I can't get rid of that error. With two levels, however, it works fine. Moreover, the second command, the cross-classified one, is able to use three levels even though I only stored the results of a two-level IGLS model. The question is whether to include the first level of books (which is not a group variable but just a unique number for each case), or to use only author and publisher (as the xtmixed command seems to do).
Re: cross-classified lots of groups
Posted: Tue Sep 24, 2013 2:06 pm
by thomasPFF
I can't get the output on the forum in another way I think, so here it goes:
Code:
. runmlwin ln_gecorrigeerdeprijs cons, ///
    level2(uitgeverij1: cons) level1(auteur1: cons) ///
    mcmc(cc) initsmodel(m1igls) nopause
MLwiN 2.28 multilevel model              Number of obs      =     60435
Normal response model
Estimation algorithm: MCMC

                  No. of      Observations per Group
 Level Variable   Groups    Minimum    Average    Maximum
    uitgeverij1     1083          1       55.8       4041

Burnin                     =        500
Chain                      =       5000
Thinning                   =          1
Run time (seconds)         =       72.9
Deviance (dbar)            =   56065.86
Deviance (thetabar)        =   55131.62
Effective no. of pars (pd) =     934.24
Bayesian DIC               =   57000.09

ln_gecorrigeerdeprijs        Mean   Std. Dev.    ESS       P    [95% Cred. Interval]
cons                     2.685176   .0141558      25   0.000    2.659364   2.714897

Random-effects Parameters    Mean   Std. Dev.    ESS    [95% Cred. Int]
Level 2: uitgeverij1
    var(cons)            .2877909   .0148275    2940    .2602427   .3185186
Level 1: auteur1
    var(cons)            .1480511    .000872    4462     .146329   .1497789
Re: cross-classified lots of groups
Posted: Tue Sep 24, 2013 9:51 pm
by ChrisCharlton
1) cons should indeed be a vector of ones with the same length as the data.
2) IGLS, which you are using for starting values, assumes (and runmlwin checks) that your data are sorted in the order of the model hierarchy, i.e.:
Code:
region school student
1 1 1
1 1 2
1 1 3
1 1 4
1 2 5
1 2 6
1 2 7
2 3 8
2 3 9
2 3 10
2 3 11
2 4 12
2 4 13
2 4 14
...
Of course, with a cross-classified model it is often impossible to sort the data in a way that satisfies this. You can therefore skip the check by specifying the nosort option in runmlwin, or manually specify your own starting values with the initsb and initsv options. Note that these values are going to be further from the final estimates than the results of an IGLS model with the same structure as your desired model, so you may have to run MCMC for more iterations before it converges to the correct distribution.
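One way to see why no sort order can satisfy the check is to count how many publishers each author appears with. The following is a minimal sketch on toy data; the variable names author and publisher and the values are hypothetical, not taken from the dataset discussed in this thread.

```stata
* Toy data with hypothetical IDs (not the poster's dataset)
clear
input author publisher
1 1
1 2
2 1
3 2
end

* Flag the first row of each author-publisher pair
bysort author publisher: generate byte first_pair = (_n == 1)

* Number of distinct publishers each author appears with
bysort author: egen n_publishers = total(first_pair)

* Rows belonging to authors who are crossed with publishers
count if n_publishers > 1
```

If the count is zero, authors are strictly nested in publishers and sorting by publisher and then author should satisfy the check. Any positive count means the classification is crossed, which is the case where nosort or manual starting values come in.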
MCMC in MLwiN will use the ID codes to identify the higher level units, so you will want to use author and publisher as your level IDs.
If you use initial values from a simpler previous model then runmlwin will bring across the estimates that you provided for matching parameters and fill in any that aren't included with a default value (as I recall this will be zero).
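Putting the advice in this thread together, the workflow might look roughly like the sketch below. It assumes variables named price, publisher, author and edition as in the first post, and that runmlwin is installed and configured to find MLwiN. The burnin and chain lengths are illustrative values only, not recommendations made in the thread.

```stata
* Constant: a vector of ones, one per observation
generate byte cons = 1

* Naive 3-level model by IGLS to obtain starting values.
* nosort skips runmlwin's check that the data are sorted by the hierarchy.
runmlwin price cons, ///
    level3(publisher: cons) ///
    level2(author: cons) ///
    level1(edition: cons) ///
    nosort nopause
estimates store m1igls

* Two-way cross-classified model by MCMC, starting from the IGLS estimates.
* Burnin and chain are longer than the defaults shown earlier (assumed values).
runmlwin price cons, ///
    level3(publisher: cons) ///
    level2(author: cons) ///
    level1(edition: cons) ///
    mcmc(cc burnin(1000) chain(20000)) initsmodel(m1igls) ///
    nopause
```

Checking the effective sample size (ESS) column of the MCMC output is a reasonable way to judge whether the chain is long enough; a very small ESS for a parameter suggests running more iterations.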