www.cmm.bristol.ac.uk/forum

Posted: **Mon Sep 23, 2013 2:13 pm**

Dear forum,

I am trying to fit a model in Stata, but it seems that Stata can't handle it. I thought MLwiN might be able to do it, but since I have no experience with it at all, it seemed smart to ask here first whether you think the model might run.
My case is the price of editions of books, written by authors and published by publishers. Neither editions nor authors are nested exclusively within publishers, so I went for a cross-classified model. That looks like this:
xtmixed price || _all: R.publisher || author:, variance mle

However, the dataset is very large with a lot of small groups. There are:
60435 editions
Written by 16202 authors
Published by 1083 publishers.
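
A quick way to confirm that authors really are crossed with publishers, rather than nested within them, is to count the authors who appear under more than one publisher. This is only a sketch: the variable names `author` and `publisher` are taken from the xtmixed command above, and `multi_pub` and `author_tag` are hypothetical helper variables.

```stata
* Flag authors whose first and last publisher (after sorting) differ,
* i.e. authors who have editions with more than one publisher
bysort author (publisher): generate byte multi_pub = publisher[1] != publisher[_N]

* Tag one row per author so that each author is counted once
egen byte author_tag = tag(author)

* Number of authors published by more than one publisher
count if author_tag & multi_pub
```

If the count is zero, authors are strictly nested in publishers and an ordinary three-level model would be enough.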

Stata (I use Stata 12, 64-bit) gives one of two errors: either “likelihood evaluates to missing r(430)” or
“J(): 3900 unable to allocate real <tmp>[60435,16204]”. I am guessing that both are caused by the large number of groups.
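
The second error is consistent with that guess. As a rough check, assuming 8 bytes per element of a real matrix, the size of the 60435 by 16204 matrix named in the message can be computed directly:

```stata
* Approximate size in gigabytes of a 60435 x 16204 matrix of 8-byte reals
display %5.2f 60435 * 16204 * 8 / 1e9
```

That comes to roughly 7.8 GB for a single temporary matrix, which many machines cannot allocate.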

Is there reason to believe MLwiN could do it?

Posted: **Mon Sep 23, 2013 4:07 pm**

Hi Thomas,

Yes, MLwiN (and runmlwin) can handle problems of this size fairly easily.

You will need to fit the model by MCMC as opposed to IGLS.

You will need to use the following syntax:

Code:

* Fit naive 3-level model by IGLS
runmlwin price cons, ///
	level3(publisher: cons) ///
	level2(author: cons) ///
	level1(edition: cons) ///
	nopause
	
estimates store m1igls

* Fit two-way cross-classified model by MCMC
runmlwin price cons, ///
	level3(publisher: cons) ///
	level2(author: cons) ///
	level1(edition: cons) ///
	mcmc(cc) initsmodel(m1igls) ///
	nopause

Best wishes

George

Posted: **Tue Sep 24, 2013 12:15 pm**

Dear George,

Thanks, I will let you know if I succeed!

Update.

That was easy! I only had two small things that I wanted to check with you.

First, I didn't have a "cons" variable. I generated a variable with a value of 1 for each case; that is correct, right?
Second, I'm a little confused about the number of levels. When I tell Stata to use three levels in the first runmlwin command, it complains that the data are not sorted correctly, and I can't get rid of that message. With two levels, however, it works fine. Moreover, the second (cross-classified) command is able to use three levels even though I only stored the results of a two-level IGLS model. The question is whether to include the first level of books (which is not a grouping variable but just a unique number for each case), or to use only author and publisher (as the xtmixed command seems to do).

Posted: **Tue Sep 24, 2013 2:06 pm**

I can't get the output on the forum in another way I think, so here it goes:

Code:

. runmlwin ln_gecorrigeerdeprijs cons, ///
    level2(uitgeverij1: cons) level1(auteur1: cons) ///
    mcmc(cc) initsmodel(m1igls) nopause

MLwiN 2.28 multilevel model                     Number of obs      =     60435
Normal response model
Estimation algorithm: MCMC

                  No. of       Observations per Group
 Level Variable   Groups    Minimum    Average    Maximum

    uitgeverij1     1083          1       55.8       4041

Burnin                     =        500
Chain                      =       5000
Thinning                   =          1
Run time (seconds)         =       72.9
Deviance (dbar)            =   56065.86
Deviance (thetabar)        =   55131.62
Effective no. of pars (pd) =     934.24
Bayesian DIC               =   57000.09

ln_gecorrigeerdeprijs        Mean    Std. Dev.    ESS      P      [95% Cred. Interval]

cons                     2.685176    .0141558      25  0.000     2.659364    2.714897

Random-effects Parameters           Mean    Std. Dev.    ESS     [95% Cred. Int]

Level 2: uitgeverij1   var(cons)  .2877909   .0148275   2940    .2602427   .3185186

Level 1: auteur1       var(cons)  .1480511    .000872   4462     .146329   .1497789

Posted: **Tue Sep 24, 2013 9:51 pm**

1) cons should indeed be a vector of ones with the same length as the data.

2) IGLS, which you are using for starting values, assumes (and runmlwin checks) that your data are sorted in the order of the model hierarchy, i.e.:

Code:

region  school  student
1       1       1
1       1       2
1       1       3
1       1       4
1       2       5
1       2       6
1       2       7
2       3       8
2       3       9
2       3       10
2       3       11
2       4       12
2       4       13
2       4       14

...

Of course, with a cross-classified model it is often impossible to sort the data in a way that satisfies this. You can therefore skip the check by specifying the nosort option in runmlwin, or manually specify your own starting values with the initsb and initsv options. Note that these starting values are going to be further away than the results from an IGLS model with the same structure as your desired model, so you may have to run MCMC for more iterations before it converges to the correct distribution.
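
The points above can be put together roughly as follows. This is a sketch rather than tested syntax: it reuses the variable names from the syntax earlier in the thread (price, publisher, author, edition), it assumes runmlwin is installed and can find MLwiN, and the burnin and chain lengths are illustrative values, not recommendations.

```stata
* Constant: a vector of ones, one per observation
* (skip this line if cons already exists)
generate byte cons = 1

* Naive nested fit by IGLS to obtain starting values;
* nosort skips runmlwin's check on the sort order
runmlwin price cons, ///
    level3(publisher: cons) ///
    level2(author: cons) ///
    level1(edition: cons) ///
    nosort nopause
estimates store m1igls

* Cross-classified fit by MCMC, started from the IGLS estimates,
* with a longer chain than the burnin 500 / chain 5000 run shown earlier
runmlwin price cons, ///
    level3(publisher: cons) ///
    level2(author: cons) ///
    level1(edition: cons) ///
    mcmc(cc burnin(1000) chain(20000)) initsmodel(m1igls) ///
    nopause
```

After the MCMC run, the ESS column of the output gives a first indication of whether the chain was long enough.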

MCMC in MLwiN will use the ID codes to identify the higher level units, so you will want to use author and publisher as your level IDs.

If you use initial values from a simpler previous model then runmlwin will bring across the estimates that you provided for matching parameters and fill in any that aren't included with a default value (as I recall this will be zero).

cross-classified lots of groups
