large dataset & worksheet size limitation

Posted: Thu Feb 23, 2012 2:25 pm
by julia1633
Greetings,

I am currently working with a large dataset of about 800,000 observations, with 330 groups at level 2 and 80 groups at level 3. While I managed to run a multinomial regression within the multilevel setting using MQL1, I couldn't then estimate it with MCMC. MLwiN keeps crashing with a message that the programme has stopped working. After adding an interaction term I can't even estimate the model using MQL1, and I keep getting the same message that 56637648 free space is needed. I've simplified the model to a 2-level one, with the second level reflecting country-year groupings. However, I still face the same problem. Is there any way I can increase the worksheet size above 250000 (I tried, but MLwiN does not allow me to do this)?

Thanks,
Julia

Re: large dataset & worksheet size limitation

Posted: Thu Feb 23, 2012 3:16 pm
by GeorgeLeckie
Hi Julia,

In terms of your first query "MLwin keeps crashing generating a message that a programme has stopped working" you will need to post that one on the MLwiN forum as it sounds like an MLwiN rather than a runmlwin problem. You can confirm this by checking that you get the same error message when running the model in MLwiN via the traditional point-and-click route.

In terms of your second query "I keep getting the same message that 56637648 free space is needed", you could try manually increasing the amount of RAM runmlwin allocates to MLwiN (runmlwin occasionally assigns too little RAM to MLwiN).

We only implemented the options for this a couple of months ago, so the first thing you want to do is make sure you have the latest version of runmlwin:

Code:

ssc install runmlwin, replace
You can now use the new options we added, documented at

Code:

help runmlwin##mlwin_settings
Try ramping up the RAM using the size() option to see whether this solves your problem.

You can see how much RAM runmlwin automatically assigns to MLwiN by first attempting (and failing) to fit this model without this option. From within MLwiN you can then check the settings which runmlwin has assigned by going to

Options --> Worksheet
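
As a rough sketch only, a call that passes size() might look something like the following. Everything except xconst and the size() option itself is an assumption made for illustration: the variable names entry, countryyear and person, the category coding, and the value 500000 are all hypothetical. The appropriate value and its units should be taken from help runmlwin##mlwin_settings rather than from this example.

```stata
* Hypothetical variable names throughout; only xconst comes from this thread.
* entry is assumed to be a 3-category outcome coded 1 = non-entry (reference).
generate cons = 1

* Multinomial logit with country-year random intercepts, fitted by the
* default quasi-likelihood method, with a manually enlarged worksheet.
runmlwin entry cons xconst, ///
    level2(countryyear: cons) ///
    level1(person:) ///
    discrete(distribution(multinomial) link(mlogit) denominator(cons) basecategory(1)) ///
    size(500000) ///   illustrative value only, not a recommendation
    nopause
```

If the model still fails, the value can be raised step by step while checking what MLwiN reports under the worksheet settings.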

Best wishes

George

Re: large dataset & worksheet size limitation

Posted: Thu Feb 23, 2012 3:37 pm
by julia1633
Dear George,

Many thanks for this. I'll follow your suggestions and see if this would make a difference.

Thanks,
Julia

Re: large dataset & worksheet size limitation

Posted: Sun Feb 26, 2012 10:29 pm
by julia1633
Dear George,

Thanks a lot. Manually increasing the RAM allocated by runmlwin to MLwiN seems to have done the trick. The model runs. However, I seem to get quite divergent results for some of the variables of interest using MQL1 and MCMC. The MCMC model is fitted using initial values obtained from MQL1. More specifically, with MQL1 stronger property rights protection (xconst) is insignificant in explaining both the choice of self-employment vis-a-vis non-entry into entrepreneurship (contrast 1) and entrepreneurship with growth aspirations vis-a-vis non-entry into entrepreneurship (contrast 2). However, once the model is fitted with MCMC the results seem very odd: stronger property rights protection positively and significantly affects the choice of self-employment, and negatively and significantly affects entrepreneurship with growth aspirations. I'd rather expect stronger property rights protection to be insignificantly related to self-employment but positive and significant in the case of entrepreneurship with growth aspirations. I know that MQL1 does not produce robust results and that the model should be finalised with MCMC. However, I wonder whether such a divergence between the MQL1 and MCMC results should be expected, or whether something is wrong and neither of these two estimators should be trusted.

best wishes,
Julia

Re: large dataset & worksheet size limitation

Posted: Sun Feb 26, 2012 10:32 pm
by julia1633
Attached is the file with the results, in case you wish to look at them.

Re: large dataset & worksheet size limitation

Posted: Sun Feb 26, 2012 10:46 pm
by GeorgeLeckie
Hi Julia,

I'm afraid we don't have the scope to offer more substantive support with fitting these models.

However, one thing which you should do is run the model for longer, as your Effective Sample Sizes (ESS) for the parameters which you describe (the coefficients of xconst) are only 5, which is very low. You wouldn't report summary statistics based on a sample of 5 observations, so summary statistics based on a chain which is only equivalent to 5 independent observations aren't great either!
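
For readers unfamiliar with ESS, one common definition divides the chain length by an autocorrelation-based inflation factor. This is offered as a general sketch; the exact formula MLwiN uses should be checked against the MCMC manual. Here n is the number of stored iterations and \rho(k) is the lag-k autocorrelation of the chain for a given parameter:

```latex
\[
  \mathrm{ESS} = \frac{n}{\kappa},
  \qquad
  \kappa = 1 + 2 \sum_{k=1}^{\infty} \rho(k)
\]
```

Under this definition an ESS of 5 means \kappa = n/5: the draws are so strongly autocorrelated that the whole chain carries about as much information as five independent draws, however long the chain itself is.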

You might also want to try specifying the orthogonal and hierarchical centring options. Read the relevant chapters of the MCMC manual first, along with the sections on ESS.
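
A minimal sketch of what a longer run with these options might look like is given below. The variable names entry, countryyear and person, the category coding, and the burn-in and chain lengths are hypothetical and chosen only for illustration; whether orthogonal parameterisation and hierarchical centring are both available for a particular model should be confirmed in the runmlwin help file and the MCMC manual.

```stata
* Hypothetical variable names; only xconst comes from this thread.
* Assumes cons has been generated as a constant equal to 1, and that an
* MQL1 fit of the same model has just been run, so that initsprevious
* can use its estimates as starting values.
runmlwin entry cons xconst, ///
    level2(countryyear: cons) ///
    level1(person:) ///
    discrete(distribution(multinomial) link(mlogit) denominator(cons) basecategory(1)) ///
    mcmc(burnin(5000) chain(50000) orthogonal hcentring(2)) ///
    initsprevious nopause
```

After the run, the ESS values for the xconst coefficients can be compared with those from the shorter chain to see whether mixing has improved.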

Best wishes

George