MCMC is not taking the same starting values but for different datasets

Posted: Wed Oct 14, 2015 4:07 pm
by adeldaoud
** I started a new thread instead **

I am following up on this thread, but the question is only loosely related.

I am using starting values and passing both FP.b and RP.b to the model as you suggested. R2MLwiN manages to run this model if I use a small random sample of the original data (~ 1000 cases), but not when I try to estimate the model on the full sample (~ 1.9 million cases).


I am getting this error:

> m1test2 <- runMLwiN(logit(AbsolutDep, cons) ~ 1 + (1 | country) + (1 |CountryClusterHouse), D = "Binomial", estoptions = list(EstM = 1, resi.store=F,
+ debugmode=F, optimat=T,
+ mcmcMeth=list(iterations=10, burnin=10),
+ mcmcOptions=list(hcen=3),
+ startval=list(FP.b = PLOSONEestimations20000IGLS[[1]]@FP , RP.b = PLOSONEestimations20000IGLS[[1]]@RP)), data = test4a, workdir = tempdir(), MLwiNPath="C:/Program Files (x86)/MLwiN v2.35/")


MLwiN is running, please wait......
/nogui option ignored
ECHO 0


Echoing is ON
STAR
iteration 0

Convergence not achieved
JOIN -0.491145551204681 '_FP_b'
JOIN 0 '_FP_v'
JOIN 1.10714721679688 1.04919278621674 1 '_RP_b'
JOIN 0 0 0 0 0 0 '_RP_v'
ECHO 0

Echoing is ON
MCMC 0 10 1 5.8 50 10 G30[1] G30[2] 2 2 2 1 1 2

error while obeying batch file C:/Users/adel/AppData/Local/Temp/Rtmp4GO7Gq/macrofile_1d002647c62.txt at line number 139:
MCMC 0 10 1 5.8 50 10 C2498] C2499] 2 2 2 1 1 2

wrong length random constraint matrix


error while obeying batch file C:/Users/adel/AppData/Local/Temp/Rtmp4GO7Gq/macrofile_1d002647c62.txt at line number 139:
MCMC 0 10 1 5.8 50 10 C2498] C2499] 2 2 2 1 1 2


wrong length random constraint matrix
.
Execution completed

Error in read.dta(MCMCfile) :
unable to open file: 'No such file or directory'

[Attachment: process.png]

Do you have any idea what is going on? I have tried passing only the relevant variables, in case this is a RAM problem (my PC has 64 GB of RAM), but that does not work either. I am clueless about what to try next and hope you have some ideas.


Many thanks in advance


PS: I am not sure whether it is related, but I am also observing fluctuating RAM consumption when I run this model (please see the picture). I have not seen anything like it before. Can these two events be related?




AN UPDATE:

1. I ran a new IGLS model with the dataset to obtain a new set of starting values. I then used those to initiate an MCMC model, but I am still getting the same error, namely:

MLwiN is running, please wait......
/nogui option ignored
ECHO 0


Echoing is ON
STAR
iteration 0

Convergence not achieved
JOIN -0.49 '_FP_b'
JOIN 0 '_FP_v'
JOIN 1.11 1.05 1 '_RP_b'
JOIN 0 0 0 0 0 0 '_RP_v'
ECHO 0

Echoing is ON
MCMC 0 10 1 5.8 50 10 G30[1] G30[2] 2 2 2 1 1 2

error while obeying batch file C:/Users/adel/AppData/Local/Temp/Rtmp4GO7Gq/macrofile_1d00c492552.txt at line number 139:
MCMC 0 10 1 5.8 50 10 C2498] C2499] 2 2 2 1 1 2

wrong length random constraint matrix


error while obeying batch file C:/Users/adel/AppData/Local/Temp/Rtmp4GO7Gq/macrofile_1d00c492552.txt at line number 139:
MCMC 0 10 1 5.8 50 10 C2498] C2499] 2 2 2 1 1 2


wrong length random constraint matrix
.
Execution completed

Error in read.dta(MCMCfile) :
unable to open file: 'No such file or directory'


2. I rounded the starting values before passing them to runMLwiN, in case that was the issue. But the problem persists.

Re: MCMC is not taking the same starting values but for different datasets

Posted: Thu Oct 15, 2015 2:19 pm
by ChrisCharlton
Could you try turning on debugmode before running the model, and then, once you get to the error, open a command and output window (Data Manipulation->Command Interface, then click "Output")? Once these have opened, issue the following command:

SETT
which should give you output similar to:

->SETT
EXPLanatory variables in       bcons.1  cons     age      
FPARameters                             cons     age      
RESPonse variable in           use      
FSDErrors : uncorrected                 RSDErrors : uncorrected
MAXIterations  20   TOLErance     2     METHod is IGLS    BATCh is OFF
RCONstraints in c1494                   
IDENtifying codes : 1-woman, 2-district

LEVEL 2 RPM
         cons     cons     1        
LEVEL 1 RPM(RESETTING OFF)
         bcons.1  bcons.1  1        
After doing so, could you look in the column referred to by RCONstraints in ... and let me know what the contents look like?

Re: MCMC is not taking the same starting values but for different datasets

Posted: Thu Oct 15, 2015 6:06 pm
by adeldaoud
Chris,

Thanks for the support. Here come some screenshots:

1. Something curious here: why is the method set to IGLS when I requested an MCMC model? Also, the output already showed "Convergence not achieved" when I opened the Output window, before I clicked "Resume Macro".
[Screenshot: Skärmklipp 2015-10-15 19.48.42.png]
2. C2499 looks ok, but _Stats has missing values.
[Screenshot: Skärmklipp 2015-10-15 19.53.15.png]
3. I am not sure why c2498 and c2499 are of different lengths (567411 vs 56745). I guess they refer to my second (household) level? I am running a three-level model: kids, in households, in countries.
[Screenshot: Skärmklipp 2015-10-15 19.56.25.png]
Please let me know if you need more information.

Re: MCMC is not taking the same starting values but for different datasets

Posted: Thu Oct 15, 2015 7:32 pm
by adeldaoud
Out of curiosity, I re-ran the model in debugmode with the data that is supposed to work. It seems to fail in debugmode but not in non-debugmode.

This is the model output in non-debugmode:

        Dbar   D(thetabar)        pD        DIC
    1289.819      1236.107    53.713   1343.532
---------------------------------------------------------------------------------------------------
The model formula:
logit(AbsolutDep, cons) ~ 1 + (1 | country) + (1 | CountryClusterHouse)
Level 3: country     Level 2: CountryClusterHouse     Level 1: l1id
---------------------------------------------------------------------------------------------------
The fixed part estimates:
                  Coef.   Std. Err.        z     Pr(>|z|)         [95% Cred.   Interval]   ESS
Intercept      -0.34126     0.01404   -23.05   1.354e-117   ***     -0.36532    -0.32483    10
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
---------------------------------------------------------------------------------------------------
The random part estimates at the country level:
                    Coef.   Std. Err.   [95% Cred.   Interval]   ESS
var_Intercept     0.16383     0.00799      0.15207     0.17628    10
---------------------------------------------------------------------------------------------------
The random part estimates at the CountryClusterHouse level:
                    Coef.   Std. Err.   [95% Cred.   Interval]   ESS
var_Intercept     0.22550     0.01436      0.20378     0.24898     3
---------------------------------------------------------------------------------------------------
The random part estimates at the l1id level:
                    Coef.   Std. Err.   [95% Cred.   Interval]   ESS
var_bcons_1       1.00000     0.00000      0.99845     1.00000    10
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

Please see the attached document for the relevant screenshots.
[Attachment: This happens in debugmode.docx]

Re: MCMC is not taking the same starting values but for different datasets

Posted: Thu Oct 15, 2015 9:39 pm
by ChrisCharlton
This looks like a different error to that you reported before. Could you check that the 'mcmchains' column contains the same number of rows as the 'parnum' and 'itnum' columns? Could you also run the command:

PRINT b1000
from within MLwiN when you get the error and let me know what output it gives?

Re: MCMC is not taking the same starting values but for different datasets

Posted: Thu Oct 15, 2015 11:23 pm
by adeldaoud
ChrisCharlton wrote: This looks like a different error to that you reported before. Could you check that the 'mcmchains' column contains the same number of rows as the 'parnum' and 'itnum' columns?

I assume that you are referring to the second issue. mcmchains has 440 rows, whereas parnum and itnum have only 40 each.

ChrisCharlton wrote: Could you also run the command:

This is the output:
->PRINT b1000


B1000
4.0000


Could this second error be caused by my requesting only 10 burn-in iterations and 10 iterations? Just thinking.

Re: MCMC is not taking the same starting values but for different datasets

Posted: Fri Oct 16, 2015 8:54 am
by ChrisCharlton
Thanks for looking at this. The mcmcchains column should only be 40 rows as well (as it's the stacked chains of 10 iterations for 4 parameters). My guess would be that this difference is due to the refresh MCMC option, which is set to 50 iterations by default. This is only used when debugmode is turned on, and updates the interface every refresh iterations. As in your case this number of iterations is higher than the total number requested it is likely that the calculation of how to split them up is going wrong and it is performing more iterations than expected. Could you try setting refresh to ten and seeing whether you still get the same behaviour?
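
To make the arithmetic above concrete, here is a minimal base-R sketch. The helper `expected_chain_rows` is hypothetical (it is not part of R2MLwiN or MLwiN); the numbers are the ones reported in this thread. It only checks row counts and does not reproduce how MLwiN actually splits iterations between refreshes.

```r
# Hypothetical helper (not an R2MLwiN function): number of rows expected in the
# stacked chains column, i.e. stored iterations times number of parameters.
expected_chain_rows <- function(iterations, n_params, thinning = 1) {
  (iterations %/% thinning) * n_params
}

iterations <- 10   # iterations requested via mcmcMeth
n_params   <- 4    # number of parameters in the model
refresh    <- 50   # default refresh interval mentioned above
observed   <- 440  # rows reported for the chains column in debugmode

expected <- expected_chain_rows(iterations, n_params)

cat("expected rows:", expected, "\n")
cat("iterations implied by the observed rows:", observed / n_params, "\n")
cat("refresh exceeds requested iterations:", refresh > iterations, "\n")
```

If the last line reports TRUE, the situation described above applies: the refresh interval is longer than the whole run, so lowering refresh (or raising the number of iterations) is worth trying.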

Re: MCMC is not taking the same starting values but for different datasets

Posted: Fri Oct 16, 2015 4:15 pm
by adeldaoud
Thanks Chris. I will check the refresh option as soon as possible for the second issue (the one where runMLwiN manages to estimate in non-debugmode but not in debugmode).

Do you have any input on the first issue, which is the more pressing one? I would be happy to share the data with you if that would make troubleshooting easier. Please let me know and I can email a Dropbox link.

Cheers
Adel

Re: MCMC is not taking the same starting values but for different datasets

Posted: Fri Oct 16, 2015 4:24 pm
by ChrisCharlton
Regarding the first problem - it looks as if the column in question (c2499) has been allocated for both the starting residuals (as it appears in the MCMC 0 command) and as the IGLS random constraint column. This has resulted in the two sets of values being appended (the first 4 values are the constraints, and the following rows are the residuals). I remember fixing an issue similar to this recently, are you using the development version of R2MLwiN? You might see the method set to IGLS initially when you request an MCMC model as the model is set up and run for some iterations with IGLS, and the starting residuals are generated using the IGLS model where appropriate.
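
As a rough illustration of the column clash described above, here is a base-R sketch. All values are made up for illustration (they are not taken from the actual worksheet), and plain vectors stand in for MLwiN columns.

```r
# Hypothetical stand-ins: 4 random constraint values and some starting residuals.
constraints <- c(0, 0, 0, 1)
residuals   <- rep(0.1, 10)

# If both are written to the same column, the values end up appended.
shared_column <- c(constraints, residuals)

# Anything expecting only the constraints now sees the wrong length.
cat("constraint length expected:", length(constraints), "\n")
cat("length actually found:     ", length(shared_column), "\n")

# The first 4 entries are still the constraints; the rest are residuals.
print(head(shared_column, length(constraints)))
```

This is only meant to show why a column shared in this way would have an unexpected length; it says nothing about how R2MLwiN allocates columns internally.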

Re: MCMC is not taking the same starting values but for different datasets

Posted: Fri Oct 16, 2015 6:53 pm
by adeldaoud
ChrisCharlton wrote: Thanks for looking at this. The mcmcchains column should only be 40 rows as well (as it's the stacked chains of 10 iterations for 4 parameters). My guess would be that this difference is due to the refresh MCMC option, which is set to 50 iterations by default. This is only used when debugmode is turned on, and updates the interface every refresh iterations. As in your case this number of iterations is higher than the total number requested it is likely that the calculation of how to split them up is going wrong and it is performing more iterations than expected. Could you try setting refresh to ten and seeing whether you still get the same behaviour?

1. I changed the number of iterations and burn-in to 50 and the problem disappeared.

2. Changing the refresh option to 10 also works, like this:

runMLwiN(logit(AbsolutDep, cons) ~ 1 + (1 | country) + (1 | CountryClusterHouse), D = "Binomial",
  estoptions = list(EstM = 1, resi.store = FALSE,
    debugmode = TRUE, optimat = TRUE,
    mcmcMeth = list(iterations = 10, burnin = 10, refresh = 10),
    mcmcOptions = list(hcen = 3),
    startval = list(FP.b = round(m1test3IGLS@FP, 2), RP.b = round(m1test3IGLS@RP, 2))),
  data = dfsm, workdir = tempdir(), MLwiNPath = "C:/Program Files (x86)/MLwiN v2.35/")
modelsstart <- PLOSONEestim


3. Changing the refresh option only within MLwiN does not help.

ChrisCharlton wrote: Regarding the first problem - it looks as if the column in question (c2499) has been allocated for both the starting residuals (as it appears in the MCMC 0 command) and as the IGLS random constraint column. This has resulted in the two sets of values being appended (the first 4 values are the constraints, and the following rows are the residuals). I remember fixing an issue similar to this recently, are you using the development version of R2MLwiN? You might see the method set to IGLS initially when you request an MCMC model as the model is set up and run for some iterations with IGLS, and the starting residuals are generated using the IGLS model where appropriate.

1) I am currently re-running the model in the development version. I will report back as soon as I have new results.

2) For my own information, do you have an explanation for why the model takes starting values from some datasets but not from others? This seems to be purely data-driven and may depend on the size of the data.