Saving estimates from models (using MCMC) executed using runmlwin

amaald21 · Post by **amaald21** » Wed Nov 24, 2021 4:16 am

Hello, and apologies in advance if my query is not properly explained as this is my first time posting here. I have a query relating to the proper way to save/store results (ideally in Stata) from a model run using MCMC estimation, executed by the runmlwin command, so that I can use the results in a later Stata session. The reason I am asking is that with the syntax I am currently using, when I restore my saved estimates in a later Stata session, I notice they are different than what is displayed in my Stata log file (the original Stata session, when I ran the runmlwin command). I realise this query touches on a bit of Stata syntax, but I suspect the crux of the issue is that there is something missing from my runmlwin command, or perhaps something I am not quite grasping about how runmlwin works when retrieving estimates from a model using MCMC estimation.

Here's what I did:

Within Stata, I used the runmlwin command to run a series of multilevel models using nested loops. I started with MQL1 estimation, then PQL2, and used the PQL2 estimates as inputs for the MCMC model. After each MCMC model was run, I used Stata’s ‘estimates store’ command to store the model estimates:

Code: Select all

foreach y of varlist EX IN CO {
	foreach x of varlist pos4 pos8 {
		forvalues i = 1/8 {
			runmlwin `y' cons i.`x' gender01 i.langcat atsi specialneeds i.m_educat3 if capcity==`i', ///
				level2(sa1_maincode_2016:cons) ///
				level1(record_id) discrete(distribution(binomial) link(logit) denom(cons) ///
				mql1) igls nopause or mlwinpath("C:\Program Files\MLwiN v3.04\mlwin.exe")
			runmlwin `y' cons i.`x' gender01 i.langcat atsi specialneeds i.m_educat3 if capcity==`i', ///
				level2(sa1_maincode_2016:cons) ///
				level1(record_id) discrete(distribution(binomial) link(logit) denom(cons) ///
				pql2) initsprevious igls nopause or mlwinpath("C:\Program Files\MLwiN v3.04\mlwin.exe")
			runmlwin `y' cons i.`x' gender01 i.langcat atsi specialneeds i.m_educat3 if capcity==`i', ///
				level2(sa1_maincode_2016:cons) ///
				level1(record_id) discrete(distribution(binomial) link(logit) denom(cons)) ///
				mcmc(chain(20000)) initsprevious igls nopause or mlwinpath("C:\Program Files\MLwiN v3.04\mlwin.exe")
					estimates store Ch_`x'_`y'_`i'
				mcmcsum, detail 
				mcmcsum, trajectories
				graph export Ch_`x'_`y'_`i'.png, as(png) replace 
			}
		}	
	}

I then saved the MCMC model estimates as ‘.ster’ files, by first restoring each model (using Stata’s ‘estimates restore’ command) and then saving (using Stata’s ‘estimates save’). After they were saved, I cleared the estimates.

Code: Select all

foreach y of varlist EX IN CO {
		foreach x of varlist pos4 pos8 {
			forvalues i = 1/8 {
				estimates restore Ch_`x'_`y'_`i'
				estimates save Ch_`x'_`y'_`i', replace
				}
			}
		}	
		estimates clear

The commands I am using to display the stored estimates in my later Stata sessions are for example:

Code: Select all

estimates use Ch_pos8_EX_3
estimates store Ch_pos8_EX_3
estimates restore Ch_pos8_EX_3
estimates replay

And to display them in exponentiated form, I use ‘ereturn display’, for example:

Code: Select all

ereturn display, eform(_2_pos8 _3_pos8)

However, ‘ereturn display’ yields different results from the estimates in my Stata log file (which are also in exponentiated/odds ratio form). I manually exponentiated the results from ‘estimates replay’ and the results are the same as those from ‘ereturn display’ – but therefore, also don’t match the log file results. Interestingly though, all of the other details (e.g., run time, dbar, thetabar, effective no. of parameters, Bayesian DIC) are identical across the stored estimates and log file estimates.

Is this the proper way to go about saving the results from an MCMC model executed using runmlwin, so that I can retrieve them in later Stata sessions (or, is there another way to do this)? Have I missed something important about how MCMC estimates retrieved by runmlwin need to be saved?

Thanks very much for your advice, it is much appreciated.
Amanda

ChrisCharlton · Post by **ChrisCharlton** » Wed Nov 24, 2021 5:31 pm

It looks as if the estimates replay command does not carry across the display options set when running the command, however you can re-specify them when displaying the results, for example:

Code: Select all

. use http://www.bristol.ac.uk/cmm/media/runmlwin/bang, clear

. logit use age, or

Iteration 0:   log likelihood = -1926.3176  
Iteration 1:   log likelihood = -1922.2862  
Iteration 2:   log likelihood = -1922.2857  
Iteration 3:   log likelihood = -1922.2857  

Logistic regression                                     Number of obs =  2,867
                                                        LR chi2(1)    =   8.06
                                                        Prob > chi2   = 0.0045
Log likelihood = -1922.2857                             Pseudo R2     = 0.0021

------------------------------------------------------------------------------
         use | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   1.012094   .0042846     2.84   0.005     1.003731    1.020526
       _cons |   .6609781   .0252657   -10.83   0.000     .6132678    .7124001
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

. estimates store m1

. ereturn clear

. estimates restore m1
(results m1 are active now)

. estimates replay

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Model m1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Logistic regression                                     Number of obs =  2,867
                                                        LR chi2(1)    =   8.06
                                                        Prob > chi2   = 0.0045
Log likelihood = -1922.2857                             Pseudo R2     = 0.0021

------------------------------------------------------------------------------
         use | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0120213   .0042334     2.84   0.005      .003724    .0203185
       _cons |  -.4140346   .0382247   -10.83   0.000    -.4889536   -.3391156
------------------------------------------------------------------------------

. logit

Logistic regression                                     Number of obs =  2,867
                                                        LR chi2(1)    =   8.06
                                                        Prob > chi2   = 0.0045
Log likelihood = -1922.2857                             Pseudo R2     = 0.0021

------------------------------------------------------------------------------
         use | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0120213   .0042334     2.84   0.005      .003724    .0203185
       _cons |  -.4140346   .0382247   -10.83   0.000    -.4889536   -.3391156
------------------------------------------------------------------------------

. logit, or

Logistic regression                                     Number of obs =  2,867
                                                        LR chi2(1)    =   8.06
                                                        Prob > chi2   = 0.0045
Log likelihood = -1922.2857                             Pseudo R2     = 0.0021

------------------------------------------------------------------------------
         use | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   1.012094   .0042846     2.84   0.005     1.003731    1.020526
       _cons |   .6609781   .0252657   -10.83   0.000     .6132678    .7124001
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

I don't think that this will not work for your MCMC model however as it uses the parameter chains when doing transformations, and these are stored in a hidden mata matrix that is lost when you run the next model. I suspect that the difference in results you are seeing for your MCMC models is that the values originally reported from runmlwin are the means of a chain where each value has been exponentiated but when you exponentiate the reported parameter value in the saved results you are exponentiating a mean. I suspect that you would get the same results as in your log file if you saved the chains for each model, which you can either do with the savechains option in runmlwin or manually through using the getchains option of mcmcsum. Once you have the chains as a data file you can exponentiate the variables chains there and use mcmcsum with the variables option to do all of your normal diagnostics.

amaald21 · Post by **amaald21** » Fri Nov 26, 2021 12:35 am

Hi Chris,

Thanks so much for looking into this - really appreciate your help!

I've re-run the runmlwin command with the savechains option specified, and then loaded in the chains data file. For each coefficient/chain, I created a new variable containing the exponentiated variable chains and I can see these new exponentiated coefficients and their credible intervals now when I run mcmcsum, variables and these match the estimates in my original log file.

After running mcmcsum, variables I checked to see if I might be able to access the results from my new exponentiated chains as e-class results:

Code: Select all

ereturn list
matrix list e(b)
matrix list e(ub)
matrix list e(lb)

I can see the (unexponentiated) coefficients are stored in matrix e(b), but I don't see any new (exponentiated) coefficients or credible intervals added here.

I also checked the r-class results:

Code: Select all

return list

The coefficient for the last variable (the last chain that I exponentiated) is there stored in r(mean) and the credible intervals are there as well in r(p2_5) and r(p97_5). Everything stored in r-class is pertaining only to the last variable that I exponentiated.

Ultimately I am trying to create some results tables and figures using the exponentiated coefficients (odds ratios) and their credible intervals, using commands like estout/esttab or putexcel. I could exponentiate the chains one at a time and then use the r-class results from mcmcsum to add them to a table one by one, but I am wondering if there is a way to do this more efficiently? Is there a way I can get all of the new exponentiated versions and their credible intervals stored somewhere, either through mcmcsum, variables or another command?

Thank you again for your advice!
Amanda

ChrisCharlton · Post by **ChrisCharlton** » Fri Nov 26, 2021 11:58 am

Unfortunately I don't think that there is currently a straightforward way to do this.

If you look at lines 410 and 535 of (mcmcsum.ado) you will see that the results displayed for the summaries and detail option are stored returned in an r-class after the command is run, although as you say this is only for the last variable displayed. You could change the summaries version to return all parameters instead by copying each iteration into a row of a combined matrix and returning that. Alternatively you can see in runmlwin.ado lines 4099 onwards where it is posting the results brought back from MLwiN. You could modify this based on the lines 5793 onwards to save the means of the exponentiated chains instead, although you would have to be careful to only transform the fixed-effect values.

amaald21 · Post by **amaald21** » Mon Feb 07, 2022 12:21 am

Hi Chris, thanks so much for these ideas!

I ended up exponentiating the variables one-by-one, using "quietly mcmcsum, variables" to obtain the r-class results (estimate, p2.5 and p97.5) and adding them into a table using putdocx table. I've added some example code for one of the coefficients below in case it is useful to others.

Code: Select all

use "Chains.dta", clear
gen e_seifa1 = exp(FP1__1_seifa)
quietly mcmcsum, variables
putdocx table Table2(8,5) = (r(mean))
putdocx table Table2(8,6) = (r(p2_5))
putdocx table Table2(8,7) = (r(p97_5))

Thanks again for your help!
Amanda

www.cmm.bristol.ac.uk/forum

Saving estimates from models (using MCMC) executed using runmlwin

Saving estimates from models (using MCMC) executed using runmlwin

Re: Saving estimates from models (using MCMC) executed using runmlwin

Re: Saving estimates from models (using MCMC) executed using runmlwin

Re: Saving estimates from models (using MCMC) executed using runmlwin

Re: Saving estimates from models (using MCMC) executed using runmlwin