How to use first observation in growth model simulation

kaiserdominici
Posts: 18
Joined: Thu Feb 06, 2014 9:37 am

How to use first observation in growth model simulation

Post by kaiserdominici »

Suppose that the health condition of 20 patients from each of 50 GPs was observed 3 times last year. The data can be analysed as a 3-level growth model, with observations nested within patients and patients nested within GPs (let's assume that a linear model is appropriate for this kind of data):

HEALTH_ijk = B0_ijk + B1_jk * TIME_ijk
B0_ijk = B0 + v0_k + u0_jk + e0_ijk
B1_jk = B1 + v1_k + u1_jk

We have k = 50 GPs, j = 1000 total patients, and i = 3 observations (and 3000 values of HEALTH). TIME is a variable taking values {0, 1, 2} for each observation. This is a random intercept and random slope model with 9 parameters to estimate:
  • 2 fixed coefficients: B0 and B1 (global intercept and slope)
  • 7 random components:
    - 3 intercept variances, one for each level (var(v0), var(u0) and var(e0));
    - 2 slope variances, one at the GP level and one at the patient level (var(v1) and var(u1));
    - 2 intercept-slope covariances (cov(v0, v1) and cov(u0, u1)).
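As an illustration of how these 9 parameters generate the data, here is a minimal simulation sketch in Python with NumPy. It is not MLwiN code. The function name `simulate_growth` and every numeric value (fixed effects, covariance matrices, residual variance) are hypothetical placeholders, and the default of 20 patients per GP is chosen only so that the totals match the 1000 patients and 3000 observations mentioned above.

```python
import numpy as np

def simulate_growth(n_gp=50, n_pat=20, times=(0, 1, 2), b0=50.0, b1=-1.0,
                    cov_gp=((4.0, 0.5), (0.5, 1.0)),
                    cov_pat=((9.0, -1.0), (-1.0, 2.0)),
                    var_e=3.0, seed=0):
    """Simulate HEALTH in long format; all default values are hypothetical."""
    rng = np.random.default_rng(seed)
    times = np.asarray(times, dtype=float)
    n_total = n_gp * n_pat
    # GP-level effects (v0_k, v1_k) and patient-level effects (u0_jk, u1_jk)
    v = rng.multivariate_normal([0.0, 0.0], cov_gp, size=n_gp)
    u = rng.multivariate_normal([0.0, 0.0], cov_pat, size=n_total)
    gp_of_patient = np.repeat(np.arange(n_gp), n_pat)
    # Patient-specific intercept and slope (without the occasion-level e0_ijk)
    intercept = b0 + v[gp_of_patient, 0] + u[:, 0]
    slope = b1 + v[gp_of_patient, 1] + u[:, 1]
    # Occasion-level residuals e0_ijk: one row per patient, one column per occasion
    e = rng.normal(0.0, np.sqrt(var_e), size=(n_total, times.size))
    health = intercept[:, None] + slope[:, None] * times[None, :] + e
    # Flatten to long format: patient 0 at times 0, 1, 2, then patient 1, ...
    gp = np.repeat(gp_of_patient, times.size)
    patient = np.repeat(np.arange(n_total), times.size)
    time = np.tile(times, n_total)
    return gp, patient, time, health.ravel()

gp, patient, time, health = simulate_growth()
```

The two 2x2 matrices hold the level-3 and level-2 variances and covariances, so together with `var_e`, `b0` and `b1` they account for all 9 parameters.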
Suppose also that I analyse in this fashion the data from last year, from two years ago, from three years ago... and I find that these 9 parameters do not change that much year-on-year. Surely, the patients will always be different and so will the individual intercepts and slopes, but the fixed effects will be mostly the same and so will variances and covariances.

Now suppose that I collect a first observation this year. This means that HEALTH_i=1 will have a value, but HEALTH_i=2 and HEALTH_i=3 will be N/A. I would like to use MLwiN to forecast the values of HEALTH_i=2 and HEALTH_i=3 using the information I have, both from past experience and from the current data. Specifically:
  • From past experience, I know that the parameters will fall within a certain range.
  • From the current data, I already have two parameters that will remain almost unchanged as more data become available: B0 and var(v0).
In fact, I can fit the model

HEALTH_jk = B0_jk
B0_jk = B0 + v0_k + e0_jk

The residuals are at level 2 now so they are not particularly useful, but the intercept and its variance should remain more or less the same as I collect more data.
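To make the point about the first observation concrete: at TIME = 0 the slope terms drop out, so the response is B0 + v0_k + (u0_jk + e0_ijk). The GP-level variance var(v0) is identifiable, but only the sum var(u0) + var(e0) can be recovered at the patient level. The sketch below uses a simple one-way ANOVA (method-of-moments) estimator in Python. This is a stand-in for illustration, not the estimator MLwiN uses, and it assumes equal numbers of patients per GP. The function name is hypothetical.

```python
import numpy as np

def anova_variance_components(y, group):
    """One-way ANOVA estimates for balanced groups.

    Returns (grand_mean, between_group_variance, within_group_variance).
    With first observations only, 'within' estimates var(u0) + var(e0).
    """
    y = np.asarray(y, dtype=float).ravel()
    group = np.asarray(group).ravel()
    labels, idx = np.unique(group, return_inverse=True)
    counts = np.bincount(idx)
    if not np.all(counts == counts[0]):
        raise ValueError("this sketch assumes equal group sizes")
    n = counts[0]
    group_means = np.bincount(idx, weights=y) / counts
    grand_mean = y.mean()
    # Mean squares between and within groups
    msb = n * np.sum((group_means - grand_mean) ** 2) / (labels.size - 1)
    msw = np.sum((y - group_means[idx]) ** 2) / (y.size - labels.size)
    # Truncate at zero, since the moment estimator can go negative
    between = max((msb - msw) / n, 0.0)
    return grand_mean, between, msw
```

Passing the first-wave HEALTH values as `y` and the GP identifiers as `group` would give rough counterparts of B0 and var(v0) from the intercept-only model.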

More than a data imputation problem (the data are not "missing" in the sense that they were missed during the data collection; they do not exist yet!), to me this sounds like something that should be solvable through simulation and MCMC in a Bayesian framework with an appropriate setting of priors. Is this possible in MLwiN? I have been reading the manual on MCMC estimation (http://www.bristol.ac.uk/cmm/software/m ... nuals.html) and I would be grateful if you could point me to relevant chapters or other online resources to figure this out.
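One simplified way to picture the forecast is a plug-in calculation rather than full MCMC: treat the 9 parameters as known from earlier years, write down the implied joint normal distribution of a patient's three observations, and condition on the first one. The Python sketch below does this for a single patient. It ignores parameter uncertainty and the information that other patients of the same GP carry about v0_k and v1_k, so it is only an approximation of what a Bayesian analysis would give. The function name and all numeric values are hypothetical.

```python
import numpy as np

def forecast_from_first(y1, b0, b1, cov_gp, cov_pat, var_e,
                        times=(0.0, 1.0, 2.0)):
    """Conditional mean and covariance of later observations given the first."""
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])      # rows are (1, TIME)
    # Combined covariance of (intercept, slope) deviations across levels 2 and 3
    omega = np.asarray(cov_gp, dtype=float) + np.asarray(cov_pat, dtype=float)
    # Marginal covariance of the observations of one patient
    sigma = Z @ omega @ Z.T + var_e * np.eye(t.size)
    mu = b0 + b1 * t
    s11 = sigma[0, 0]
    s21 = sigma[1:, 0]
    s22 = sigma[1:, 1:]
    # Standard multivariate normal conditioning formulas
    cond_mean = mu[1:] + s21 / s11 * (y1 - mu[0])
    cond_cov = s22 - np.outer(s21, s21) / s11
    return cond_mean, cond_cov

mean_23, cov_23 = forecast_from_first(
    y1=54.0, b0=50.0, b1=-1.0,
    cov_gp=[[4.0, 0.5], [0.5, 1.0]],
    cov_pat=[[9.0, -1.0], [-1.0, 2.0]],
    var_e=3.0,
)
```

A patient whose first score is above the average gets forecasts that are pulled back towards the population line, with the amount of shrinkage set by the variances and covariances.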

Thank you for taking the time to read through all this,

k.
kaiserdominici
Posts: 18
Joined: Thu Feb 06, 2014 9:37 am

Re: How to use first observation in growth model simulation

Post by kaiserdominici »

So I tried to simulate this situation without success. First, I noted down the parameters from an analysis using last year's dataset.

Image

Obs is the HEALTH variable I mentioned in the previous post, and we have 51 GPs at level 3 and 1754 patients at level 2. Then I removed the second and third observation from the same dataset to simulate a new (but comparable) group of GPs and patients:

Image

I then changed the estimation into MCMC and, before running it, I manually changed all 0s in c1096 and c1098 into values close to what I had from the analysis of complete data. There were also some 0s in c1092 which I changed into 1s because I did not know what the column contained. I tried to run the MCMC estimation but this error was returned:

Image

I got this error regardless of whether I manually specified the priors.

Thank you for any insights,

k.
billb
Posts: 157
Joined: Fri May 21, 2010 1:21 pm

Re: How to use first observation in growth model simulation

Post by billb »

Hi kaiserdominici,
So it's clear that what you tried shouldn't work, i.e. you removed years 2 and 3 for everyone, so there is no data from which to estimate anything about them. In theory, if you had done this for only a few individuals then the model should at least work, but MCMC in MLwiN is only designed to "impute" missing data in multivariate models, where it actually makes sense to do this so that one doesn't throw away incomplete cases. You could recast your model as a (multilevel) multivariate response model, maybe even with just 2 responses for time points 2 and 3 and time point 1 as a predictor. You would of course need some records where the values for years 2 and 3 are present in order to have data to estimate from.
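As a rough picture of the two-response idea, the Python sketch below regresses the observations at time points 2 and 3 jointly on the observation at time point 1, using complete records only, and then predicts the later time points for new patients. It is a single-level least-squares simplification that ignores the GP and patient random effects, so it is not the multilevel multivariate response model itself. Both function names are hypothetical.

```python
import numpy as np

def fit_two_response_regression(y1, y2, y3):
    """Least-squares fit of (y2, y3) on an intercept and y1, complete cases only."""
    y1 = np.asarray(y1, dtype=float)
    X = np.column_stack([np.ones_like(y1), y1])
    Y = np.column_stack([y2, y3]).astype(float)
    # coef has one column per response: rows are (intercept, slope on y1)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    # 2x2 residual covariance between the two responses
    resid_cov = resid.T @ resid / (len(y1) - X.shape[1])
    return coef, resid_cov

def predict_later_occasions(coef, y1_new):
    """Predicted (y2, y3) for new first observations."""
    y1_new = np.atleast_1d(np.asarray(y1_new, dtype=float))
    X_new = np.column_stack([np.ones_like(y1_new), y1_new])
    return X_new @ coef
```

The coefficients would be estimated from earlier years, where all three time points are present, and then applied to this year's first observations.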
To do precisely what you are asking then you could try using MLwiN's option to save to WinBUGS and replace values that you want to be missing with NAs in the resulting file before running the model in WinBUGS (see my MCMC book). Alternatively you could look at Stat-JR and in the Advanced guide (http://www.bristol.ac.uk/cmm/media/soft ... vanced.pdf) in the last chapter (13) we demonstrate exactly what you are asking for but in the simpler case of a regression model. You'd have to write code yourself to do this for your random slopes case or maybe email my colleague Chris Charlton who might be able to implement it in StatJR for you.
Best wishes,
Bill Browne.
kaiserdominici
Posts: 18
Joined: Thu Feb 06, 2014 9:37 am

Re: How to use first observation in growth model simulation

Post by kaiserdominici »

Hi Bill,

Thank you for your kind reply, I have been reading your manual with interest. Of course you are right in that I left nothing to estimate. I had not thought about the multivariate response model, I can start with that with the group median / mean from previous years imputed to some subjects.

I learnt from the manual that there is a SIMU 'resp' command in MLwiN. How does it work? Does it simulate random draws from the normal distribution of each of the random components and add the values to the fixed coefficients?

On WinBUGS, it is my understanding that WinBUGS has no issues dealing with missing data in modelled parameters (i.e., the Y), but would I be able to give some constraints to the programme? For instance, because of the way the sample is selected, I know that changes in the response variable from one year to the next are not random (clearly they are not in a strict sense because they are caused by something, but I mean that there are patterns): after three observations, I noticed that about 10% of the patients experience a consistent increase in their health score against 40% who experience a consistent decrease (the remaining 50% have ups and downs). I would like to be able to model the third observation conditional on 1 and 2.
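The three patterns described above can be tabulated from complete records with a few lines of Python. This sketch only classifies observed trajectories; it does not model the third observation. It assumes that "consistent" means strictly increasing or strictly decreasing across all observations, which is my reading rather than a stated definition. The function names are hypothetical.

```python
from collections import Counter

def classify_trajectory(obs):
    """Return 'increase', 'decrease' or 'mixed' for one patient's observations."""
    if len(obs) < 2:
        raise ValueError("need at least two observations")
    diffs = [later - earlier for earlier, later in zip(obs, obs[1:])]
    if all(d > 0 for d in diffs):
        return "increase"
    if all(d < 0 for d in diffs):
        return "decrease"
    return "mixed"   # ups and downs, or ties between occasions

def pattern_shares(trajectories):
    """Share of patients in each pattern, as fractions that sum to 1."""
    counts = Counter(classify_trajectory(t) for t in trajectories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no trajectories supplied")
    return {name: counts[name] / total
            for name in ("increase", "decrease", "mixed")}
```

Shares computed this way from earlier years could then be compared with those implied by simulated data from the fitted model, to check whether the model reproduces the observed patterns.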

Finally, I knew about StatJR but never used it, however I have had a look at the Advanced Guide and the language looks similar to R, which I am somewhat familiar with, so I should be able to programme something without bothering Chris. I will be happy to share my code as I achieve some progress, if it could be useful to other users.

Thank you and all the best,

k.