Random Coefficient Model: Zero Variance/Covariance Terms

jackwhybrow
Posts: 1
Joined: Mon Jan 25, 2016 9:33 am


Post by jackwhybrow »

Hi,

I wondered whether you could help with a problem I am having.
Unfortunately I can't be too specific with the details as I use secure data.
Hopefully though the detail presented herein will be sufficient to give you a handle on the problem.

The issue: when I trialled some random coefficient logistic models, Stata/MLwiN reported some of the variance/covariance terms as zero.
I specified the following variables of interest as random coefficients: at the student level, household income and ability; at the school level, sixthform, grammar school and an Ofsted band dummy.
Alarm bells started to ring when the aggregated Wald test and MCMC estimation would not compute satisfactorily.

Model Components
Dependent variable: higher education participation by age 20 {b}
Student level explanatory variables: gender {b}, 1st academic quarter of birth {b}, single parent household {n}, family highest social status {c}, family highest educational qualification {c}, imputed household income {n}, index of multiple deprivation quartiles {c}, government office region {c}, key stage – key skill (ability) principal components {n}, ethnicity {c}, first language {c}, Cultural Capital principal component {n}, Habitus principal components {n}, Social Capital principal components {n}.
School level explanatory variables: sixthform {b}, grammar school {b}, % of pupils eligible for free school meals {n}, and Ofsted band {c}.
Here {b} denotes a binary indicator, {c} a categorical variable coded as multiple binary indicators, and {n} a continuous variable.
Household income is not mean centred.

Syntax: runmlwin Yi x1i x2i ... xni x1ij x2ij ... xnij cons if sample == 1, level2(SchoolID: xxi/xxij cons) level1(StudentID: ) discrete(distribution(binomial) link(logit) denominator(cons)) nopause maxiterations(x)
This fits the model using the IGLS estimation algorithm with MQL1.
Adding the pql2 suboption and starting from the previous estimates, i.e. "...denominator(cons) pql2) initsprevious nopause...", fits the model using IGLS and PQL2.
(With my setup, melogit takes too long to compute, and I do not have access to a computing cluster.)
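A minimal sketch of the two-step MQL1 then PQL2 workflow described above, written as a Stata do-file. SchoolID, StudentID, cons and sample come from the post; he20, income, ability, sixthform and grammar are hypothetical stand-ins for the real, much longer covariate list, and random coefficients are placed only on income and ability for illustration.

```stata
* Hypothetical variable names (he20, income, ability, sixthform, grammar);
* the full covariate list from the post is abbreviated here.
* MLwiN expects the data sorted by the level identifiers.
sort SchoolID StudentID
capture generate cons = 1    // constant of ones, also used as the denominator

* Step 1: IGLS with MQL1 (what the first command in the post fits)
runmlwin he20 cons income ability sixthform grammar if sample == 1, ///
    level2(SchoolID: cons income ability) level1(StudentID:)          ///
    discrete(distribution(binomial) link(logit) denominator(cons))    ///
    nopause

* Step 2: IGLS with PQL2, starting from the step 1 estimates
runmlwin he20 cons income ability sixthform grammar if sample == 1, ///
    level2(SchoolID: cons income ability) level1(StudentID:)          ///
    discrete(distribution(binomial) link(logit) denominator(cons) pql2) ///
    initsprevious nopause
```

Passing the MQL1 estimates through initsprevious is a common way to help PQL2 converge; whether it is needed here would have to be checked against the actual data.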

Specifics: my group sizes average about 8, with a minimum of 1 and a maximum of 19; that is about 4,200 individuals nested within 520 schools. Without school level explanatory variables, the VPC of the random intercept model, var(cons)/(var(cons) + 3.29), shows that 2.4% of the variance is due to differences between schools. This drops to approximately 0.0% when the school level explanatory variables are also included. (For the VPC calculation I referred to the free online course, Module 7: Multilevel Models for Binary Responses, STATA Practical, p.18.)
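For reference, the latent-variable VPC used above, written with the parentheses the calculation needs. Here the level-2 intercept variance var(cons) is written as sigma^2_u0, and pi^2/3 (about 3.29) is the variance of the standard logistic distribution. The second line only inverts the formula to show the intercept variance implied by the quoted 2.4%; it is arithmetic on the reported figure, not an estimate from the data.

```latex
\begin{aligned}
\text{VPC} &= \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \pi^2/3}, \qquad \pi^2/3 \approx 3.29 \\
0.024 &= \frac{\sigma^2_{u0}}{\sigma^2_{u0} + 3.29}
\;\Longrightarrow\;
\sigma^2_{u0} = \frac{0.024 \times 3.29}{1 - 0.024} \approx 0.081
\end{aligned}
```

With random coefficients the level-2 variance depends on the covariate values, so a single VPC of this form applies only to the random intercept model.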

Does this indicate that I have insufficient variation in the data to estimate a random coefficient model?
If so, should I stick to the random intercept model as my preferred specification?

Best wishes,

Jack Whybrow
PhD Researcher, University of East Anglia
GeorgeLeckie
Site Admin
Posts: 432
Joined: Fri Apr 01, 2011 2:14 pm

Re: Random Coefficient Model: Zero Variance/Covariance Terms

Post by GeorgeLeckie »

Dear Jack,

Given a binary response, the model looks rather complex, with many random coefficients. A binary response contains far less information than a continuous response. Also, your cluster sizes are small, with 8 pupils per school on average, so there will be little variation in the response within clusters. I would simplify the model to either a random-intercept model or a random-coefficient model in which you include a random coefficient only on the main covariate of interest.
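A minimal sketch of the two simplifications suggested above, with hypothetical variable names (he20, income, ability, sixthform, grammar) and income treated as the "main covariate of interest" purely for illustration. Because household income is not mean-centred in the question, the sketch centres it first so that the intercept variance and the intercept/slope covariance refer to a pupil with average income; this centring step is an assumption added here, not part of the advice above.

```stata
* Hypothetical variable names; income stands in for the main covariate.
sort SchoolID StudentID
capture generate cons = 1

* Centre income on the estimation-sample mean (assumed, optional step)
summarize income if sample == 1, meanonly
generate income_c = income - r(mean)

* (a) Random-intercept model
runmlwin he20 cons income_c ability sixthform grammar if sample == 1, ///
    level2(SchoolID: cons) level1(StudentID:)                          ///
    discrete(distribution(binomial) link(logit) denominator(cons))     ///
    nopause

* (b) Random coefficient on the main covariate of interest only
runmlwin he20 cons income_c ability sixthform grammar if sample == 1, ///
    level2(SchoolID: cons income_c) level1(StudentID:)                 ///
    discrete(distribution(binomial) link(logit) denominator(cons))     ///
    nopause
```

Either model can then be refitted with the pql2 suboption and initsprevious, as in the question.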

Note that making school-level covariates random at the school-level is non-standard and I wouldn't go down this route unless you know exactly what you are doing and why.

If your VPC drops to zero when you add the school-level covariates, then at face value this would suggest that you no longer need a multilevel model and that there is no remaining residual clustering. (Note that I would also fit the model by MCMC, as PQL suffers from known biases in certain situations.)
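A hedged sketch of the MCMC refit suggested above, again with hypothetical variable names (he20, income, ability, sixthform, grammar). It assumes the PQL2 random-intercept model converges from runmlwin's default starting values (otherwise an MQL1 fit can be run first), and it assumes runmlwin stores the level-2 intercept variance in e(b) as RP2:var(cons). The last line computes the latent-variable VPC from that point estimate.

```stata
* Hypothetical variable names; random-intercept model only.
sort SchoolID StudentID
capture generate cons = 1

* PQL2 fit, used to supply starting values for MCMC
runmlwin he20 cons income ability sixthform grammar if sample == 1, ///
    level2(SchoolID: cons) level1(StudentID:)                          ///
    discrete(distribution(binomial) link(logit) denominator(cons) pql2) ///
    nopause

* MCMC refit, starting from the PQL2 estimates
runmlwin he20 cons income ability sixthform grammar if sample == 1, ///
    level2(SchoolID: cons) level1(StudentID:)                          ///
    discrete(distribution(binomial) link(logit) denominator(cons))     ///
    mcmc(on) initsprevious nopause

* Latent-variable VPC at the stored point estimate of var(cons)
display _b[RP2:var(cons)] / (_b[RP2:var(cons)] + _pi^2/3)
```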

Best wishes

George