
Random Coefficient Model: Zero Variance/Covariance Terms

Posted: Sat Jan 30, 2016 12:29 pm
by jackwhybrow
Hi,

I wondered whether you could help with a problem I am having.
Unfortunately I can't be too specific with the details as I use secure data.
Hopefully though the detail presented herein will be sufficient to give you a handle on the problem.

The issue: when I trialled some random coefficient logistic models, Stata/MLwiN reported some of the variance/covariance terms as zero.
I specified the following variables of interest as random coefficients: at student level, household income and ability; at school level, sixthform, grammar school and the Ofsted band dummy.
Alarm bells started to ring when the aggregated Wald test and MCMC would not compute satisfactorily.

Model Components
Dependent variable: Higher Education participation by age 20 {b}
Student level explanatory variables: gender {b}, 1st academic quarter of Birth {b}, single parent household {n}, family highest social status {c}, family highest educational qualification {c}, imputed household income {n}, index of multiple deprivation quartiles {c}, government office region {c}, key stage – key skill (ability) principal components {n}, ethnicity {c}, first language {c}, Cultural Capital principal component {n}, Habitus principal components {n}, Social Capital principal components {n}.
School level explanatory variables: sixthform {b}, grammar school {b}, % pupils eligible for free school meals {n}, and Ofsted band {c}.
Where {b} denotes binary indicators, {c} categorical (multiple binary indicators), and {n} continuous.
Household income is not mean centred.

Syntax: runmlwin Yi x1i x2i ... xni x1ij x2ij ... xnij cons if sample == 1, level2(SchoolID: xxi/xxij cons) level1(StudentID: ) discrete(distribution(binomial) link(logit) denominator(cons)) nopause maxiterations(x)
This computes the model using the IGLS estimation algorithm with MQL1.
Specifying "...denominator(cons)[PQL2] initsprevious nopause..." computes the model using IGLS and PQL2.
(Due to the setup I am using, melogit takes too long to compute, and I do not have access to a computing cluster.)

Specifics: My group sizes average about 8, with a minimum of 1 and a maximum of 19. That equates to about 4,200 individuals nested within 520 schools. Without school-level explanatory variables, the VPC of the random intercept model, var(cons)/(var(cons) + 3.29), shows that 2.4% of the variance is due to differences between schools. This drops to approximately 0.0% when school-level explanatory variables are included as well. (For the VPC calculation I referred to the free online course, Module 7: Multilevel Models for Binary Responses, Stata Practical, p.18.)
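
For reference, the VPC calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used for the analysis: the function names are hypothetical, and the between-school variance is back-calculated from the reported 2.4% rather than taken from any model output. The 3.29 in the formula is the variance of the standard logistic distribution, pi^2/3.

```python
import math

# Level-1 variance under the latent-response formulation of a logit model:
# the variance of the standard logistic distribution, pi^2 / 3 (about 3.29).
LOGISTIC_LEVEL1_VAR = math.pi ** 2 / 3


def vpc_latent(var_cons):
    """VPC = var(cons) / (var(cons) + 3.29) for a random intercept logit model.

    var_cons is the estimated between-school (level-2) intercept variance.
    """
    return var_cons / (var_cons + LOGISTIC_LEVEL1_VAR)


def implied_variance(vpc):
    """Invert the VPC formula to recover the level-2 variance that implies it."""
    return vpc * LOGISTIC_LEVEL1_VAR / (1 - vpc)


# Assumption: back out the school-level variance from the reported VPC of 2.4%.
var_u = implied_variance(0.024)
print(f"implied var(cons): {var_u:.3f}")
print(f"VPC check:         {vpc_latent(var_u):.3f}")
```

A VPC of 2.4% corresponds to a between-school variance of roughly 0.08 on the logit scale, so the starting point before adding school-level covariates is already close to zero.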

Does this indicate that I have insufficient variation in the data to estimate a random coefficient model?
If so, should I stick to the random intercept model as my preferred specification?

Best wishes,

Jack Whybrow
PhD Researcher, University of East Anglia

Re: Random Coefficient Model: Zero Variance/Covariance Terms

Posted: Wed Apr 13, 2016 2:40 pm
by GeorgeLeckie
Dear Jack,

Given a binary response, the model looks rather complex with many random coefficients. A binary response contains far less information than a continuous response. Also your cluster sizes are small with 8 pupils per school on average. There will be little variation in the response within clusters. I would simplify the model to either a random-intercept model or a random-coefficient one where you only include a random coefficient on the main covariate of interest.

Note that making school-level covariates random at the school-level is non-standard and I wouldn't go down this route unless you know exactly what you are doing and why.

If your VPC drops to zero when you add the school-level covariates, then at face value it would suggest that there is no remaining residual clustering and that you no longer need a multilevel model. Note, though, that I would also fit the model by MCMC, as PQL suffers from known biases in certain situations.

Best wishes

George