Assessing the added value of (logistic) multilevel models

pimverbunt
Posts: 11
Joined: Wed May 06, 2015 9:22 am

Assessing the added value of (logistic) multilevel models

Post by pimverbunt »

Dear all,

I am currently fitting a multilevel multinomial logistic regression model, in which I estimate the odds of different forms of poverty across 28 European countries.

I would like to test the appropriateness of the multilevel approach, i.e. whether 28 separate single-level models fit better than one pooled multilevel model. Ideally, I would like to assess this with an explained-variance measure (comparing the explained variance of the single-level models with that of the multilevel model).

However, in the case of a multinomial regression there is no estimated single-level (level-1) variance.
Is there an alternative measure that I could estimate to assess the "added value" of the multilevel model? Is there any pseudo-R² measure that allows single-level models (with different sample sizes) to be compared with multilevel models? Or would the VPC be the better measure to use?

Also, I am estimating my model with Bayesian MCMC (I hope this does not complicate things).

Kind regards,
Pim
joneskel
Posts: 26
Joined: Thu Nov 15, 2012 3:09 pm

Re: Assessing the added value of (logistic) multilevel models

Post by joneskel »

In a discrete outcome model there is no level-1 variance to be estimated: the level-1 variation is determined by the predicted underlying response, that is, by the fixed part of the model.

In fitting a separate model to each country you are doing the equivalent of a fixed-effects analysis (see Bell and Jones below), in which a dummy is included for each country in a single overall model. The estimated coefficient for a country is then based on that country alone - effectively excluding what is going on in the other countries - and the higher-level variance is zero.

If there is a large number of respondents in a country, the country effect estimated by a multilevel model will be very similar to treating each country separately. However, if you have little information within a country, the multilevel estimate will be shrunk towards what is going on across all countries. The multilevel estimate therefore protects you from over-interpreting unreliable results.

So the 'added value' lies in the quality of the estimates - they are precision-weighted - and the higher-level variance summarises the differences between countries; with MCMC you can put credible intervals on it. And crucially, you can include variables at the country level to try to account for between-country differences - you cannot do this in your 28 separate models, nor in the fixed-effects model, because in the latter you will have consumed all the degrees of freedom at the higher level.
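To fix ideas, here is a little sketch in Python of that precision weighting, using the simpler linear (continuous response) random-intercept case and made-up numbers - nothing to do with your data or with MLwiN output:

```python
# Sketch of precision-weighted ("shrunken") country effects,
# using the linear random-intercept case for simplicity.
# tau2   : between-country (level-2) variance
# sigma2 : within-country (level-1) variance
# n_j    : number of respondents in country j
# raw_j  : country mean minus overall mean (the "separate model" estimate)

def shrunken_effect(raw_j, n_j, tau2, sigma2):
    # Shrinkage factor: close to 1 when n_j is large (little shrinkage),
    # close to 0 when n_j is small (estimate pulled towards the overall mean).
    lam = tau2 / (tau2 + sigma2 / n_j)
    return lam * raw_j

# Illustrative numbers only: same raw deviation, different sample sizes.
for n_j in (20, 200, 2000):
    print(n_j, round(shrunken_effect(raw_j=0.5, n_j=n_j, tau2=0.1, sigma2=1.0), 3))
```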

One thing you may be able to do is to fit these two differing versions of the model - fixed effects (no pooling) and random effects (partial, data-dependent pooling) - in MCMC and then compare the DIC, which takes account of the complexity of the model via pD. A simple example of this is given in Chapter 3 of the MCMC manual.
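If it helps, here is a rough sketch in Python of where the DIC number comes from (DIC = Dbar + pD, with pD = Dbar - D at the posterior means); the deviance series and model names below are just placeholders, not MLwiN commands:

```python
import numpy as np

def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = Dbar + pD, with pD = Dbar - D(theta_bar).

    deviance_samples           : deviance evaluated at each MCMC draw
    deviance_at_posterior_mean : deviance evaluated at the posterior means
    """
    dbar = np.mean(deviance_samples)
    pd = dbar - deviance_at_posterior_mean   # effective number of parameters
    return dbar + pd

# Hypothetical usage for two fitted models: prefer the lower DIC,
# bearing in mind that small differences are not decisive.
# dic_fixed  = dic(dev_samples_fixed,  dev_at_mean_fixed)
# dic_random = dic(dev_samples_random, dev_at_mean_random)
```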

Multilevel modelling is partial pooling, and you could evaluate this against complete pooling by omitting the between-country variance (giving a single-level model) and comparing the DIC of these two models. This will answer robustly whether there are between-country differences. Bear in mind that if you put a macro (country-level) variable into the complete-pooling, non-multilevel model and there are differences between countries (or, equivalently, similarity within them), then your standard errors are likely to be too small, and this can be quite considerable.
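As a rough guide to how badly the standard error of a country-level variable can be understated, the usual design-effect approximation (assuming roughly equal numbers of respondents per country) is deff = 1 + (n - 1) x ICC; a quick illustration with invented numbers:

```python
import math

def se_inflation(icc, avg_cluster_size):
    # Design effect for a cluster-constant (macro) variable with roughly
    # equal cluster sizes: deff = 1 + (n - 1) * ICC. Standard errors from a
    # model that ignores clustering are understated by roughly sqrt(deff).
    deff = 1 + (avg_cluster_size - 1) * icc
    return math.sqrt(deff)

# Illustrative numbers: with ~1000 respondents per country, even a small ICC
# implies a large understatement of the standard error of a country variable.
print(round(se_inflation(icc=0.05, avg_cluster_size=1000), 1))
```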

This (unfinished) paper will get you into the shrinkage literature:

https://www.researchgate.net/publicatio ... nt_results

On fixed effects and random effects, see:
https://www.researchgate.net/publicatio ... Panel_Data

Personally, I am not really convinced by R² measures for discrete outcomes - the level-1 variance can never go down as you put in predictors - though people do try to get some sort of measure. Have a look at the following:

Snijders and Bosker (2012, equation 14.21) propose a procedure for calculating an R² for the binary outcome model - Snijders, T. A. B. & Bosker, R. J. (2012) Multilevel Analysis, 2nd edition. London: Sage.

The LEMMA training materials are very good on this unchanging level-1 variance and its 'scaling' effect on the rest of the model estimates as you include variables; see:

http://www.bristol.ac.uk/cmm/software/m ... urces.html

http://www.bristol.ac.uk/cmm/software/m ... l#discrete
pimverbunt
Posts: 11
Joined: Wed May 06, 2015 9:22 am

Re: Assessing the added value of (logistic) multilevel models

Post by pimverbunt »

First of all, thank you very much for your extensive response.

I have used the DIC to compare the differing versions of the model (fixed effects (no pooling), random effects (partial, data-dependent pooling) and complete pooling), but it did not really provide me with robust evidence on the added value of a multilevel model in my specific case.

Therefore, I want to take a closer look at the R² measure described in equation 17.22 of Snijders and Bosker (2012).

For those who are not familiar with formula 17.22: the R² is defined as σf² / (σf² + τ² + σr²), where σf² is the variance explained by the fixed part (the variance of the linear predictor), τ² is the second-level variance, and σr² is the first-level variance, which in the latent-variable approach equals π²/3 ≈ 3.29.
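For concreteness, here is a small sketch (in Python, my own illustration with placeholder names) of how I would compute this R² from the fitted fixed part and the estimated country-level variance, including the single-level case with τ² = 0 that I ask about below:

```python
import math
import numpy as np

def snijders_bosker_r2(linear_predictor, tau2, sigma_r2=math.pi**2 / 3):
    """Pseudo-R2 = sigma_f^2 / (sigma_f^2 + tau^2 + sigma_r^2).

    linear_predictor : fixed-part prediction (on the logit scale) per respondent
    tau2             : level-2 (country) variance; set to 0 for a single-level model
    sigma_r2         : level-1 variance, pi^2/3 under the latent-variable approach
    """
    sigma_f2 = np.var(linear_predictor)
    return sigma_f2 / (sigma_f2 + tau2 + sigma_r2)

# Hypothetical usage with the fitted values of the fixed part (xb) and the
# estimated country-level variance (tau2_hat):
# r2_multilevel  = snijders_bosker_r2(xb, tau2=tau2_hat)
# r2_singlelevel = snijders_bosker_r2(xb_single, tau2=0.0)
```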

However, I have some questions regarding this approach:
1. The formula is defined for binary logistic models, but can the approach be extended to the multinomial case? For a multilevel multinomial model, is it sensible to calculate an R² value for each category of the dependent variable separately?
2. Can I extend the method to calculate the R² for a single-level model (complete pooling)? The R² of a single-level model would equal σf² / (σf² + σr²), as τ² = 0. Is it sensible to compare the R² of the single-level model with the R² of the random-effects multilevel model (partial pooling)?
3. Can I extend the method to calculate the R² for the 28 countries separately (which are, by construction, single-level models)? And is it sensible to compare these R² values with each other and with the multilevel model's R² to assess the appropriateness of the multilevel approach? For instance, if I found that for almost all countries the explained variance is higher in the separate single-level models than in the multilevel model, would this be an indication that it is better to fit 28 separate regressions?

I would appreciate any help regarding these questions.

Kind regards,
Pim Verbunt