MLM in MLwiN with Censored Data

Post by edwardoughton »

I could really use some advice please on multilevel modelling in MLwiN with censored data.

I have been using a variance components model to analyse broadband speed at the postcode level (n = ~1 million), with postcodes nested within Middle Layer Super Output Areas (MSOAs; n = ~7,000). Around half of the independent variables are at level 1 and half are at level 2. For computational ease I have been working with a random sample of 250,000 observations.
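
For reference, a two-level model of the kind described might be written as below. This is only a sketch: the symbols (i for postcodes, j for MSOAs, x for level 1 predictors, w for level 2 predictors) are assumed notation and do not come from the post, and the exact specification fitted in MLwiN may differ.

```latex
y_{ij} = \beta_0 + \sum_{p} \beta_p x_{pij} + \sum_{q} \gamma_q w_{qj} + u_j + e_{ij},
\qquad
u_j \sim N(0, \sigma_u^2),
\qquad
e_{ij} \sim N(0, \sigma_e^2)
```

Here y_{ij} is the broadband speed for postcode i in MSOA j, u_j is the MSOA-level random intercept, and e_{ij} is the postcode-level residual. The normality assumption on e_{ij} is the one that censoring at 30 Mbps puts under strain.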

The distribution of the dependent variable is attached below. The bimodality reflects the difference between urban and rural broadband speeds, so I'm likely to split the urban and rural postcodes and run two separate analyses. My concern is that speeds above 30 Mbps have been censored, which could bias the model coefficients. Is this serious enough that I need to address it and, if so, how could I do it in MLwiN?
[Attachment: Screenshot 2014-07-02 09.26.30.png (histogram of the dependent variable)]

Re: MLM in MLwiN with Censored Data

Post by billb »

Hi Edward,
MLwiN does not have specific functionality for handling censored data. In fact, the exam datasets often used in educational examples have similar censoring issues: marks are assumed to be normally distributed but in reality are constrained to lie between 0 and full marks. Here one can fit models as usual and then examine the residual plots for any patterns that point to problems caused by the large number of 30s. In a Bayesian framework, in something like WinBUGS or Stat-JR, it is possible to indicate that some responses are censored, so that for those responses we simply observe the 30 and treat the true value as latent with some suitable prior, but such approaches are more challenging to work with. If your data looked more extreme, say with half the data at 30, then I might suggest instead modelling a binary indicator of whether the speed is above 30 and fitting models to that, but your histogram doesn't look that extreme.
Hope this helps,
Bill.
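
To make the idea of treating censored responses differently more concrete, here is a minimal sketch in Python. It is not MLwiN, WinBUGS or Stat-JR code, and it is not a multilevel model: it fits a single normal distribution to simulated data by maximum likelihood rather than by the Bayesian approach mentioned above. The simulated mean of 24 and standard deviation of 6 are made-up values, and the function names are hypothetical. Values recorded at the limit contribute the probability of being at or above 30 instead of a normal density.

```python
import math
import random

LIMIT = 30.0  # censoring point from the thread: speeds above 30 Mbps are recorded as 30


def censored_normal_loglik(mu, sigma, n_obs, sum_y, sum_y2, n_cens, limit=LIMIT):
    """Log-likelihood of a normal response that is right-censored at `limit`.

    Fully observed values contribute the normal log-density (written in terms
    of the sufficient statistics n_obs, sum_y, sum_y2); each censored value
    contributes log P(Y >= limit).
    """
    ss = sum_y2 - 2.0 * mu * sum_y + n_obs * mu * mu  # sum of (y - mu)^2
    observed = (-n_obs * math.log(sigma)
                - 0.5 * n_obs * math.log(2.0 * math.pi)
                - ss / (2.0 * sigma ** 2))
    # Upper tail probability P(Y >= limit) via the complementary error function
    tail = 0.5 * math.erfc((limit - mu) / (sigma * math.sqrt(2.0)))
    censored = n_cens * math.log(max(tail, 1e-300))
    return observed + censored


def fit_censored_normal(values, limit=LIMIT):
    """Crude grid-search maximum likelihood; adequate for a two-parameter sketch."""
    observed = [y for y in values if y < limit]
    n_obs = len(observed)
    n_cens = len(values) - n_obs
    sum_y = sum(observed)
    sum_y2 = sum(y * y for y in observed)
    best_ll, best_mu, best_sigma = -math.inf, None, None
    for i in range(100, 401):          # mu from 10.0 to 40.0 in steps of 0.1
        mu = i / 10.0
        for j in range(10, 201):       # sigma from 1.0 to 20.0 in steps of 0.1
            sigma = j / 10.0
            ll = censored_normal_loglik(mu, sigma, n_obs, sum_y, sum_y2, n_cens, limit)
            if ll > best_ll:
                best_ll, best_mu, best_sigma = ll, mu, sigma
    return best_mu, best_sigma


# Simulated data (assumed values, not the broadband data from the thread)
random.seed(1)
TRUE_MU, TRUE_SIGMA = 24.0, 6.0
latent = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(5000)]
recorded = [min(y, LIMIT) for y in latent]  # what we would actually observe

naive_mean = sum(recorded) / len(recorded)
mu_hat, sigma_hat = fit_censored_normal(recorded)
print(f"naive mean of recorded speeds: {naive_mean:.2f}")
print(f"censoring-aware estimates: mu = {mu_hat:.1f}, sigma = {sigma_hat:.1f}")
```

In this simulation about one value in six is censored, and the naive mean sits below the censoring-aware estimate. How much this matters for regression coefficients in a real multilevel analysis depends on the proportion censored and on how censoring relates to the predictors.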

Re: MLM in MLwiN with Censored Data

Post by edwardoughton »

Dear Bill,

Thank you very much for your advice. In that case, does one have to accept that the dependent variable is not normally distributed when working with censored data? After taking a random sample of 1,000 observations from my very large data set (>1 million observations), here is the distribution of the dependent variable. As you can see, it stops abruptly at 30 Mbps, and the formal tests I've undertaken indicate that it differs significantly from the normal distribution. Various transformations have failed to bring the distribution closer to normal.
[Attachment: Screenshot 2014-07-04 10.30.14.png (distribution of the dependent variable in the 1,000-observation sample)]
Additionally, after analysing the standardised level 1 residuals in MLwiN I obtained the graph below, which indicates some deviation at the lower and upper bounds of the dataset (I have normalised the dependent variable).
[Attachment: Screenshot 2014-07-04 10.05.11.png (normal plot of the standardised level 1 residuals)]
Do I need to rethink my methodology at this stage or do you think I have grounds to continue regardless of the censored distribution?

Thanks,

Ed
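
As a rough illustration of why a ceiling distorts the top of a normal plot, the sketch below compares sorted standardised values with their expected normal scores. It uses simulated single-level data (mean 24 and standard deviation 6 are made-up values) and the raw response rather than model residuals, so it only mimics the upper-end behaviour, not the actual MLwiN output.

```python
import random
from statistics import NormalDist, mean, pstdev

LIMIT = 30.0  # ceiling at which values are recorded, as in the thread

# Simulated sample of 1,000 values (assumed parameters, not the real data)
random.seed(2)
latent = [random.gauss(24.0, 6.0) for _ in range(1000)]
recorded = [min(y, LIMIT) for y in latent]

# Standardise the recorded values and sort them
m, s = mean(recorded), pstdev(recorded)
standardised = sorted((y - m) / s for y in recorded)

# Expected normal score for the i-th smallest value,
# using the (i - 0.5) / n plotting position
n = len(standardised)
std_normal = NormalDist()
normal_scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

n_at_limit = sum(1 for y in recorded if y == LIMIT)
print(f"values recorded at the limit: {n_at_limit} of {n}")

# Towards the top of the plot the sample values flatten out at the ceiling
# while the normal scores keep increasing
for idx in (n - 200, n - 100, n - 10, n - 1):
    print(f"normal score {normal_scores[idx]:5.2f}   "
          f"standardised value {standardised[idx]:5.2f}")
```

All values recorded at the limit share one standardised value, so the upper end of the plot bends away from the straight line that normal data would follow.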

Re: MLM in MLwiN with Censored Data

Post by billb »

Dear Edward,
Apologies for the delay in replying. I think this might be a case of 'all models are wrong but some are useful', as George Box is often quoted. Your normalised scores plot does indeed show something of an S shape, which I am guessing reflects the threshold at the top end. Of course, your data are also bounded at the other end by 0, and the tail behaviour in the histogram looks no better at that end. Depending on the audience, I'd probably (particularly if I were an applied researcher) just carry on with the analysis while adding the caveat that the data contain censoring. Personally, being a stats methodologist, I might code up something that took the censoring into account, but this isn't currently available in MLwiN. One might be able to code it up in WinBUGS, and even use MLwiN to generate WinBUGS code that one could then edit to account for the censoring, but this isn't totally straightforward.
Hope this points you in a direction.
Regards,
Bill.
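
For anyone wanting to code up the censoring, one way to write the likelihood for a two-level random-intercept model with responses right-censored at c = 30 is sketched below. The notation is assumed rather than taken from the thread: \eta_{ij} is the fixed-part linear predictor for postcode i in MSOA j, u_j is the MSOA random effect, d_{ij} = 1 if the response is recorded at the limit and 0 otherwise, and \phi and \Phi are the standard normal density and distribution function.

```latex
L(\beta, \sigma_u^2, \sigma_e^2)
= \prod_{j} \int
  \prod_{i}
  \left[ \frac{1}{\sigma_e}\,
         \phi\!\left( \frac{y_{ij} - \eta_{ij} - u_j}{\sigma_e} \right)
  \right]^{1 - d_{ij}}
  \left[ 1 - \Phi\!\left( \frac{c - \eta_{ij} - u_j}{\sigma_e} \right)
  \right]^{d_{ij}}
  \frac{1}{\sigma_u}\, \phi\!\left( \frac{u_j}{\sigma_u} \right) \, du_j
```

Uncensored responses contribute a normal density and censored ones contribute an upper tail probability. In a Bayesian implementation the same model would typically be expressed by treating each censored response as a latent value constrained to be at least c.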