Out-of-sample predictions: Help & guidance

edwardoughton
Posts: 6
Joined: Tue Jun 03, 2014 7:54 am


Post by edwardoughton »

I've been attempting to generate out-of-sample predictions for (logged) broadband speeds. I have repeated measures broadband speed data for four years (level 1) nested within 174 local authority areas (level 2), nested within 10 UK regions (level 3).

The (logged and grand mean centred) predictors are (see attached photo for example):
- year number
- population density
- median income
- number of firms
- percentage of service sector employment

I have a range of demographic and economic projections across different high, medium and low growth scenarios. I have been using the customised prediction window to forecast the effect of these scenarios on broadband speed. I would appreciate some advice on using this please, as I have read the section in the supplementary MLwiN manual (Rasbash et al. 2014), but still need more help.

Question 1. Can I feed the model values for specific level 1 units? For example, can I estimate how changes in population density for London, Leeds, Newcastle, etc. lead to changes in predicted values of y (broadband speed)? Ideally I want to feed the forecasted level 1 predictor data to each local authority and get an expected predicted value of y, so that I can develop an understanding of how broadband supply might change over time under different scenarios. I'm not sure whether I can do this in MLwiN. (If I can't do local authority predictions, I'd be happy with forecasting the regional trajectory; perhaps someone could advise on how to do this, maybe by including categorical predictors as groupings?)

Question 2. Say I have data for 2020 for each of the predictors (year number, population density, income, etc.). How do I estimate the overall predicted change in y (broadband speed) across all these variables at once? The output provided by the customised predictions grid was not really what I was expecting, as it breaks the data down by each individual incremental change in value for each predictor. I'm not sure how to reconcile this into a single statement that this is the predicted value of y for this set of forecasted predictor values.

Question 3. I had used a polynomial on the year predictor to capture the growth curve dynamic, but this produced implausible out-of-sample predictions of y. I then went back to a linear approach with complex level 1 variance on the year predictor, which produced more sensible predictions, but these still went exponential by 2025. I suspect broadband speeds follow something closer to a logistic function (an S-curve). Are there any modelling tricks that would make the out-of-sample predictions ten years into the future level off to a plateau rather than increase towards infinity?

Any help would be much appreciated.

Edward
Attachment: Old 2.png (88.41 KiB)
joneskel
Posts: 26
Joined: Thu Nov 15, 2012 3:09 pm

Re: Out-of-sample predictions: Help & guidance

Post by joneskel »

I will try to answer, but as a lot of this is substantive rather than about method or software, my answers may be limited.

First, customised predictions: these make predictions of the response for chosen specific values of the predictors. This works best when you have built the model using the Add term window or equivalent syntax. That is, you do not use Calculate to create a polynomial; instead you Add a variable Time and then Modify it to be a polynomial of Time. Similarly, to create an interaction, use the Add window, choose 1st order interaction, and then specify the variables involved. Normally these variables will already have been included via the Add window as main effects before being added as an interaction. If the model has been built with the Add term function, the customised predictions facility will understand which variables are involved in interactions, etc. and make the correct predictions; you only have to specify values for the main effects.

In the Customised predictions window you can set any predictor to any value you want. At the outset all predictors (even categorical ones) are set to their average values, but you can modify these to the values you want. Any predictors that you do not change will remain at their average value. This procedure is designed for the fixed effects and does not work for specific higher-level units. It is really an interpretative tool rather than a tool for true out-of-sample predictions.

The Predictions facility will predict for every unit in the data, and you can choose to include any model term, fixed or random.

I have used the following trick to make real out-of-sample predictions. Take your observed dataset, say with 6 variables and 1000 observations, and add to the bottom of it the rows, say 10 observations, that you want to predict for. The response variable in these rows can be any value, but do not forget to put in the Constant value of 1 and the codes defining your structure. Also create a new variable, say named Exclude, of 1000 zeroes and 10 ones, indicating the observed rows and the ones you want to predict. Sort these data (including Exclude) in the usual way to define the structure, and specify the model in the normal way. In the Hierarchy viewer click on Options, tick Conditional exclusion of cases, and choose the Exclude variable. Estimate the model to convergence; this will be based on the 1000 observed cases. Then untick the exclusion in the Hierarchy window and, without doing any further estimation, use the Predictions window to predict for all 1010 observations. The ones with a 1 in Exclude are your out-of-sample predictions, made according to the estimated model, your chosen values for the predictors, and the units they belong to.
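The logic of this trick can be sketched outside MLwiN. The Python below is only an illustration under loud assumptions: it uses a single-level least-squares fit as a stand-in for the multilevel model, and the data, the variable names (`cons`, `exclude`) and the coefficient values are all made up. It shows that rows flagged for exclusion play no part in estimation but still receive predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: 1000 rows with one predictor (made-up values).
n_obs, n_new = 1000, 10
x_obs = rng.normal(size=n_obs)
y_obs = 2.0 + 0.5 * x_obs + rng.normal(scale=0.1, size=n_obs)

# Rows to predict for: predictor values chosen by the analyst.
# The response here is only a placeholder and is never used in fitting.
x_new = np.linspace(3.0, 4.0, n_new)
y_new = np.zeros(n_new)

# Stack observed and to-be-predicted rows, with a constant and an exclude flag.
x = np.concatenate([x_obs, x_new])
y = np.concatenate([y_obs, y_new])
cons = np.ones(n_obs + n_new)
exclude = np.concatenate([np.zeros(n_obs, dtype=int), np.ones(n_new, dtype=int)])

# Estimate on the non-excluded rows only (ordinary least squares as a stand-in).
X = np.column_stack([cons, x])
keep = exclude == 0
beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)

# Predict for every row without re-estimating; the flagged rows are out of sample.
pred = X @ beta
out_of_sample = pred[exclude == 1]
print(out_of_sample)
```

In the real multilevel case the appended rows also carry the unit codes, so a prediction that includes random terms can use the estimated effects for the unit each row belongs to; this stand-in has no such terms.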

Or failing that use the model estimates and a spreadsheet to predict what values you want to.
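As a rough sketch of that spreadsheet calculation, the Python below combines a set of fixed-effect estimates with one scenario's predictor values to give a single predicted value. Everything numeric here is hypothetical: the coefficients, the grand means and the scenario values are invented for illustration and are not taken from any fitted model. It assumes a logged response and predictors that were logged and then grand mean centred.

```python
import math

def centred_log(raw_value, grand_mean_of_logs):
    """Put a raw predictor value on the model scale: log it, then centre it."""
    return math.log(raw_value) - grand_mean_of_logs

def predict(coefs, row):
    """Fixed-part prediction: sum of coefficient times predictor value."""
    return sum(coefs[name] * row[name] for name in coefs)

# Hypothetical fixed-effect estimates (illustrative only).
coefs = {"cons": 1.0, "pop_density": 0.2, "median_income": 0.5}

# Hypothetical grand means of the logged predictors in the estimation data.
grand_means = {
    "pop_density": math.log(1000.0),
    "median_income": math.log(25000.0),
}

# One hypothetical scenario, given as raw (unlogged) values.
scenario = {"pop_density": 2000.0, "median_income": 25000.0}

# Build the model-scale row: the constant is 1, the rest are centred logs.
row = {"cons": 1.0}
for name, raw in scenario.items():
    row[name] = centred_log(raw, grand_means[name])

log_y = predict(coefs, row)   # prediction on the logged scale
y = math.exp(log_y)           # simple back-transform, no retransformation correction
print(log_y, y)
```

All predictors are set at once, so the result is one predicted value for the whole scenario rather than a grid of one-at-a-time changes.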

A general comment: if you overfit to the observed data, the results will be poor when you make predictions. You can easily get impossible values with polynomials, because the fitted trend is extrapolated into the future.
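A small illustration of the extrapolation problem, using made-up numbers: four yearly values are fitted with a straight line and with a cubic, and both are then projected ten years beyond the data. The values, the year coding and the choice of `numpy.polyfit` are assumptions for this sketch only, not a reconstruction of any model in this thread.

```python
import numpy as np

# Hypothetical logged speeds for four observed years (illustrative values).
years = np.array([1.0, 2.0, 3.0, 4.0])
log_speed = np.array([1.0, 1.5, 1.8, 2.4])

# Fit a straight line and a cubic by least squares.
linear = np.polyfit(years, log_speed, deg=1)
cubic = np.polyfit(years, log_speed, deg=3)

# The cubic passes through all four points, so its in-sample fit looks perfect.
cubic_resid = log_speed - np.polyval(cubic, years)

# Project both fits ten years past the last observation.
future_year = 14.0
linear_pred = np.polyval(linear, future_year)
cubic_pred = np.polyval(cubic, future_year)

print("max cubic residual in sample:", np.abs(cubic_resid).max())
print("linear projection:", linear_pred)
print("cubic projection:", cubic_pred)
```

The cubic reproduces the observed points exactly yet gives a far more extreme projection than the line, which is the overfitting risk described above.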
joneskel
Posts: 26
Joined: Thu Nov 15, 2012 3:09 pm

Re: Out-of-sample predictions: Help & guidance

Post by joneskel »

If you want to see the Customised predictions facility in action, see the later chapters of

Jones, K and Subramanian, S V (2014) Developing multilevel models for analysing contextuality, heterogeneity and change using MLwiN, Volume 1, University of Bristol.

https://www.researchgate.net/publicatio ... o_on_RGate