Time varying predictors at higher aggregation levels

Raphael · Post by **Raphael** » Wed Mar 19, 2014 12:23 am

The case:
I am trying to estimate event history models (also known as survival models) with time-varying predictors at two different levels of (geographical) aggregation. More precisely, I am using a discrete-time event history model (a logit model on stacked data) to predict the odds of outmigration (mig) at the household level. Each household is exposed to the hazard of migration over a certain period (in this example three years; exposure). I have a number of time-varying (e.g., wx = cumulative working experience of household head) and time-invariant household-level predictors (e.g., fem = household head is female) to control for the effect of various socio-demographic characteristics on the decision to migrate. However, the households in my sample are located in different municipalities (MunID). In my research I am interested in how a set of time-varying characteristics of the environment (Env1, e.g., rainfall decline) that operate at the municipality level affect the odds of household-level outmigration. However, I also need to control for some time-invariant municipality-level characteristics (Env2, e.g., % land used for agricultural production). A simplified example of the data structure is presented in the table below (sorry for abusing the code feature for the display).

Code: Select all

exposure  HHID  HHIDy  mig  wx  fem  MunID  MunIDy  Env1  Env2
1         A     A_1    0    1   0    M1     M1_1    4     3
2         A     A_2    0    2   0    M1     M1_2    5     3
3         A     A_3    1    3   0    M1     M1_3    6     3
1         B     B_1    0    5   1    M1     M1_1    4     3
2         B     B_2    0    5   1    M1     M1_2    5     3
3         B     B_3    0    6   1    M1     M1_3    6     3
1         C     C_1    0    3   0    M1     M1_1    4     3
2         C     C_2    1    4   0    M1     M1_2    5     3
1         D     D_1    0    7   0    M1     M1_1    4     3
2         D     D_2    0    8   0    M1     M1_2    5     3
3         D     D_3    0    9   0    M1     M1_3    6     3
1         E     E_1    1    2   0    M2     M2_1    2.5   6
1         F     F_1    0    2   0    M2     M2_1    2.5   6
2         F     F_2    0    3   0    M2     M2_2    1     6
3         F     F_3    0    4   0    M2     M2_3    3     6
1         G     G_1    0    8   1    M2     M2_1    2.5   6
2         G     G_2    1    8   1    M2     M2_2    1     6
1         H     H_1    0    5   0    M2     M2_1    2.5   6
2         H     H_2    0    6   0    M2     M2_2    1     6
3         H     H_3    0    6   0    M2     M2_3    3     6
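
For readers unfamiliar with the stacked (person-period) layout, here is a minimal Python sketch of how rows like those in the table could be generated from one record per household. The function name `expand_to_person_period` and the `env1_by_mun_year` lookup are hypothetical helpers introduced only for illustration; the values are copied from the example table, and only three of the households are expanded.

```python
from typing import Dict, List, Optional


def expand_to_person_period(hhid: str, mun_id: str,
                            event_year: Optional[int],
                            max_years: int = 3) -> List[Dict]:
    """Return one row per year at risk for a single household.

    The spell ends in the year of migration (mig = 1) or, if the
    household never migrates, after max_years (censored, all mig = 0).
    """
    last_year = event_year if event_year is not None else max_years
    rows = []
    for t in range(1, last_year + 1):
        rows.append({
            "exposure": t,
            "HHID": hhid,
            "HHIDy": f"{hhid}_{t}",      # household-year identifier
            "mig": 1 if t == event_year else 0,
            "MunID": mun_id,
            "MunIDy": f"{mun_id}_{t}",   # municipality-year identifier
        })
    return rows


# Time-varying municipality values (Env1) keyed by MunIDy, from the table.
env1_by_mun_year = {"M1_1": 4, "M1_2": 5, "M1_3": 6,
                    "M2_1": 2.5, "M2_2": 1, "M2_3": 3}

rows = (expand_to_person_period("A", "M1", event_year=3)
        + expand_to_person_period("B", "M1", event_year=None)
        + expand_to_person_period("C", "M1", event_year=2))

# Attach the municipality-year value to every household-year row.
for row in rows:
    row["Env1"] = env1_by_mun_year[row["MunIDy"]]
```

Household C contributes only two rows because it leaves the risk set after migrating in year 2, which is why the number of rows differs across households.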

The problem:
Because I have two levels of aggregation (households clustered in municipalities), I was intending to use logistic multilevel models. However, I am not quite sure how to correctly specify my levels so that the aggregate-level nature of my time-varying predictor at level-3 (e.g., Env1) is correctly accounted for.

Possible solutions:
1. Courgeau (2007) describes a multilevel event history model with three levels: Time (level-1) is nested within individuals (level-2), who are nested within states (level-3). However, Courgeau only mentions a time-invariant state-level predictor (which of course has the same values for all person-years/rows within each state-level unit). In my case, I have the problem that a time-varying predictor at the municipality-level (e.g., Env1) would not be recognized by MLwiN as operating at the municipality-level (level-3) because the values within each aggregation unit vary across time. However, the standard errors of the estimate for Env1 will be biased if the model considers this variable as a level-1 predictor because at each time point all households within one municipality will have the same Env1 value.

2. As another option, I could use the combined MunIDy variable to define my third level. MunIDy combines the municipality ID (MunID) with the exposure year variable (exposure), which results in 2 municipalities × 3 years = 6 aggregation units at level-3. However, this solution also seems less than ideal: each level-3 unit would contain household- and municipality-level values for only one exposure year (i.e., one unit would consist of all observations in a particular exposure year and a particular municipality), and sorting the data on MunIDy breaks up the event history of each household.
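
The difference between the two candidate level-3 identifiers can be checked directly on the data. The following Python sketch tests whether a variable is constant within each unit of a grouping; the helper `is_constant_within` is a hypothetical name, and the records are a subset of the example table.

```python
from collections import defaultdict

# (MunID, exposure, Env1, Env2) for some person-period rows of the example.
records = [
    ("M1", 1, 4, 3), ("M1", 2, 5, 3), ("M1", 3, 6, 3),
    ("M1", 1, 4, 3), ("M1", 2, 5, 3),
    ("M2", 1, 2.5, 6), ("M2", 2, 1, 6), ("M2", 3, 3, 6),
    ("M2", 1, 2.5, 6), ("M2", 2, 1, 6),
]


def by_municipality(rec):
    return rec[0]


def by_municipality_year(rec):
    return f"{rec[0]}_{rec[1]}"


def is_constant_within(recs, key_fn, value_index):
    """True if the variable takes a single value inside every group."""
    seen = defaultdict(set)
    for rec in recs:
        seen[key_fn(rec)].add(rec[value_index])
    return all(len(values) == 1 for values in seen.values())


env1_const_in_mun = is_constant_within(records, by_municipality, 2)
env1_const_in_mun_year = is_constant_within(records, by_municipality_year, 2)
env2_const_in_mun = is_constant_within(records, by_municipality, 3)
n_mun_year_units = len({by_municipality_year(r) for r in records})
```

On these rows Env1 varies within MunID but is constant within MunIDy, while Env2 is already constant within MunID, and there are six municipality-year units.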

Does anyone have an idea of how to correctly specify the levels in my analysis so that I can investigate the effect of time-varying predictors at level-3? Or can anyone point me to published work that uses a multi-level event history analysis with time-varying predictors at higher aggregation levels? Thanks a lot for any help!

References:
Courgeau, D. (2007). Multilevel synthesis: From the group to the individual. Dordrecht, The Netherlands: Springer.

GeorgeLeckie · Post by **GeorgeLeckie** » Wed Mar 19, 2014 2:10 pm

Hi Raphael,

Interesting query. Perhaps you should put a municipality-by-year random interaction effect into your model.
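
One possible reading of this suggestion, written out as a sketch: the notation is assumed here and does not come from the thread. Let t index exposure years, i households and j municipalities, and let h_{tij} be the discrete-time hazard of outmigration.

```latex
\operatorname{logit}(h_{tij}) = \alpha_t
  + \beta_1\,\mathrm{wx}_{tij} + \beta_2\,\mathrm{fem}_{ij}
  + \gamma_1\,\mathrm{Env1}_{tj} + \gamma_2\,\mathrm{Env2}_{j}
  + u_{ij} + v_{j} + w_{tj},
\qquad
u_{ij} \sim N(0, \sigma_u^2),\quad
v_{j} \sim N(0, \sigma_v^2),\quad
w_{tj} \sim N(0, \sigma_w^2)
```

Here u_{ij} is the household effect, v_j the municipality effect and w_{tj} the municipality-by-year effect shared by all households observed in municipality j in year t. Each household-year row belongs to one household and to one municipality-year, and neither grouping is nested in the other, so under this reading the two classifications are crossed within municipalities.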

However, I would recommend you post this topic to the multilevel JISC email list, as it is not specific to runmlwin (or, for that matter, MLwiN). The JISC email list is read very widely, so you will have a much better chance of getting a good response there.

Best wishes

George

Raphael · Post by **Raphael** » Thu Mar 20, 2014 3:42 am

Hi George,
thanks so much for the great advice!

Best,
Raphael

Raphael · Post by **Raphael** » Tue Apr 08, 2014 3:16 am

I think I found a solution. I read two book chapters about multilevel event history models (Courgeau, 2007; Goldstein, 2011), which discuss similar cases and suggest using a three-level structure such as time (level-1) nested within households (level-2), which are in turn nested within municipalities (level-3). Goldstein (2011, p. 221) explicitly states for this structure that “The exploratory variables can be defined at any level. They may also vary over time, allowing so-called time varying covariates.”

So here is a quick explanation why I think that such a three-level model is able to correctly incorporate time-varying predictors at the municipality-level (level-3), such as the environmental variable “Env1”. Because Env1 varies across time, the model automatically treats it as a level-1 variable. It does not know that at each time step (e.g., year 1990), the values for Env1 are the same for all households located in a particular municipality. However, I don’t think that this biases the standard errors for the Env1 variable because I have household random effects (level-2) included in the model, which estimate a separate intercept for each household. Moreover, I also include an additional variance component at level-2 that allows the slope of Env1 to vary randomly across households. In this way the effect of Env1 is uniquely computed for each household.
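
As a sketch, the specification described in this post could be written as follows. The notation is assumed for illustration (t = exposure year, i = household, j = municipality, h_{tij} = discrete-time hazard of outmigration) and is not taken from Courgeau or Goldstein.

```latex
\operatorname{logit}(h_{tij}) = \alpha_t
  + \beta_1\,\mathrm{wx}_{tij} + \beta_2\,\mathrm{fem}_{ij}
  + (\gamma_1 + u_{1ij})\,\mathrm{Env1}_{tj} + \gamma_2\,\mathrm{Env2}_{j}
  + u_{0ij} + v_{j},
\qquad
\begin{pmatrix} u_{0ij} \\ u_{1ij} \end{pmatrix} \sim N(\mathbf{0}, \Omega_u),\quad
v_{j} \sim N(0, \sigma_v^2)
```

In this form u_{0ij} is the household random intercept, u_{1ij} the household-specific deviation in the Env1 slope and v_j the municipality random intercept. Apart from Env1 itself, no term in the equation is indexed jointly by municipality and year.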

References:
Courgeau, D. (2007). Multilevel synthesis: From the group to the individual. Dordrecht, The Netherlands: Springer.
Goldstein, H. (2011). Multilevel statistical models (4th ed.). Chichester, U.K.: John Wiley & Sons.

GeorgeLeckie · Post by **GeorgeLeckie** » Tue Apr 08, 2014 9:52 am

Thanks very much for replying.
I suppose one might also consider a simulation study to explore these issues.
Best wishes
George
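
A minimal Python sketch of the data-generating step such a simulation study might start from. All names and parameter values here (`simulate_person_period`, `sd_mun_year`, the baseline of -2.0 and so on) are hypothetical choices for illustration, not values from this thread. The sketch only generates discrete-time data with a household effect and an unobserved municipality-by-year shock; fitting the competing multilevel specifications to many such data sets and comparing the reported standard error of the Env1 coefficient with the spread of its estimates across replications would be the remaining part of the study.

```python
import math
import random


def simulate_person_period(n_mun=20, n_hh=30, n_years=3, beta_env1=0.5,
                           sd_household=0.5, sd_mun_year=0.5, seed=1):
    """Simulate stacked household-year rows for a discrete-time hazard model."""
    rng = random.Random(seed)
    rows = []
    for j in range(1, n_mun + 1):
        # Env1 and an unobserved shock, both shared by every household
        # in municipality j in a given year.
        env1 = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        shock = [rng.gauss(0.0, sd_mun_year) for _ in range(n_years)]
        for i in range(1, n_hh + 1):
            u = rng.gauss(0.0, sd_household)  # household random intercept
            for t in range(1, n_years + 1):
                eta = -2.0 + beta_env1 * env1[t - 1] + u + shock[t - 1]
                p = 1.0 / (1.0 + math.exp(-eta))  # hazard in year t
                mig = 1 if rng.random() < p else 0
                rows.append({
                    "exposure": t,
                    "HHID": f"M{j}_H{i}",
                    "MunID": f"M{j}",
                    "MunIDy": f"M{j}_{t}",
                    "Env1": env1[t - 1],
                    "mig": mig,
                })
                if mig == 1:
                    break  # the household leaves the risk set after migrating
    return rows


rows = simulate_person_period()
```

Setting `sd_mun_year=0.0` removes the shared shock, which gives a baseline scenario to compare against.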

www.cmm.bristol.ac.uk/forum
