I am trying to estimate event history models (also known as survival models) with time-varying predictors at two different levels of (geographical) aggregation. More precisely, I am using a discrete-time event history model (a logit model on stacked, household-year data) to predict the odds of outmigration (mig) at the household level. Each household is exposed to the hazard of migration over a certain period (in this example three years; exposure). I have a number of time-varying (e.g., wx = cumulative working experience of the household head) and time-invariant (e.g., fem = household head is female) household-level predictors to control for the effect of various socio-demographic characteristics on the decision to migrate. The households in my sample are located in different municipalities (MunID). In my research I am interested in how a set of time-varying characteristics of the environment (Env1, e.g., rainfall decline) that operate at the municipality level affect the odds of household-level outmigration. I also need to control for some time-invariant municipality-level characteristics (Env2, e.g., % of land used for agricultural production). A simplified example of the data structure is presented in the table below (sorry for abusing the code feature for the display).
exposure  HHID  HHIDy  mig  wx  fem  MunID  MunIDy  Env1  Env2
1         A     A_1    0    1   0    M1     M1_1    4     3
2         A     A_2    0    2   0    M1     M1_2    5     3
3         A     A_3    1    3   0    M1     M1_3    6     3
1         B     B_1    0    5   1    M1     M1_1    4     3
2         B     B_2    0    5   1    M1     M1_2    5     3
3         B     B_3    0    6   1    M1     M1_3    6     3
1         C     C_1    0    3   0    M1     M1_1    4     3
2         C     C_2    1    4   0    M1     M1_2    5     3
1         D     D_1    0    7   0    M1     M1_1    4     3
2         D     D_2    0    8   0    M1     M1_2    5     3
3         D     D_3    0    9   0    M1     M1_3    6     3
1         E     E_1    1    2   0    M2     M2_1    2.5   6
1         F     F_1    0    2   0    M2     M2_1    2.5   6
2         F     F_2    0    3   0    M2     M2_2    1     6
3         F     F_3    0    4   0    M2     M2_3    3     6
1         G     G_1    0    8   1    M2     M2_1    2.5   6
2         G     G_2    1    8   1    M2     M2_2    1     6
1         H     H_1    0    5   0    M2     M2_1    2.5   6
2         H     H_2    0    6   0    M2     M2_2    1     6
3         H     H_3    0    6   0    M2     M2_3    3     6
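To show how the stacked structure above comes about, here is a minimal Python sketch that expands one record per household into one row per year at risk. The input layout (the `years` and `migrated` fields) and the function name `to_person_period` are assumptions made for illustration only; the values mirror households A, C and D from the table, and the time-varying covariates are left out for brevity.

```python
# Hypothetical one-row-per-household input (not the real data file):
# number of years observed and whether the spell ended in migration.
households = [
    {"HHID": "A", "MunID": "M1", "years": 3, "migrated": True},
    {"HHID": "C", "MunID": "M1", "years": 2, "migrated": True},
    {"HHID": "D", "MunID": "M1", "years": 3, "migrated": False},
]


def to_person_period(hh):
    """Expand one household into one row per year at risk."""
    rows = []
    for year in range(1, hh["years"] + 1):
        rows.append({
            "exposure": year,
            "HHID": hh["HHID"],
            "HHIDy": f"{hh['HHID']}_{year}",
            "MunID": hh["MunID"],
            "MunIDy": f"{hh['MunID']}_{year}",
            # mig is 1 only in the last year of a spell that ends in migration;
            # censored households (such as D) keep mig = 0 throughout.
            "mig": int(hh["migrated"] and year == hh["years"]),
        })
    return rows


stacked = [row for hh in households for row in to_person_period(hh)]

for row in stacked:
    print(row)
```

A household drops out of the risk set after the year in which it migrates, which is why C contributes only two rows.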
Because I have two levels of aggregation (households clustered in municipalities), I intend to use multilevel logistic models. However, I am not sure how to specify my levels so that the aggregate-level nature of my time-varying level-3 predictor (e.g., Env1) is correctly accounted for.
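To make the setup concrete, a three-level discrete-time logit of this kind could be written roughly as follows. This is only a sketch: the subscripts (t for exposure year, i for household, j for municipality), the coefficient symbols and the assumption of simple random intercepts at the household and municipality levels are illustrative notation, not taken from Courgeau (2007) or from MLwiN output.

```latex
\begin{aligned}
h_{tij} &= \Pr\left(\mathrm{mig}_{tij} = 1 \mid \mathrm{mig}_{sij} = 0 \text{ for all } s < t\right) \\
\operatorname{logit}(h_{tij}) &= \alpha_t
  + \beta_1\,\mathrm{wx}_{tij}
  + \beta_2\,\mathrm{fem}_{ij}
  + \gamma_1\,\mathrm{Env1}_{tj}
  + \gamma_2\,\mathrm{Env2}_{j}
  + u_{ij} + v_{j} \\
u_{ij} &\sim N(0, \sigma_u^2), \qquad v_{j} \sim N(0, \sigma_v^2)
\end{aligned}
```

The subscripts show where the difficulty lies: Env2 carries only the municipality index j, whereas Env1 carries both t and j, so it varies over time but is shared by all households in the same municipality and year.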
Possible solutions:
1. Courgeau (2007) describes a multilevel event history model with three levels: Time (level-1) is nested within individuals (level-2), who are nested within states (level-3). However, Courgeau only mentions a time-invariant state-level predictor (which of course has the same values for all person-years/rows within each state-level unit). In my case, I have the problem that a time-varying predictor at the municipality-level (e.g., Env1) would not be recognized by MLwiN as operating at the municipality-level (level-3) because the values within each aggregation unit vary across time. However, the standard errors of the estimate for Env1 will be biased if the model considers this variable as a level-1 predictor because at each time point all households within one municipality will have the same Env1 value.
2. As another option, I could use the combined MunIDy variable to specify my third level. MunIDy combines the municipality ID (MunID) with the exposure year variable (exposure), which gives 2 municipalities × 3 exposure years = 6 aggregation units at level-3. However, this solution also seems less than ideal: each level-3 unit would contain household- and municipality-level values for only one exposure year (i.e., one unit would consist of all observations in a particular exposure year and a particular municipality), and sorting the data on MunIDy breaks up the household event histories.
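The two options above turn on the level at which each environmental predictor is constant. The following minimal Python sketch checks this on the example data; the helper name `varies_within` is made up for illustration, and only the columns needed for the check are copied from the table.

```python
from collections import defaultdict

# Columns copied from the example table: exposure, HHID, MunID, Env1, Env2
TABLE = """
1 A M1 4 3
2 A M1 5 3
3 A M1 6 3
1 B M1 4 3
2 B M1 5 3
3 B M1 6 3
1 C M1 4 3
2 C M1 5 3
1 D M1 4 3
2 D M1 5 3
3 D M1 6 3
1 E M2 2.5 6
1 F M2 2.5 6
2 F M2 1 6
3 F M2 3 6
1 G M2 2.5 6
2 G M2 1 6
1 H M2 2.5 6
2 H M2 1 6
3 H M2 3 6
"""

rows = []
for line in TABLE.strip().splitlines():
    exposure, hhid, mun, env1, env2 = line.split()
    rows.append({"exposure": int(exposure), "HHID": hhid, "MunID": mun,
                 "Env1": float(env1), "Env2": float(env2)})


def varies_within(rows, keys, var):
    """True if `var` takes more than one value inside any group defined by `keys`."""
    seen = defaultdict(set)
    for r in rows:
        seen[tuple(r[k] for k in keys)].add(r[var])
    return any(len(values) > 1 for values in seen.values())


env1_varies_in_mun = varies_within(rows, ["MunID"], "Env1")
env1_varies_in_mun_year = varies_within(rows, ["MunID", "exposure"], "Env1")
env2_varies_in_mun = varies_within(rows, ["MunID"], "Env2")
n_mun_years = len({(r["MunID"], r["exposure"]) for r in rows})

print(env1_varies_in_mun, env1_varies_in_mun_year, env2_varies_in_mun, n_mun_years)
```

On the example data, Env1 varies within MunID but is constant within each MunID-by-exposure combination (the six MunIDy units), while Env2 is constant within MunID. This only describes the data structure; it does not settle which level specification is statistically appropriate.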
Does anyone have an idea of how to correctly specify the levels in my analysis so that I can investigate the effect of time-varying predictors at level-3? Or can anyone point me to published work that uses a multi-level event history analysis with time-varying predictors at higher aggregation levels? Thanks a lot for any help!
References:
Courgeau, D. (2007). Multilevel synthesis: From the group to the individual. Dordrecht, The Netherlands: Springer.