model.matrix() and terms() for R2mlwin objects

adeldaoud · Post by **adeldaoud** » Tue Aug 23, 2016 5:16 pm

Hi

This relates to the question of R generic methods. I am replicating "7.4 Predicted probabilities from a multilevel model" in LEMMA, and translating it to R2mlwin objects. See here,https://www.cmm.bris.ac.uk/lemma/mod/le ... pageid=970

I am having an issue with translating this part of the code to R2mlwin:

X <- model.matrix(terms(fit), mydatapred)

terms() does not take R2mlwin objects, and wonder if anyone has a suggestion for an elegant workaround? I know that I could drop each and every not-used covariate, but this becomes laborious for large models and models with many dummies.

cheers

adeldaoud · Post by **adeldaoud** » Thu Aug 25, 2016 9:29 am

Chris, do you happen to have any input on this question?

Many thanks in advance
Adel

ChrisCharlton · Post by **ChrisCharlton** » Thu Aug 25, 2016 12:47 pm

If I understand the module correctly then it is just creating a prediction of the fixed part on a slightly modified version of the data.

The following code that uses the generic predict function should I believe do the same:

Code: Select all

> mydata <- read.table("7.4.txt",  header = TRUE, sep = ",")
> require(R2MLwiN)
Loading required package: R2MLwiN
Loading required package: stats4
Loading required package: lattice
Loading required package: memisc
Loading required package: MASS

Attaching package: ‘memisc’

The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts

The following object is masked from ‘package:base’:

    as.array

Loading required package: coda
The MLwiN_path option is currently set to C:/Program Files (x86)/MLwiN v2.36/
To change this use: options(MLwiN_path="<path to MLwiN>")

> (fit <- runMLwiN(logit(antemed) ~ 1 + magec + magecsq + meduc2 + meduc3 + wealthc + (1 |comm), D = "Binomial", estoptions = list(nonlinear = c(N = 1, M = 2)), data = mydata))
/nogui option ignored
ECHO    0

Some input variables held data in more precision than MLwiN supports, these have been rounded

Echoing is ON
BATC 1

Batch mode is ON
MAXI 2
STAR
iteration 0
iteration 1

Convergence not achieved
TOLE 2
MAXI 20
NEXT
iteration 2
iteration 3
iteration 4

Convergence achieved
ECHO 0
Execution completed


-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN (version: 2.36)  multilevel model (Binomial) 
       N min     mean max N_complete min_complete mean_complete max_complete
comm 361   3 14.86427  25        361            3      14.86427           25
Estimation algorithm:  IGLS PQL2        Elapsed time : 0.67s 
Number of obs:  5366 (from total 5366)        The model converged after 5 iterations.
Log likelihood:      NA 
Deviance statistic:  NA 
--------------------------------------------------------------------------------------------------- 
The model formula:
logit(antemed) ~ 1 + magec + magecsq + meduc2 + meduc3 + wealthc + 
    (1 | comm)
Level 2: comm     Level 1: l1id      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
               Coef.   Std. Err.       z    Pr(>|z|)         [95% Conf.   Interval] 
Intercept   -0.45468     0.08273   -5.50   3.883e-08   ***     -0.61683    -0.29254 
magec       -0.00039     0.00655   -0.06      0.9528           -0.01323     0.01246 
magecsq     -0.00101     0.00068   -1.47      0.1403           -0.00235     0.00033 
meduc2       0.54790     0.08460    6.48   9.386e-11   ***      0.38209     0.71371 
meduc3       1.30924     0.09778   13.39   6.996e-41   ***      1.11759     1.50089 
wealthc      0.39775     0.02959   13.44   3.344e-41   ***      0.33976     0.45574 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the comm level: 
                  Coef.   Std. Err. 
var_Intercept   0.85385     0.09358 
--------------------------------------------------------------------------------------------------- 
The random part estimates at the l1id level: 
                Coef.   Std. Err. 
var_bcons_1   1.00000     0.00000 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> mydatapred <- mydata
> mydatapred$magec <- 0
> mydatapred$magecsq <- 0
> 
> medianpredproblogit <- predict(fit, newdata = mydatapred)
> 
> require(VGAM)
Loading required package: VGAM
Loading required package: splines

Attaching package: ‘VGAM’

The following object is masked from ‘package:coda’:

    nvar

> 
> medianpredprob <- logit(medianpredproblogit, inverse = TRUE)
> 
> mydatapred2 <- unique(data.frame(cbind("medianpredprob" = medianpredprob, "medianpredproblogit" = medianpredproblogit, wealth = mydatapred$wealth, meduc = mydata$meduc)))
> 
> mydatapred2 <- mydatapred2[order(mydatapred2$wealth, mydatapred2$meduc), ]
> 
> colnames(mydatapred2)[1:2] <- c("medianpredprob","medianpredproblogit")
> 
> mydatapred2
    medianpredprob medianpredproblogit wealth meduc
13       0.2221039         -1.25344762      1     1
24       0.3305831         -0.70554891      1     2
97       0.5139447          0.05579344      1     3
4        0.2982393         -0.85569629      2     1
3        0.4236524         -0.30779758      2     2
7        0.6114817          0.45354477      2     3
18       0.3874735         -0.45794487      3     1
1        0.5224733          0.08995384      3     2
8        0.7008390          0.85129618      3     3
55       0.4849562         -0.06019354      4     1
5        0.6195657          0.48770517      4     2
2        0.7771349          1.24904751      4     3
478      0.5835971          0.33755771      5     1
66       0.7079516          0.88545642      5     2
36       0.8384579          1.64679876      5     3

You could instead go straight to calculating mediapredprob, avoiding using the VGAM package by substituting the following predict call:

Code: Select all

medianpredprob <- predict(fit, type="response", newdata = mydatapred)

adeldaoud · Post by **adeldaoud** » Thu Aug 25, 2016 2:18 pm

(A)
Thanks, Chris. Good idea, re bypassing the VGAM package entirely. It does not seem that predict() is sensitive about having variables in the data passed to it, that are not used in the model. Right?

(B)

Another issue. I replicated the module in both lme4 and R2mlwin, but I am getting quite different figures in terms of predicted probabilities. Any ideas way? Both methods use cluster-specific predictions right? and both are setting the covariates to their mean. This should produce the exact same output

Code: Select all

mydata <- read.table("7.4.txt",  header = TRUE, sep = ",")



# run with lme4

library(lme4)
(fit <- glmer(antemed ~ magec + magecsq + meduc2 + meduc3 + wealthc + (1 | comm), data = mydata, family = binomial("logit")))
summary(fit)


mydatapred <- mydata

# set to sample mean
mydatapred$magec <- 0

# set to sample mean
mydatapred$magecsq <- 0

X <-  model.matrix(terms(fit), mydatapred)

b <- fixef(fit)
medianpredproblogit <- X %*% b

library(VGAM)
medianpredprob <- logit(medianpredproblogit, inverse = TRUE)


mydatapred2 <- unique(data.frame(cbind("medianpredprob" = medianpredprob, 
                                       "medianpredproblogit" = medianpredproblogit, wealth = mydatapred$wealth, meduc = mydata$meduc)))

mydatapred2 <- mydatapred2[order(mydatapred2$wealth, mydatapred2$meduc), ]

colnames(mydatapred2)[1:2] <- c("medianpredprob","medianpredproblogit")

mydatapred2


# run with R2mlwin
library(R2MLwiN)
mydata$cons <- 1

(fit.1 <- runMLwiN(logit(antemed, cons) ~ 1 + magec + magecsq + meduc2 + meduc3 + wealthc + (1 | comm), data = mydata, D = "Binomial", estoptions = list(EstM = 0)))


medianpredprob2 <- predict(fit.1, type="response", newdata = mydatapred) # set at mean


mydatapred22 <- unique(data.frame(cbind("medianpredprob" = medianpredprob2, 
                                        wealth = mydatapred$wealth, meduc = 
                                          mydata$meduc)))

mydatapred22 <- mydatapred22[order(mydatapred22$wealth, mydatapred22$meduc), ]

colnames(mydatapred22)[1:2] <- c("medianpredprob")

mydatapred22

(C)
On a related note. I am setting se.fit=T but no standard errors are produced. Any ideas way?

Code: Select all

> medianpredprob2 <- predict(fit.1, type="response", newdata = mydatapred, se.fit=T) #
> str(medianpredprob2)
 Named num [1:5366] 0.519 0.748 0.43 0.323 0.606 ...
 - attr(*, "names")= chr [1:5366] "1" "2" "3" "4" ...

ChrisCharlton · Post by **ChrisCharlton** » Thu Aug 25, 2016 3:39 pm

(A) Yes, this is right. The R2MLwiN predict command extracts the data from the passed-in data frame using the model formula in the same way as when estimating the model (see setMethod("predict"... in https://github.com/rforge/r2mlwin/blob/ ... nfitIGLS.R).

(B) This might be due to the method of quasi-liklihood being used in MLwiN. When testing the solution for my previous post I found that I had to use PQL2 before I got results that closely much matched those from R.

(C) Yes this is true, the se.fit option only does something if type is equal to "terms" or "link". I think that I was following the example of other implementations of the predict() function in R when implementing this behaviour, however I might be able to revisit this in the future if this isn't the case.

www.cmm.bristol.ac.uk/forum

model.matrix() and terms() for R2mlwin objects

model.matrix() and terms() for R2mlwin objects

Re: model.matrix() and terms() for R2mlwin objects

Re: model.matrix() and terms() for R2mlwin objects

Re: model.matrix() and terms() for R2mlwin objects

Re: model.matrix() and terms() for R2mlwin objects