Page 1 of 1

multivariate binary model

Posted: Mon Dec 14, 2015 4:34 pm
by VickiTate
Hi,

I'm currently trying to run a multivariate binary response model with missing data, but I keep getting an error message.

I have simulated a dataset to reflect a simple clinical trial. I have two binary outcomes (X1 and X2) and a variable to reflect if the participants received the treatment or not (arm). One of the binary outcomes (X1) has missing data, which I have coded as "-9.999e+29".

I'm using R2MLwiN to run my model, using the following code:

runMLwiN( c(probit(X1, cons), probit(X2, cons)) ~ 1 + arm , D=c("Mixed", "Binomial", "Binomial"), data = dataSim, estoptions = list( nonlinear = c(N = 0, M=1)))

The model appears to be temperamental! Occasionally, it runs and provides reasonable results. However, the majority of the time the following error message is displayed:

"Error in runMLwiN(c(probit(X1, cons), probit(X2, cons)) ~ 1 + arm, D = c("Mixed", :
All values for a binomial response must lie between zero and one"

Do you have any ideas why I get this error message some of the time?
Many thanks in advance.

Re: multivariate binary model

Posted: Mon Dec 14, 2015 6:33 pm
by ChrisCharlton
When calling MLwiN from within R you need to use the R missing value (NA) instead of the MLwiN code (-9.999e+29). The code will be converted as part of the process of transferring the data between packages. I suspect that your missing data is therefore being misinterpreted, which might explain the odd behaviour that you are seeing.

Re: multivariate binary model

Posted: Tue Dec 15, 2015 10:50 am
by VickiTate
Thanks for the quick reply.

Unfortunately, when I change the missing values to "NA", I still get the following error message (about a third of the time):

Error in if (!all(indata[[resp]] >= 0 && indata[[resp]] <= :
missing value where TRUE/FALSE needed

Re: multivariate binary model

Posted: Tue Dec 15, 2015 11:40 am
by ChrisCharlton
This appears to be a bug in the code that checks that the responses have valid values, as demonstrated with the example below:

Code: Select all

> require(R2MLwiN)
> 
> # Load sample dataset
> data(tutorial)
> 
> # Create binary indicators
> tutorial$binexam <- tutorial$normexam > 0
> tutorial$binlrt <- tutorial$standlrt > 0
> 
> # Run model
> runMLwiN( c(probit(binexam, cons), probit(binlrt, cons)) ~ 1, D=c("Mixed", "Binomial", "Binomial"), data = tutorial)
/nogui option ignored
ECHO    0


Echoing is ON
BATC 1

Batch mode is ON
MAXI 2
STAR
iteration 0
iteration 1

Convergence not achieved
TOLE 2
MAXI 20
NEXT
iteration 2
iteration 3

Convergence achieved
ECHO 0
Execution completed


-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
MLwiN (version: 2.35)  multilevel model (Mixed) 
Estimation algorithm:  IGLS MQL1        Elapsed time : 1.62s 
Number of obs:  4059 (from total 4059)        The model converged after 4 iterations.
Log likelihood:      NA 
Deviance statistic:  NA 
--------------------------------------------------------------------------------------------------- 
The model formula:
c(probit(binexam, cons), probit(binlrt, cons)) ~ 1
Level 1: l1id      
--------------------------------------------------------------------------------------------------- 
The fixed part estimates:  
                      Coef.   Std. Err.      z   Pr(>|z|)         [95% Conf.   Interval] 
Intercept_binexam   0.03057     0.01968   1.55     0.1202           -0.00799     0.06914 
Intercept_binlrt    0.03799     0.01968   1.93    0.05354   .       -0.00058     0.07656 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  
--------------------------------------------------------------------------------------------------- 
The random part estimates at the l1id level: 
                        Coef.   Std. Err. 
var_bcons_1           1.00000     0.00000 
cov_bcons_1_bcons_2   0.41914     0.01192 
var_bcons_2           1.00000     0.00000 
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*- 
> 
> # Make ~20% of each response missing
> tutorial$binexam[runif(4059) > 0.2] <- NA
> tutorial$binlrt[runif(4059) > 0.2] <- NA
> 
> # Run model again
> runMLwiN( c(probit(binexam, cons), probit(binlrt, cons)) ~ 1, D=c("Mixed", "Binomial", "Binomial"), data = tutorial)
Error in if (!all(indata[[resp[i - 1]]] >= 0 && indata[[resp[i - 1]]] <=  : 
  missing value where TRUE/FALSE needed
This should now be fixed in the development version, so you could try that. Otherwise as IGLS ignores rows with missing values the alternative would be to remove any rows containing missing values prior to running the model.

Re: multivariate binary model

Posted: Thu Feb 25, 2016 4:00 pm
by VickiTate
Thanks for providing a fix to this problem. However, I was just wondering why the model uses listwise deletion?

I thought one of the benefits of using this model was that it could predict the missing values using the correlations between the outcomes? I believe this is what the model does when all the outcomes are continuous.

Thanks.

Re: multivariate binary model

Posted: Fri Feb 26, 2016 10:44 am
by ChrisCharlton
You are correct that listwise deleting in R will lose information as I had forgotten that MLwiN excludes the rows with missing after the data has been expanded to a longer format with each response on its own row, meaning that only the responses with missing values would be excluded. The following is a quote from the MLwiN MCMC guide (page 278) that describes how missing data is handled in multivariate models when using the IGLS and MCMC estimation methods:
In MLwiN the IGLS and MCMC methods fit this model using different approaches which are both equivalent to assuming a ‘missing at random’ or MAR assumption (Rubin, 1976). The IGLS method, due to the clever trick of treating the multivariate problem as a special case of a univariate problem, can simply remove the missing rows in the longer data vector, which contains one row for each response. (See the User’s Guide to MLwiN for more details).

The MCMC method considers the missing data as additional parameters in the model and assumes an independent uniform prior for each missing response. Then the missing records have Normal posterior distributions (multivariate Normal for individuals with more than one missing response) and the Gibbs sampling algorithm is extended to include an additional step that generates values for the missing data values at each iteration in the sampling