www.cmm.bristol.ac.uk/forum

Posted: **Thu Mar 04, 2010 1:14 pm**

Hi!

I'm reading through the MLwiN Command Manual first, so this may be in the User's Guide (although I took a glance, and didn't see it).

My impression of the data structure for MLwiN is that it's more "spreadsheet"-like than "rectangular data set". The statistical software that I'm used to (Stata, SPSS) treats each row (i.e. case, record) as permanently locked together, whereas MLwiN lets the cells in a column slide up and down independently of the other columns.

For example, if you have columns (variables) for Age, Income, Years of Education, and Number of Kids, you can use the "choose" or "omit" commands to delete some cells and write the remaining cells to new variables. In that case, you need to be **really** careful that every variable in an analysis comes from the same stage of selection, because otherwise row #5 (for example) will reflect information about Bob in some columns and Fred in other columns.
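To make the hazard concrete, here is a minimal sketch in plain Python (not MLwiN code). It assumes each column is an independent list, and the names and values are invented for illustration. Filtering one column into a new, shorter column and then pairing it by position with an unfiltered column mixes up people.

```python
# Hypothetical data: each column is an independent list, modelling a
# worksheet whose columns are not locked together row by row.
names = ["Ann", "Bob", "Cat", "Dan", "Fred"]
age = [25, 17, 40, 16, 33]
income = [30000, 0, 52000, 0, 41000]

# Keep only the adults' ages, writing the result to a NEW, shorter column.
adult_age = [a for a in age if a >= 18]

# Mistake: pairing the filtered column with an unfiltered one by position.
# The second pair combines Cat's age with Bob's income.
scrambled = list(zip(adult_age, income))

# Safer: apply the same selection to every column used in the analysis.
keep = [a >= 18 for a in age]
adult_income = [x for x, k in zip(income, keep) if k]
aligned = list(zip(adult_age, adult_income))

print(scrambled)
print(aligned)
```

The point of the sketch is only that positional pairing is valid when both columns went through the same selection.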

Is this a correct interpretation of the data structure? (I can see the flexibility of this arrangement. But for data analysis it also seems a little dangerous, compared with having **two** types of column: a first type dedicated to your survey data, and a second type behaving more like the "boxes", i.e. for storing results.)

Thanks!

Posted: **Thu Mar 04, 2010 6:00 pm**

That's exactly right, and it is indeed a very important point.

I agree that this approach is somewhat dangerous, in that you can end up with completely scrambled data if you are not aware that this is how MLwiN works. I guess I would argue that the flexibility that you mention makes it worthwhile. It's hard to see how you could do things like getting out the level 2 residuals, or making predictions for specified values instead of for everyone in the dataset, if you didn't have this approach. (Sure, you could just repeat the level 2 residual on every row that referred to that level 2 unit, but this might not be best if you want to go on and do anything with them: for example, it wouldn't be efficient if you wanted to plot them, and it wouldn't be at all the form you needed if you wanted to draw up a league table.)
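As a rough illustration of the two layouts, here is a plain Python sketch (not MLwiN code). The school identifiers and residual values are made up. The short column has one entry per level 2 unit, which is the natural form for a league table. The long column repeats each residual on every level 1 row.

```python
# Level 1 column: one entry per pupil, giving the pupil's school.
school = [1, 1, 1, 2, 2, 3, 3, 3, 3]

# Hypothetical level 2 residuals: one entry per school, so this column
# is shorter than the level 1 columns in the same worksheet.
school_ids = [1, 2, 3]
u = [0.4, -0.7, 0.1]

# Short form: a league table is just the schools sorted by residual.
league = sorted(zip(school_ids, u), key=lambda pair: pair[1], reverse=True)

# Long form: the same residual repeated on every pupil's row.
lookup = dict(zip(school_ids, u))
u_long = [lookup[s] for s in school]

print(league)
print(u_long)
```

The long form carries no extra information, and it would need collapsing again before ranking or plotting one point per school.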

I guess at the end of the day either you agree the flexibility makes the danger worthwhile, or you don't and would probably then rather use another package! Perhaps there needs to be more of an up-front warning though, bearing in mind that many users will be coming from packages that do things differently.

Posted: **Thu Mar 04, 2010 6:43 pm**

Oh, I've just seen your point about the two types of column. Yes, I guess that would also work. Sometimes, though, you might want to have datasets of different lengths that you analyse in the same worksheet: perhaps you want to look at a level 1 response variable sometimes, and at other times look at a level 2 response, collapsing the dataset. Indeed, when you fit multinomial or multivariate models, MLwiN creates a new expanded dataset without overwriting your original data, so I guess it needs to be able to have actual data columns of different lengths to be able to do that.
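To show why an expanded dataset needs longer columns, here is a generic Python sketch of stacking two responses into one long column. It is an assumption-laden illustration with invented names and values, and it is not necessarily the exact layout MLwiN produces.

```python
# Hypothetical wide data: one row per pupil, two response variables.
pupil = [1, 2, 3]
maths = [55, 62, 48]
english = [60, 58, 51]

# Stack into a long layout: one row per (pupil, response) pair.
# The original columns are left untouched.
long_pupil, long_resp, long_y = [], [], []
for p, m, e in zip(pupil, maths, english):
    long_pupil += [p, p]
    long_resp += ["maths", "english"]
    long_y += [m, e]

# The stacked columns are twice as long as the originals, so both
# lengths have to coexist in the same worksheet.
print(len(pupil), len(long_y))
```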

Posted: **Sat Mar 06, 2010 9:54 am**

Lydia,

Thanks for the thoughtful replies! (Given the depth of your replies, and that you've replied to many (most?) of the other posts, I'm guessing you're on the CMM staff?) :)

Yep -- I'll just have to keep an eye on this! It may also change my working habits, as my current style is based on using Stata: a stable dataset (apart from any variables I add), from which I run various subsamples just by adding "if sex==1 & race==3 & age>18 & age<=30", etc.

Thanks! :)

(I'm nearly done reading the Command Manual -- feeling pretty comfortable with it. Haven't yet sat down with the actual software, though.)

--Travis

MLwiN data structure?
