In this dissertation, I propose a novel framework for classifying and describing multivariate data sets based on the number and structure of indexing variables. Using this framework, I then develop a Bayesian generalized bilinear mixed-effects model for multi-indexed multivariate data. I demonstrate that this model is able to capture important features of affiliation network data.
The proposed novel framework, the multi-indexed multivariate data structure, classifies data into different categories based on the number and structure of indexing variables. I focus on two special cases of this type of data: double-indexed multivariate data with multiple-membership structure (termed Type A) and triple-indexed multivariate data with cross-classified structure (termed Type B). Several illustrative examples of data of these two types are provided.
Some existing multivariate statistical methods that are appropriate for analyzing Type A or Type B data are reviewed briefly. These methods are roughly classified into two categories according to their underlying purpose. One class of methods aims to identify patterns of dependence among the components of one variable Y via the indexing variables. This can be achieved by dimension reduction and graphical presentation. The other class of methods quantifies the linear relationship between one variable Y and other variables X, while accounting for the dependence among the components of Y. In both cases, the dependence patterns among the components of Y may be complicated, and existing methods may not be suitable. I demonstrate that existing methods are not able to capture fourth-order dependence, which is often present in Type A and Type B data.
A new statistical methodology, based on the Bayesian generalized bilinear mixed-effects model, is developed for Type A and Type B data. This model can be viewed as a tool for achieving the two desiderata for multivariate statistical methods described above: identifying and accounting for dependence. The model allows us to identify dependence patterns among the components of Y via a bilinear term, that is, an inner product of two latent variables corresponding to two indexing variables. It is also suitable for studying the relationship between Y and X through a regression form while accounting for the dependence among the components of Y caused by repeated measures and/or unexplained fourth-order dependence. A Markov chain Monte Carlo algorithm is described for Bayesian inference. Data from the 2012 Summer Olympic Games are analyzed to illustrate the model.
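As a rough illustration of how a bilinear term can sit alongside a regression form, the following sketch computes a linear predictor for one cell (i, j) and simulates binary responses from it. It is not the dissertation's specification: the additive row and column effects, the logit link, the standard normal draws, and every name in the code (`bilinear_predictor`, `simulate_binary`, `a_i`, `b_j`, `u_i`, `v_j`) are assumptions made only for this example.

```python
import math
import random


def bilinear_predictor(x_ij, beta, a_i, b_j, u_i, v_j):
    """Return x_ij'beta + a_i + b_j + u_i'v_j for one (i, j) cell.

    Assumed form for illustration only: a regression term, additive
    effects for the two indexing variables, and a bilinear term given
    by the inner product of two latent vectors.
    """
    fixed = sum(x * b for x, b in zip(x_ij, beta))
    bilinear = sum(u * v for u, v in zip(u_i, v_j))
    return fixed + a_i + b_j + bilinear


def simulate_binary(n_rows, n_cols, beta, latent_dim, seed=0):
    """Simulate a binary n_rows x n_cols response matrix (hypothetical setup)."""
    rng = random.Random(seed)
    # Additive effects and latent vectors for each level of the two indices.
    a = [rng.gauss(0, 1) for _ in range(n_rows)]
    b = [rng.gauss(0, 1) for _ in range(n_cols)]
    U = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_cols)]

    Y = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Hypothetical covariates for cell (i, j), one per coefficient.
            x_ij = [rng.gauss(0, 1) for _ in beta]
            eta = bilinear_predictor(x_ij, beta, a[i], b[j], U[i], V[j])
            prob = 1.0 / (1.0 + math.exp(-eta))  # assumed logit link
            row.append(1 if rng.random() < prob else 0)
        Y.append(row)
    return Y


Y = simulate_binary(n_rows=5, n_cols=4, beta=[0.5, -1.0], latent_dim=2)
```

In this sketch, two rows whose latent vectors point in similar directions tend to have elevated responses in the same columns, which is the kind of dependence pattern a bilinear term is meant to expose. Fitting such a model by MCMC is a separate step that is not shown here.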
The performance of the bilinear mixed-effects model-fitting algorithm is studied via the analysis of MCMC output arising from a series of simulation studies. The robustness of the methodology to model misspecification, particularly with respect to over-dispersion and the latent dimension of the bilinear random effects, is examined through these simulation studies.
Affiliation networks are a particular type of Type A data that record the relationship between a set of 'actors' and a set of 'events'. The generalized bilinear mixed-effects model considers the dependence patterns resulting from interactions between actors and events. In this setting, the model is used to explore patterns in extracurricular activity membership of students in a racially diverse high school in a Midwestern metropolitan area while controlling for differences in participation by both activity characteristics and attributes of the students. Using techniques from spatial point pattern analysis, we show how our model can provide insight into patterns of racial segregation in the voluntary extracurricular activity participation profiles of adolescents. In addition, household travel patterns are examined through latent variables associated with the geographic area of residence and the destination area of observed trips. In this case, the bilinear model highlights common travel behaviors of individuals residing inside the I-270 beltway of Columbus, OH.