Statistical Matching for Data Fusion of Complex Phenotypes
The statistical matching problem is a data fusion problem with structured missing data. In its canonical version, there are two datasets, A and B, and three sets of variables, X, Y, and Z. Dataset A contains observations on the (X, Y) variables, and dataset B contains observations on the (X, Z) variables, so Y and Z are never observed together. A common goal is to impute the missing values in each dataset (Z in A and Y in B) in order to synthesize a single dataset with all the variables (X, Y, Z). We extend model-based statistical matching to high-dimensional, non-Gaussian data and use it for the fusion of complex phenotypes.
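To make the canonical setup concrete, the following is a minimal sketch on simulated toy data, not the method of this work. It assumes that Y and Z are conditionally independent given X and uses ordinary least-squares regression on the shared X variables as a simple stand-in for a model-based imputation. All names (x_a, y_a, z_b, fit_linear, predict_linear, fused) and the simulated coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: X is shared, Y is observed only in A, Z only in B.
n_a, n_b = 200, 300
x_a = rng.normal(size=(n_a, 2))
x_b = rng.normal(size=(n_b, 2))
y_a = x_a @ np.array([1.0, -0.5]) + rng.normal(scale=0.1, size=n_a)
z_b = x_b @ np.array([0.3, 2.0]) + rng.normal(scale=0.1, size=n_b)


def fit_linear(x, t):
    """Least-squares fit of t on x with an intercept; returns coefficients."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, t, rcond=None)
    return coef


def predict_linear(coef, x):
    """Predicted values for rows of x under the fitted linear model."""
    design = np.column_stack([np.ones(len(x)), x])
    return design @ coef


# Fit Y | X on dataset A and Z | X on dataset B.
coef_y = fit_linear(x_a, y_a)
coef_z = fit_linear(x_b, z_b)

# Impute the block missing from each dataset
# (assumption: Y and Z are conditionally independent given X).
z_a_hat = predict_linear(coef_z, x_a)
y_b_hat = predict_linear(coef_y, x_b)

# Stack both datasets into one synthetic dataset with columns (X1, X2, Y, Z).
fused = np.vstack([
    np.column_stack([x_a, y_a, z_a_hat]),
    np.column_stack([x_b, y_b_hat, z_b]),
])
```

Because Y and Z are never observed jointly, their dependence given X cannot be estimated from A and B alone; the sketch fixes it through the conditional independence assumption, and the imputed values are conditional means rather than draws from a fitted distribution.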