Zhiguang Huo of the Department of Biostatistics defends his dissertation on "Statistical Integrative Omics Methods for Disease Subtype Discovery"
Graduate faculty of the University and all other interested parties are invited to attend.
Disease phenotyping by omics data has become a popular approach that can potentially lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step towards this goal. With the accumulation of massive high-throughput omics data sets, omics data integration is essential to improve statistical power and reproducibility. In this thesis, two extensions of the sparse K-means method will be introduced. The first extension is a meta-analytic framework for identifying novel disease subtypes when expression profiles from multiple cohorts are available. Lasso regularization combined with meta-analysis identifies a unique set of gene features for subtype characterization. An additional pattern-matching reward function guarantees consistent subtype signatures across studies. The second extension integrates multi-level omics data sets, guided by prior biological knowledge, via a sparse overlapping group lasso penalty. An algorithm based on the alternating direction method of multipliers (ADMM) will be applied for fast optimization. For both topics, simulations and real-data applications in breast cancer and leukemia will show superior clustering accuracy, feature selection, and functional annotation. These methods will improve the statistical power, prediction accuracy, and reproducibility of disease subtype discovery analysis.
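For readers unfamiliar with the sparse K-means method that both extensions build on, the following is a minimal sketch of its feature-weight update, assuming the commonly used formulation in which nonnegative weights w maximize the weighted between-cluster sum of squares subject to an L2 norm of at most 1 and an L1 norm of at most s. It is not the dissertation's meta-analytic or group-lasso extension; the function names, the tuning value s, and the example numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(a, delta):
    """Shrink each entry of `a` toward zero by `delta` (lasso-style)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def update_weights(bcss, s, tol=1e-8, max_iter=200):
    """Feature-weight update for one sparse K-means iteration (illustrative).

    bcss : per-feature between-cluster sum of squares for the current clusters
    s    : L1 bound on the weights (hypothetical tuning value, must be >= 1)
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    a = np.maximum(np.asarray(bcss, dtype=float), 0.0)

    def normalized(delta):
        # Soft-threshold, then rescale to unit L2 norm (zero vector stays zero).
        v = soft_threshold(a, delta)
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    # If the unpenalized solution already satisfies the L1 bound, keep it.
    w = normalized(0.0)
    if w.sum() <= s:
        return w

    # Otherwise, binary-search the threshold until the L1 bound is met.
    lo, hi = 0.0, a.max()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if normalized(mid).sum() > s:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return normalized(hi)


# Hypothetical per-gene BCSS values: two informative genes, three noisy ones.
example_bcss = [10.0, 5.0, 1.0, 0.5, 0.1]
example_w = update_weights(example_bcss, s=1.2)
```

In this sketch, genes with small between-cluster separation receive a weight of exactly zero, which is how the lasso-type constraint performs feature selection; a full procedure would alternate this update with a weighted K-means clustering step.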