Ying Shan of the Department of Biostatistics defends her dissertation on "Statistical Methods for Genetic Risk Confidence Intervals, Bayesian Disease Risk Prediction, and Estimating Mutation Screen Saturation"
Graduate faculty of the University and all other interested parties are invited to attend.
Genetic information can be used to improve disease risk estimation as well as to estimate the number of genes influencing a trait. Here I explore these issues in three parts.
1) For an informed understanding of a disease risk prediction, the confidence interval of the risk estimate should be taken into account, but few previous studies have done so. I constructed a better risk prediction model and provided a better screening strategy by taking the confidence interval into account. Risk models are built with varying numbers of genetic risk variants known as single nucleotide polymorphisms (SNPs). With SNPs sorted in decreasing order of effect size, adding the smaller-effect SNPs to the risk model shifts the risk estimate only modestly while widening its confidence interval. The best risk prediction model should therefore exclude the small-effect SNPs. As measured by net benefit, the newly proposed screening strategy is superior to the traditional screening strategy.
2) Many methods have been developed for SNP selection, SNP effect estimation, and risk prediction. A Bayesian method designed for continuous phenotypes, BayesR, shows good characteristics. Here, I developed an extension of BayesR (BayesRB) so that the method can be used for binary phenotypes. I evaluated the performance of BayesRB. It performs well on SNP effect estimation and risk prediction, but not on selecting associated SNPs.
3) Recessive forward genetic screening studies (RFGSS) are widely conducted to detect disease mutations. Estimating the screening saturation in an RFGSS guides the screening strategy. Here, I developed a simulation-based "unseen species" method to estimate the screening saturation in an RFGSS. I simulated an RFGSS process based on a real study and compared my method to both non-parametric and parametric methods. The proposed method performs better than all the other methods except an existing "unseen species" method.
The above three newly proposed methods help to construct better risk prediction models and to estimate the number of disease-contributing genes. These methods can be applied to studies of different diseases, and will improve knowledge of those diseases and contribute positively to their treatment and prevention.
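For readers unfamiliar with the "unseen species" idea in part 3, the following is a minimal sketch of one classic non-parametric estimator, the bias-corrected Chao1 estimator, applied to screening saturation. It is not the simulation-based method developed in the dissertation. The function names, the input format (one mutation count per gene hit at least once), and the example counts are all assumptions made purely for illustration.

```python
def chao1_estimate(hit_counts):
    """Bias-corrected Chao1 estimate of the total number of genes
    (observed plus unseen), given how many times each observed gene was hit.

    hit_counts: iterable of positive integers, one per observed gene
    (hypothetical input format, chosen for this illustration).
    """
    observed = [c for c in hit_counts if c > 0]
    s_obs = len(observed)                         # genes hit at least once
    f1 = sum(1 for c in observed if c == 1)       # genes hit exactly once
    f2 = sum(1 for c in observed if c == 2)       # genes hit exactly twice
    # Bias-corrected form: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def saturation(hit_counts):
    """Estimated screening saturation: observed genes / estimated total genes."""
    s_obs = sum(1 for c in hit_counts if c > 0)
    total = chao1_estimate(hit_counts)
    return s_obs / total if total > 0 else 0.0


# Made-up example: 8 genes observed, with 4 singletons and 2 doubletons.
example_hits = [1, 1, 1, 1, 2, 2, 3, 5]
print(chao1_estimate(example_hits))  # estimated total number of genes
print(saturation(example_hits))      # estimated fraction of genes already found
```

Many genes hit only once suggest that many more remain unseen, so the estimated saturation falls as the share of singletons grows.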