This week's Biostatistics Seminar speaker, Dr. Shyamal D. Peddada of the Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences (NIEHS, NIH), will present "Some challenges in the analysis of microbiome data: Old wine in a new bottle with a twist!"
Over the past couple of decades, researchers have been interested in studying the effects of gene-by-(external)-environment interactions on human health. Lately, however, there has been considerable interest in studying the role of the internal microbial environment in human health. Numerous studies are routinely conducted to understand the association between the microbiome and various health outcomes. The 16S rRNA data generated from such studies are high-dimensional count data containing a large number of zeros. Using these microbial count data, researchers are often interested in problems such as comparing various experimental groups and classifying subjects into groups (e.g., healthy and sick). Many “off-the-shelf” methods that ignore the underlying structure in these data are not appropriate for analyzing microbiome data. They are potentially subject to an inflated false discovery rate (FDR) and/or loss of power. Consequently, such methods (a) may result in wasted resources in following up “leads” that cannot be replicated because they are false, (b) may result in missing important findings that should have been discovered, and, most importantly, (c) may lead to misinterpretation of the underlying biology. In this talk we describe some recent methodological developments in this field that are based on some “old ideas” and that account for the underlying structure in these data. These methods are illustrated using (a) infant gut microbiome data obtained from a Norwegian infant gut study, and (b) the global gut data of Yatsunenko (Nature, 2012).