Contributions to Public Health
- To understand the genetic basis of phenotypic variability, most existing methods for eQTL mapping and GWAS examine a single phenotype and a single genetic variant at a time. Taking a systems-genetics approach instead, my lab developed statistical machine learning methods that, given eQTL datasets, simultaneously learn a gene network and the eQTLs that perturb it, and further extended these methods to integrate multiple omics data types using a highly scalable learning algorithm.
- Zhang L and Kim S. Learning gene networks under SNP perturbations using eQTL datasets. PLoS Computational Biology, 10(2):e1003420, 2014. PubMed PMID: 24586125; PubMed Central PMCID: PMC3937098.
- McCarter C, Howrylak J, and Kim S. Learning gene networks underlying clinical phenotypes using SNP perturbation. PLoS Computational Biology, 16(10):e1007940, 2020.
- Allele-specific expression is essential for understanding gene regulation in diploid organisms. My lab developed computational tools for highly accurate and efficient allele-specific expression quantification, built as a lightweight extension of the popular software kallisto, and for learning gene regulatory networks and cis-acting and trans-acting eQTLs from eQTL data with allele-specific expression.
- Adduri A and Kim S. Ornaments for efficient allele-specific expression estimation with bias correction. The American Journal of Human Genetics (accepted).
- Yoon J and Kim S. Learning gene networks under SNP perturbation using SNP and allele-specific expression data. bioRxiv 563661 [Preprint]. Oct 24, 2023. Available from: https://doi.org/10.1101/2023.10.23.563661.
- For longitudinal measurements of high-dimensional correlated features, such as spatio-temporal COVID-19 case counts, it is essential to jointly model the structure among the multivariate features and their evolution over time, using a highly scalable learning algorithm. My lab developed flexible nonparametric Bayesian methods based on Gaussian processes for multivariate-output regression with time as a covariate.
- Yoon JH, Jeong D, and Kim S. Doubly mixed-effects Gaussian process regression. In Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS), pages 6893–6908. PMLR, 2022.
- Jeong D and Kim S. Factorial stochastic differential equation for multi-output Gaussian process regression. In Proceedings of the Conference on Artificial Intelligence and Statistics (AISTATS), pages 9755–9772. PMLR, 2023.
- In many real-world datasets, complex dependencies are present both among samples and among features, as in expression measurements of tens of thousands of genes collected from individuals in a pedigree. My lab developed statistical methods for jointly modeling dependencies among samples and features.
- Yoon JH and Kim S. EiGLasso for scalable sparse Kronecker-sum inverse covariance estimation. Journal of Machine Learning Research, 23(110):1–39, 2022.
Education
2001 Seoul National University, Seoul, Korea, B.S. in Computer Engineering
2007 University of California, Irvine, Irvine, CA, USA, Ph.D. in Computer Science
2010 Carnegie Mellon University, Pittsburgh, PA, USA, Postdoctoral Fellow in Computer Science and Machine Learning