Contributions to Public Health
- Cellular deconvolution with single-cell references: we developed EnsDeconv, HiDecon, and scMD to estimate cellular fractions from bulk gene expression or DNA methylation data using single-cell RNA-seq or DNA methylation references. EnsDeconv (Ensemble Deconvolution) standardizes and synthesizes the cellular deconvolution pipeline to provide robust and accurate estimates. HiDecon (Hierarchical Deconvolution) estimates rare cell-type fractions using a hierarchical cell-type tree. scMD deconvolves bulk DNA methylation data using ultra-high-dimensional and ultra-sparse single-cell DNA methylation references.
- Huang P, Cai M, Lu X, McKennan C, Wang J*. Accurate estimation of rare cell type fractions from tissue omics data via hierarchical deconvolution. Ann Appl Stat. 2024 Jun;18(2):1178-1194. doi: 10.1214/23-AOAS1829. (An earlier version won the Research Poster Award, ICSA Applied Statistics Symposium, 2023, and the Student Research Award, New England Statistics Symposium, 2023.)
- Cai M, Yue M, Chen T, Liu J, Forno E, Lu X, Billiar T, Celedón J, McKennan C, Chen W, Wang J*. Robust and accurate estimation of cellular fraction from tissue omics data via ensemble deconvolution. Bioinformatics. 2022 May 26;38(11):3004-3010. doi: 10.1093/bioinformatics/btac279.
- Estimating sample-level cell type-specific omics: we developed MIND and bMIND, which reliably decompose bulk omics into cell type-specific omics for each sample. They can borrow information across one or multiple measurements of a tissue per individual, such as multiple brain regions, with prior information derived from single-cell data.
- Wang J*, Roeder K*, Devlin B*. Bayesian estimation of cell type-specific gene expression with prior derived from single-cell data. Genome Res. 2021 Oct;31(10):1807-1818. doi: 10.1101/gr.268722.120.
- Wang J, Devlin B, Roeder K. Using multiple measurements of tissue to estimate subject- and cell-type-specific gene expression. Bioinformatics. 2020 Feb 1;36(3):782-788. doi: 10.1093/bioinformatics/btz619.
- Proteomics data: multiplex proteomic quantitation such as isobaric tag-based mass spectrometry speeds up the experiments, but introduces batch effects and nonignorable missing data. We developed mvMISE to simultaneously address the two issues while modeling high-dimensional peptides/proteins with multivariate mixed-effects selection models. We adopted a graphical lasso penalty with an alternating direction method of multipliers (ADMM) algorithm. We analyzed the breast cancer proteomic data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC).
- Wang J, Wang P, Hedeker D, Chen LS. Using multivariate mixed-effects selection models for analyzing batch-processed proteomics data with non-ignorable missingness. Biostatistics. 2019 Oct 1;20(4):648-665. doi: 10.1093/biostatistics/kxy022.
- Chen LS, Wang J, Wang X, Wang P. A mixed-effects model for incomplete data from labeling-based quantitative proteomics experiments. Ann Appl Stat. 2017 Mar;11(1):114-138. doi: 10.1214/16-AOAS994.
- Statistical genetics: we developed mixed-effects random forest (MixRF) to impute gene expression in uncollected or inaccessible tissues, e.g., the brain, by borrowing information from other tissue types and incorporating many covariates such as eQTLs. MixRF outperforms existing imputation methods, and incorporating imputed expression data can improve the power to detect differentially expressed genes. We applied MixRF to the Genotype-Tissue Expression (GTEx) data. We also developed ofGEM, a gene-based meta-analysis method for detecting gene-environment interactions.
- Wang J+, Liu Q+, Pierce BL, Huo D, Olopade OI, Ahsan H, Chen LS. A meta-analysis approach with filtering for identifying gene-level gene-environment interactions. Genet Epidemiol. 2018 Jul;42(5):434-446. doi: 10.1002/gepi.22115.
- Wang J, Gamazon ER, Pierce BL, Stranger BE, Im HK, Gibbons RD, Cox NJ, Nicolae DL, Chen LS. Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx. Am J Hum Genet. 2016 Apr 7;98(4):697-708. doi: 10.1016/j.ajhg.2016.02.020.
- Autism: we refined a gene-level Bayesian testing method (TADA) for rare variants in large-scale whole-exome sequencing data. We also demonstrated that MIND's estimates could help estimate cell type-specific co-expression networks.
- Chen S+, Wang J+, Cicek E, Roeder K, Yu H, Devlin B. De novo missense variants disrupting protein-protein interactions affect risk for autism through gene co-expression and protein networks in neuronal cell types. Mol Autism. 2020 Oct 8;11(1):76. doi: 10.1186/s13229-020-00386-7.
- Satterstrom FK+, Kosmicki JA+, Wang J+, Breen MS, De Rubeis S, An JY, Peng M, Collins R, Grove J, Klei L, Stevens C, Reichert J, Mulhern MS, Artomov M, Gerges S, Sheppard B, Xu X, Bhaduri A, Norman U, Brand H, Schwartz G, Nguyen R, Guerrero EE, Dias C, Betancur C, Cook EH, Gallagher L, Gill M, Sutcliffe JS, Thurm A, Zwick ME, Børglum AD, State MW, Cicek AE, Talkowski ME, Cutler DJ, Devlin B, Sanders SJ, Roeder K, Daly MJ, Buxbaum JD. Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell. 2020 Feb 6;180(3):568-584.e23. doi: 10.1016/j.cell.2019.12.036.
More information can be found on my lab homepage.
Education
2010 | Renmin University of China, Beijing, China | Bachelor, Statistics
2012 | Renmin University of China, Beijing, China | Master, Statistics
2017 | University of Chicago, Chicago, IL | PhD, Biostatistics
2019 | Carnegie Mellon University, Pittsburgh, PA | Postdoctoral Researcher, Statistics and Data Science
Teaching
BIOST 2154: Statistical Methods for Omics Data; Fall 2024
BIOST 2068: Introduction to Causal Inference; Fall 2021; Fall 2022; Fall 2023
BIOST 2025: Biostatistics Seminar; Spring, Fall 2020; Spring, Fall 2021; Spring 2022