My methodological research focuses on (1) semiparametric theory and methods for complex time-to-event data, (2) subgroup identification/inference and individualized treatment effect estimation in precision medicine and public health, (3) multi-omics data analysis and integration, and (4) deep learning and causal inference. My collaborative research is broad in scope. I first established collaborations in cancer studies and have since worked actively on neuropsychiatric disorders and aging-related research (e.g., age-related macular degeneration (AMD) and Alzheimer’s disease (AD)) using large-scale genetic and multi-omics data. Currently, I am also working on pediatric pulmonary diseases such as childhood asthma.
Ph.D. (2010) Department of Biostatistics, University of Michigan, MI
M.A. (2005) Department of Mathematics, Indiana University Bloomington, IN
B.S. (2003) Department of Mathematics, Nanjing University, China
Applied Mixed Models BIOST2086 Spring 2013, Spring 2014, Spring 2016, 2017
Biostatistics Seminar BIOST2025 Spring 2014, Fall 2014
- 2021 James L. Craig Excellence in Education Award
- 2022 Inducted into Delta Omega Honor Society in Public Health
- 2022 Ascending Star Award, Health Sciences, University of Pittsburgh
- 2023 American Statistical Association LiDS Section Outstanding Service Award
Funding (PI Grant)
- Funding Agency: NIH/NIGMS
Grant Number: R01GM141076
Grant Title: New statistical methods and software for modeling complex multivariate survival data with large-scale covariates
Role on Grant: Principal Investigator
Years Inclusive: 6/1/2022 – 5/31/2026
Total Direct Costs: $800,000
- Funding Agency: Pitt CTSI
Grant Title: Precision Care in asthma using EHR analytics
Role on Grant: MPI
Years Inclusive: 6/1/2022 – 5/31/2023
Total Direct Costs: $45,000
- Funding Agency: NIH/NEI
Grant Number: R21EY030488
Grant Title: Deep-learning-based prediction of AMD and its progression with GWAS and fundus image data
Role on Grant: MPI (contact PI)
Years Inclusive: 8/1/2020 – 5/31/2023 (with 1-year NCE)
Total Direct Costs: $270,000
- Funding Agency: NIH/Clinical and Translational Science Institute, University of Pittsburgh
Grant Title: Deep Learning with GWAS to Predict AMD Progression
Role on Grant: Principal Investigator
Years Inclusive: 1/1/2019 – 12/31/2019
Total Direct Costs: $10,000
- Funding Agency: NIH/NIMH
Grant Number: R03MH108849
Grant Title: Novel and Robust Methods for Differential Protein Network Analysis of Proteomics Data in Schizophrenia Research
Role on Grant: Principal Investigator
Years Inclusive: 7/1/2016 – 6/30/2018
Total Direct Costs: $100,000
- Funding Agency: UPMC
Grant Title: Competitive Medical Research Fund
Role on Grant: Principal Investigator
Years Inclusive: 7/1/2015 – 12/31/2017
Total Direct Costs: $25,000
*: corresponding/senior author; +: co-first author; _: PhD student advisee
2024:
- Hu H, Wang X, Feng S, Xu Z, Liu J, Heidrich-O’Hare E, Chen Y, Yue M, Zeng L, Ding Y, Huang H, Duerr R, Chen W. (2024). A unified model-based framework for doublet/multiplet detection in single-cell multiomics data. Nature Communications 15, 5562. https://doi.org/10.1038/s41467-024-49448-x
- Liu J, Bo N, Forno E, Ding Y*. (2024). Predicting Pediatric Asthma Severe Outcomes using Machine Learning Methods for EHR Data with Repeated Clinic Visits. Journal of Statistical Research. 58(1): 131-149. https://doi.org/10.3329/jsr.v58i1.75419
- Bo N+, Wei Y+, Zeng L, Kang C, Ding Y*. (2024). A Meta-Learner Framework to Estimate Individualized Treatment Effects for Survival Outcomes. (An earlier version won the 2022 JSM LiDS section student paper award.) Journal of Data Science. https://doi.org/10.6339/24-JDS1119
- Chen L, Wang Y, Cai C, Ding Y, Kim RS, Lipchik C, Fumagalli D, Gavin PG, Yothers G, Allegra CJ, Petrelli NJ, Suga JM, Hopkins JO, Saito NG, Evans T, Jujjavarapu S, Wolmark N, Lucas PC, O’Connell MJ, Paik S, Sun M, Pogue-Geile KL, Lu X. (2024). Machine Learning Predicts Oxaliplatin Benefit in Colon Cancer Adjuvant Therapies. Journal of Clinical Oncology. PMID: 38315963. DOI: 10.1200/JCO.23.01080
2023:
- Sun T, Lang W, Zhang G, Yi D, Ding Y, Zhang L. (2023). Penalised semiparametric copula method for semi-competing risks data: Application to hip fracture in elderly. Journal of the Royal Statistical Society Series C. 73(1): 241-256. https://doi.org/10.1093/jrsssc/qlad093 PMID: 37065470
- Zhou X, Zhang J, Ding Y, Li Y, Huang H, Chen W. (2023) Predicting Late-Stage Age-Related Macular Degeneration by Integrating Marginally Weak SNPs in GWA Studies. Frontiers in Genetics. https://doi.org/10.3389/fgene.2023.1075824 PMID: 37065470
- Sun T, Li Y, Xiao Z, Ding Y, Wang X. (2023) Semiparametric copula method for semi-competing risks data subject to interval censoring and left truncation: Application to disability in elderly. Statistical Methods in Medical Research. https://doi.org/10.1177/09622802221133552 PMID: 36735020
2022:
- Sun T, Cheng Y, Ding Y*. (2022) An Information Ratio based Goodness-of-fit Test for Copula Models on Censored Data. Biometrics. https://doi.org/10.1111/biom.13807
- Ding Y*, Sun T. Copula Models and Diagnostics for Multivariate Interval-Censored Data. In: Sun J, Chen D-G, editors. Emerging Topics in Modeling Interval-Censored Survival Data, p. 141–165. New York: Springer, 2022.
- Wang X, Xu Z, Zhou X, Zhang Y, Huang H, Ding Y, Duerr RH, Chen W. (2022) SECANT: a biology-guided semi-supervised method for clustering, classification, and annotation of single-cell multi-omics. PNAS Nexus. https://doi.org/10.1101/2020.11.06.371849
- Sun T, Ding Y. (2022) Neural Network on Interval Censored Data with Application to the Prediction of Alzheimer’s Disease. Biometrics. https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.13734
- Ganjdanesh A+, Zhang Z+, Chew EY, Ding Y, Chen W*, Huang H*. (2022) LONGL-Net: A Temporal Correlation Structure Guided Deep Learning Framework for Predicting Longitudinal Age-related Macular Degeneration Severity. PNAS Nexus. PMID: 35360552 DOI: 10.1093/pnasnexus/pgab003
2021:
- Wei Y, Hsu JC, Chen W, Chew EY, Ding Y*. (2021) Identification and Inference for Subgroups with Differential Treatment Efficacy from Randomized Controlled Trials with Survival Outcomes through Multiple Testing. (The earlier version won the Best Poster Award in ASA Pittsburgh Chapter 2019 Meeting.) Statistics in Medicine. PMID: 34542190 DOI: 10.1002/sim.9196
- Wei Y, Wang X, Chew EY, Ding Y*. (2021) Confident Identification of Subgroups from SNP Testing in RCTs with Binary Outcomes. Biometrical Journal. https://doi.org/10.1002/bimj.202000170
- Yan Q, Jiang Y, Huang H, Xin H, Swaroop A, Chew EY, Weeks DE, Chen W*, Ding Y*. (2021) GWAS-based Machine Learning for Prediction of Age-Related Macular Degeneration Risk. Translational Vision Science & Technology (TVST). https://doi.org/10.1167/tvst.10.2.29
- Cui X, Dickhaus T, Ding Y, Hsu JC. Handbook of Multiple Comparisons. Chapman & Hall/CRC, 2021. ISBN 9780367140670
- Ding Y*, Wei Y, Wang X, Hsu JC. Testing SNPs in Targeted Drug Development. Book Chapter In: Cui X, Dickhaus T, Ding Y, Hsu JC. Handbook of Multiple Comparisons. Chapman & Hall/CRC, 2021.
2020:
- Sun T, Wei Y, Chen W, Ding Y*. (2020) Genome-wide Association Study-based Deep Learning for Survival Prediction. (The earlier version won the 2019 LiDS Conference Student Poster Award.) Statistics in Medicine. https://doi.org/10.1002/sim.8743
- Chen L-W, Cheng Y, Ding Y, Li R. (2020) Quantile Association Regression on Bivariate Survival Data. Canadian Journal of Statistics. DOI: 10.1002/cjs.11577
- Wang X+, Sun Z+, Zhang Y, Xu Z, Huang H, Duerr R, Chen K, Ding Y*, Chen W*. (2020) BREM-SC: A Bayesian Random Effects Mixture Model for Joint Clustering Single Cell Multi-omics Data. (The paper won the 2020 ICSA Student Paper Award.) Nucleic Acids Research. 48(11): 5814–5824. DOI: 10.1093/nar/gkaa314. PMID: 32379315.
- Sun T, Ding Y*. (2020) CopulaCenR: Copula based Regression Models for Bivariate Censored Data in R. The R Journal. https://doi.org/10.32614/RJ-2020-025.
- Yan Q, Weeks DE, Xin H, Huang H, Swaroop A, Chew EY, Ding Y*, Chen W*. (2020) Deep-learning-based Prediction of Late Age-Related Macular Degeneration Progression. Nature Machine Intelligence. 2(2):141-150. DOI: 10.1038/s42256-020-0154-9 PMID: 32285025.
- Ding Y*, Wei Y, Wang X. Logical Inference on Treatment Efficacy When Subgroups Exist. Book Chapter In: Ting N, Cappelleri JC, Ho S, Chen DG. Design and Analysis of Subgroups with Biopharmaceutical Applications. New York: Springer, 2020.
2019:
- Wei Y+, Liu Y+, Sun T, Chen W, Ding Y*. (2019) Gene-based Association Analysis for Bivariate Time-to-event Data through Functional Regression with Copula Models. (The earlier version won the 2019 LiDS Conference Student Paper Award.) Biometrics. DOI: 10.1111/biom.13165
- Sun T, Ding Y*. (2019) Copula-based semiparametric transformation model for bivariate data under general interval censoring. (The earlier version won the 2019 ENAR Distinguished Student Paper Award.) Biostatistics. DOI: 10.1093/biostatistics/kxz032
- Sun Z, Chen L, Xin H, Huang Q, Cillo AR, Tabib T, Kolls JK, Bruno TC, Lafyatis R, Vignali DAA, Chen K, Ding Y*, Hu M*, Chen W*. (2019) BAMM-SC: A Bayesian mixture model for clustering droplet-based single cell transcriptomic data from population studies. (The earlier version won the 2019 ENAR Distinguished Student Paper Award.) Nature Communications. 10(1):1649. DOI: 10.1038/s41467-019-09639-3. PMID: 30967541
- Sun T+, Liu Y+, Cook RJ, Chen W, Ding Y*. (2019). Copula-based Score Test for Bivariate Time-to-event Data, with Application to a Genetic Study of AMD Progression. (The earlier version won the Best Poster Award in ASA Pittsburgh Chapter 2017 Meeting.) Lifetime Data Analysis. DOI: 10.1007/s10985-018-09459-5. PMID: 30560439
- Lin HM, Xu H, Ding Y, Hsu JC. (2019). Correct and Logical Inference on Efficacy in Subgroups and Their Mixture for Binary Outcomes. Biometrical Journal. 61(2): 8-26. PMID: 30353566
2018:
- Ding Y*, Li GY, Liu Y, Ruberg SJ, Hsu JC. (2018). Confident Inference For SNP Effects On Treatment Efficacy. Annals of Applied Statistics. 12(3): 1727-1748.
- Ding Y*,+, Kong S+, Kang S, Chen W. (2018). A Semiparametric Imputation Approach for Regression with Censored Covariate, with Application to an AMD Progression Study. Statistics in Medicine. 37: 3293–3308. PMID: 29845616
- Yan Q+, Ding Y+, Liu Y, Sun T, Fritsche LG, Clemons T, Ratnapriya R, Klein ML, Cook RJ, Liu Y, Fan R, Wei L, Abecasis GR, Swaroop A, Chew EY, AREDS2 research group, Weeks DE, Chen W. (2018). Genome-wide Analysis of Disease Progression in Age-related Macular Degeneration. Human Molecular Genetics. 27(5):929-940. PMID: 29346644
- Sun Z, Wang T, Deng K, Wang X-F, Lafyatis R, Ding Y, Hu M, Chen W. (2018). DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data. Bioinformatics. 34(1): 139-146. PMID: 29036318
2017 and before:
- Ding Y, Liu Y, Yan Q, Fritsche LG, Cook RJ, Clemons T, Ratnapriya R, Klein ML, Abecasis GR, Swaroop A, Chew EY, Weeks DE, Chen W. (2017). Bivariate Analysis of Age-Related Macular Degeneration Progression Using Genetic Risk Scores. Genetics. 206(1):119-133. PMID: 28341650
- Wang T, Ren Z, Ding Y, Zhou F, Sun Z, MacDonald ML, Sweet RA, Chen W. (2016). FastGGM: An efficient algorithm for the inference of Gaussian graphical model in biological networks. PLoS Computational Biology. 12(2): e1004755. PMID: 26872036
- Fan R, Wang Y, Yan Q, Ding Y, Weeks DE, Lu Z, Ren H, Cook RJ, Xiong M, Swaroop A, Chew EY, Chen W. (2016). Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions. Genetic Epidemiology. 40(2): 133-43. PMID: 26782979
- Ding Y*, Lin HM, Hsu JC. (2016). Subgroup Mixable Inference on Treatment Efficacy in Mixture Populations, with an Application to Time-to-Event Outcomes. Statistics in Medicine. 35(10):1580-94. PMID: 26646305
- Ding Y*, Nan B. (2015). Estimating Mean Survival Time: When is it Possible? Scandinavian Journal of Statistics 42(2):397-413. PMID: 26019387 PMCID: PMC4442028
- Shen L, Ding Y, Battioui C. A Framework of Statistical Methods for Identification of Subgroups with Differential Treatment Effects in Randomized Trials. (2015) In: Chen Z, Liu A, Qu Y, Tang L, Ting N & Tsong Y, eds. Applied Statistics in Biomedicine and Clinical Trials Design: Selected Papers from 2013 ICSA/ISBS Joint Statistical Meetings. New York: Springer.
- Ding Y, Fu H. (2013). Bayesian Indirect and Mixed Treatment Comparisons Across Longitudinal Time Points. Statistics in Medicine 32 (15):2613-28. PMID: 23229717
- Banerjee M, Ding Y, Noone A. (2012). Identifying Representative Trees from Ensembles. Statistics in Medicine 31(15):1601-16. PMID: 22302520
- Ding Y, Nan B. (2011). A Sieve M-theorem for Bundled Parameters in Semiparametric Models, with Application to the Efficient Estimation in a Linear Model for Censored Data. Annals of Statistics 39(6): 3032-3061. PMID: 24436500 PMCID: PMC3890689
- Ding Y, Choi H, Nesvizhskii AI. (2008). Adaptive Discriminant Function Analysis and Reranking of MS/MS Database Search Results for Improved Peptide Identification in Shotgun Proteomics. Journal of Proteome Research 7(11): 4878-89. PMID: 18788775 PMCID: PMC3744223
- Complete List of Published Work in My Bibliography