Background: As machine learning (ML) technologies advance, they have shown promise for understanding and predicting Alzheimer’s disease (AD). Recent AD research has sought to capitalize on ML for AD genomics. Whole-genome tiling (WGT) is a new representation of whole-genome sequencing (WGS) data, proposed to support ML and precision medicine.
Method: A comprehensive description of WGT, the workflow, and publicly available WGT data can be found at http://curover.se/su92l-j7d0g-swtofxa2rct8495. In this analysis, we first performed quality control, imputation, and one-hot encoding on each tile and its tile variants (Figure 1). Then, using summary statistics from an independent genome-wide association study (GWAS), we mapped the ten single-nucleotide polymorphisms (SNPs) most significantly associated with AD to their respective tiles. Using these tile data, we performed AD classification with the XGBoost algorithm. We then repeated this process with covariates (age, sex, and ethnicity) added to the tile variants. A total of 1,545 subjects (474 cases, 1,071 controls) were studied.
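The preprocessing steps above (quality control, imputation, one-hot encoding of tile variants) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, and several choices are hypothetical: the input layout (for each subject, a mapping from tile ID to a pair of tile-variant IDs, with `None` marking a failed call), the `max_missing` threshold, imputation with the most common tile variant, and a presence/absence encoding with one column per observed tile variant. The function name `qc_impute_one_hot` is likewise made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: genotypes[subject][tile] = (variant_on_allele1, variant_on_allele2),
# where None marks a tile-variant call that failed (missing).
Genotype = Tuple[Optional[int], Optional[int]]


def qc_impute_one_hot(
    genotypes: Dict[str, Dict[str, Genotype]],
    max_missing: float = 0.1,  # assumed QC threshold, not taken from the study
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (column names, per-subject 0/1 feature vectors)."""
    subjects = sorted(genotypes)
    tiles = sorted({t for g in genotypes.values() for t in g})
    columns: List[str] = []
    features: Dict[str, List[int]] = {s: [] for s in subjects}

    for tile in tiles:
        # A tile absent for a subject is treated as two missing calls.
        alleles = [a for s in subjects for a in genotypes[s].get(tile, (None, None))]
        missing_rate = sum(a is None for a in alleles) / len(alleles)
        observed = Counter(a for a in alleles if a is not None)
        # Quality control: drop tiles with too many missing calls (or none observed).
        if missing_rate > max_missing or not observed:
            continue
        # Imputation: replace missing calls with the most common tile variant.
        mode = observed.most_common(1)[0][0]
        variants = sorted(observed)
        columns.extend(f"{tile}:{v}" for v in variants)
        for s in subjects:
            called = [mode if a is None else a
                      for a in genotypes[s].get(tile, (None, None))]
            # One-hot style encoding: 1 if the subject carries this tile variant.
            features[s].extend(int(v in called) for v in variants)
    return columns, features
```

The resulting per-subject vectors could then be stacked into a feature matrix for a classifier such as XGBoost; a dosage encoding (counting copies 0, 1, or 2) would be an equally plausible alternative to the presence/absence indicator used here.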
Result: Mapping the 10 most significant SNPs from our independent GWAS to the WGT data yielded 10 distinct tiles, which together encompassed 356 SNPs. In our comparative analysis, we predicted AD diagnosis using five feature sets: the 10 tiles of WGT data, the corresponding 356 SNPs from WGS data, covariates alone (age, sex, and ethnicity), tile variants plus covariates, and SNPs plus covariates. Our results (Figure 2) show that WGT performs comparably to SNPs, indicating that the patterns embedded within individual SNPs are preserved in WGT data.
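The mapping step that turns a short list of top SNPs into a set of tiles, and then into the larger set of SNPs covered by those tiles, can be sketched as an interval lookup. This is an illustration only: the coordinate layout (`(tile_id, chrom, start, end)` with inclusive positions), the assumption that tiles on a chromosome do not overlap, and the function names `tiles_for_snps` and `snps_in_tiles` are all hypothetical and do not describe the WGT file format.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical inputs: tiles as (tile_id, chrom, start, end) with inclusive
# coordinates, and SNPs as (snp_id, chrom, pos).
Tile = Tuple[str, str, int, int]
Snp = Tuple[str, str, int]


def tiles_for_snps(tiles: List[Tile], top_snps: List[Snp]) -> List[str]:
    """Return IDs of tiles containing at least one top SNP, in order of first hit."""
    by_chrom: Dict[str, List[Tile]] = {}
    for tile in sorted(tiles, key=lambda t: (t[1], t[2])):
        by_chrom.setdefault(tile[1], []).append(tile)
    starts = {c: [t[2] for t in ts] for c, ts in by_chrom.items()}

    hits: List[str] = []
    for _, chrom, pos in top_snps:
        ts = by_chrom.get(chrom, [])
        # Rightmost tile starting at or before pos (assumes non-overlapping tiles).
        i = bisect.bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and ts[i][3] >= pos and ts[i][0] not in hits:
            hits.append(ts[i][0])
    return hits


def snps_in_tiles(tiles: List[Tile], tile_ids: List[str], all_snps: List[Snp]) -> List[str]:
    """Return IDs of every SNP that falls inside one of the selected tiles."""
    wanted = set(tile_ids)
    selected = [t for t in tiles if t[0] in wanted]
    return [sid for sid, chrom, pos in all_snps
            if any(c == chrom and s <= pos <= e for _, c, s, e in selected)]
```

In this toy setting, two top SNPs that land in the same tile contribute a single tile, and `snps_in_tiles` then gathers every WGS SNP inside the selected tiles, which is how a handful of lead SNPs can expand into a larger SNP set for the comparison.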
Conclusion: Our pilot investigation of WGT-represented genomic data for ML-based AD prediction has yielded encouraging results. These results indicate that, although WGT data encapsulate different information than WGS data, they support comparable prediction of AD diagnosis. Our study demonstrates the potential and novelty of WGT for ML in AD, and it motivates further exploration of how best to exploit WGT.
Matthew Lee, Brian Lee, Sarah Zaranek, Jingxuan Bao, Shu Yang, Sang-Hyuk Jung, Heng Huang, Andrew Saykin, Paul Thompson, Christos Davatzikos, Dokyoon Kim, Alexander Zaranek, Li Shen (2022). "Machine learning for Alzheimer’s disease classification from genomic tiling variants." Alzheimer’s Association International Conference 2022.