Phenotype prediction

davidkelley · September 26, 2017, 5:58pm

Hey all,

Unless I’m overlooking it, I don’t currently see methods that emphasize prediction of phenotypes over computing variant associations. For example, Alexander Gusev’s FUSION package (http://gusevlab.org/projects/fusion/) for running transcriptome-wide association studies uses this machinery and includes several well-established strategies such as BLUP and elastic nets. Do you plan to include this class of methods in the future?

Best,
Dave

jbloom · October 5, 2017, 5:27pm

Hi Dave, you’re not overlooking it! Thanks for the pointer. It looks like these methods start from GWAS summary statistics and then leverage R packages for regularized regression. We don’t have near-term plans to incorporate them ourselves (in part because scale is less of a pressing concern with summary stats), but we are working to make big linear algebra more flexible, exposed, and performant so that such methods are easier to implement, and to make VDS more generic so it can handle other data types, such as large numbers of functional phenotypes treated as a matrix rather than a table. One simple thing you can already do is linearly predict risk from betas obtained internally or externally by annotating samples with an expression like this:

sa.polyRisk = gs.map(g => g.gt.toDouble.orElse(2 * va.AF) * va.beta).sum()
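
For readers who don't know the Hail expression language, here is a rough plain-Python sketch of what that expression appears to compute for one sample: each genotype is treated as an alternate-allele count, a missing call is replaced by its expected value 2 * AF, and the result is weighted by the variant's beta and summed. The function name `poly_risk` and the toy numbers are made up for illustration and are not part of Hail.

```python
from typing import Optional, Sequence


def poly_risk(
    genotypes: Sequence[Optional[int]],
    allele_freqs: Sequence[float],
    betas: Sequence[float],
) -> float:
    """Linear risk score for one sample (hypothetical helper, not a Hail API).

    genotypes[i] is the alternate-allele count (0, 1, 2) or None if missing.
    A missing call is imputed with its expectation, 2 * allele frequency.
    """
    score = 0.0
    for gt, af, beta in zip(genotypes, allele_freqs, betas):
        dosage = float(gt) if gt is not None else 2.0 * af
        score += dosage * beta
    return score


# Toy data: four variants, with the third call missing for this sample.
sample_genotypes = [0, 1, None, 2]
allele_freqs = [0.1, 0.3, 0.25, 0.5]
betas = [0.2, -0.1, 0.4, 0.05]

score = poly_risk(sample_genotypes, allele_freqs, betas)
print(score)
```

The missing third call contributes 2 * 0.25 * 0.4 to the sum, exactly as if the sample carried the population-average dosage at that variant.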

Another approach may be to munge your data in Hail and then leverage the ML functionality in PySpark, see for example:

davidkelley · October 9, 2017, 7:41pm

Interesting, I’m not familiar with how to use summary statistics to do this. I was thinking of the scenario where you’re working with the genotypes, and you want to predict a phenotype. Since you typically have far more variants than samples, you need to regularize aggressively, so something like an elastic net is a reasonable approach. Maybe a more relevant link would be PrediXcan: https://github.com/hakyim/PrediXcan
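
To make the idea concrete, here is a minimal sketch of elastic net regression fit by coordinate descent in plain Python. It is not how PrediXcan or FUSION implement it, and it assumes the genotype columns and the phenotype have already been centered (no intercept is fit). The function names, the penalty parameterization (`lam` for overall strength, `alpha` for the L1/L2 mix), and the toy data are all my own choices for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*||b||^2).

    X is a list of n rows (samples) with p entries (variants) each.
    Assumes centered data; hypothetical helper, not a library API.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic variant carries no information
            # Covariance of column j with the residual that excludes variant j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy example with more variants (5) than samples (3); columns are centered.
X = [
    [-1.0, 0.0, 1.0, -1.0, 0.0],
    [0.0, 1.0, -1.0, 0.0, -1.0],
    [1.0, -1.0, 0.0, 1.0, 1.0],
]
y = [-1.0, 0.0, 1.0]

weights = elastic_net(X, y, lam=0.1, alpha=0.5)
print(weights)
```

With variants outnumbering samples, the unpenalized least-squares problem has no unique solution; the L1 part of the penalty zeroes out some weights and the L2 part keeps correlated variants from producing unstable estimates. In practice `lam` and `alpha` would be chosen by cross-validation.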
