Phenotype prediction

Hi Dave, you’re not overlooking it! Thanks for the pointer. It looks like these methods start from GWAS summary statistics and then use R packages for regularized regression. We don’t have near-term plans to incorporate them ourselves, in part because scale is less of a bottleneck when working from summary statistics. We are, however, working to make big linear algebra more flexible, exposed, and performant, so that methods like these are easier to implement. We are also working to make VDS more generic so it can handle other data types, for example large numbers of functional phenotypes treated as a matrix rather than a table.

A simple thing you can already do is linearly predict risk from betas obtained internally or externally, by annotating samples with an expression like this:

```
sa.polyRisk = gs.map(g => g.gt.toDouble.orElse(2 * va.AF) * va.beta).sum()
```
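For anyone who wants to check the arithmetic outside Hail, here is a minimal plain-Python sketch of the same linear score for one sample. It assumes `gt` is the alternate-allele count (0, 1 or 2), `AF` is the alternate-allele frequency, and each `beta` is the per-allele effect. Missing calls are mean-imputed to `2 * AF`, which is what `orElse(2 * va.AF)` does above. The function and argument names are hypothetical and not part of Hail.

```python
from typing import Optional, Sequence


def polygenic_risk(
    genotypes: Sequence[Optional[int]],
    betas: Sequence[float],
    alt_freqs: Sequence[float],
) -> float:
    """Linear risk score for a single sample (hypothetical helper, not a Hail API).

    genotypes[i] is the alt-allele count (0, 1 or 2) at variant i, or None if
    the call is missing; betas[i] and alt_freqs[i] describe the same variant.
    """
    if not (len(genotypes) == len(betas) == len(alt_freqs)):
        raise ValueError("genotypes, betas and alt_freqs must have equal length")
    score = 0.0
    for gt, beta, af in zip(genotypes, betas, alt_freqs):
        # Mean-impute a missing call to its expected dosage, 2 * AF.
        dosage = float(gt) if gt is not None else 2.0 * af
        score += dosage * beta
    return score


# 0*0.5 + 1*(-0.2) + (2*0.25)*0.1 + 2*0.3, i.e. approximately 0.45
print(polygenic_risk([0, 1, None, 2], [0.5, -0.2, 0.1, 0.3], [0.1, 0.4, 0.25, 0.5]))
```

Mean imputation keeps samples with a few missing calls from being pulled toward zero risk, at the cost of shrinking them slightly toward the population average.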

Another approach may be to munge your data in Hail and then use the ML functionality in PySpark; see for example: