Multiple trait GWAS?


After reading other discussions, I learned that the regression functionality can be used for general regression, whatever the actual outcome and exposure are (not only genetic variants).

My current analysis plan is to run a linear regression with the same exposure and covariates but with many different outcomes. Specifically, I’m trying to run a regression with every common variant as an outcome.

I tried passing ‘mt.GT.n_alt_alleles()’ as the y argument, but it didn’t work. According to the documentation (Hail | Statistics), the regression function currently accepts only 1-d arrays as input.

Is there a solution? I think people who want to run a PheWAS would also need a multiple-outcome option when running a regression.

The linear_regression_rows and logistic_regression_rows methods support lists of outcome variables for y, which covers the PheWAS case. Using mt.GT.n_alt_alleles() as an outcome variable doesn’t work, though, because we restrict outcome variables to fields of a matrix table that are indexed by column, not by entry (“sample” fields, not “genotype” fields). That restriction lets us apply some algorithmic tricks to improve performance. You might instead be interested in the general-purpose linreg aggregator, which doesn’t have the restrictions of linear_regression_rows (or its performance, be warned). Using this might look like:

mt = mt.annotate_rows(linreg_results = hl.agg.linreg(y=mt.GT.n_alt_alleles(), x=[...x and covs and intercept...]))