Fast linear regression for multiple phenotypes, eQTLs

Yes, in Hail 0.2, the multiple phenotype case has been incorporated into the main linear regression method (note that y can be a single element or a list):
https://hail.is/docs/devel/methods/stats.html#hail.methods.linear_regression
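
For intuition on what passing a list for y amounts to, here is a plain-Python sketch. It is not Hail code and makes no claim about how Hail implements this; the function name `regress_many` and the toy data are made up for illustration. It fits each phenotype against the same x by simple least squares, computing the quantities that depend only on x once and reusing them for every phenotype.

```python
def regress_many(x, ys):
    """Fit y = intercept + slope * x by least squares for each phenotype in ys.

    x  : list of floats, one value per sample
    ys : list of phenotypes, each a list of floats aligned with x
    Returns a list of (intercept, slope) pairs, one per phenotype.
    """
    n = len(x)
    x_mean = sum(x) / n
    # The centered x and its sum of squares depend only on x: compute them once.
    x_centered = [xi - x_mean for xi in x]
    sxx = sum(v * v for v in x_centered)

    results = []
    for y in ys:
        y_mean = sum(y) / n
        sxy = sum(xv * (yi - y_mean) for xv, yi in zip(x_centered, y))
        slope = sxy / sxx
        results.append((y_mean - slope * x_mean, slope))
    return results


# Toy data (hypothetical): alt-allele counts for five samples, two phenotypes.
x = [0.0, 1.0, 2.0, 1.0, 0.0]
pheno1 = [1.0, 3.0, 5.0, 3.0, 1.0]  # constructed as 1 + 2*x
pheno2 = [2.0, 1.0, 0.0, 1.0, 2.0]  # constructed as 2 - x
fits = regress_many(x, [pheno1, pheno2])
```

Each entry of `fits` is the (intercept, slope) pair for one phenotype, in the order the phenotypes were passed.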


Thank you! Also, does the x variable have to be GT?

x=dataset.GT.n_alt_alleles()

I may be misunderstanding your question, but for diploid genotypes, dataset.GT.n_alt_alleles() evaluates to 0, 1, 2, or missing (the number of non-reference alleles in the call). See n_alt_alleles.
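
To make the 0, 1, 2, or missing behaviour concrete, here is a plain-Python sketch of the same counting rule. It is only an illustration, not Hail's implementation: a diploid call is modelled here as a tuple of two allele indices (0 is the reference), and `None` stands in for a missing call.

```python
from typing import Optional, Tuple


def n_alt_alleles(call: Optional[Tuple[int, int]]) -> Optional[int]:
    """Count the non-reference alleles in a diploid call.

    Allele index 0 is the reference; any other index is an alternate.
    A missing call (None) gives a missing result (None).
    """
    if call is None:
        return None
    return sum(1 for allele in call if allele != 0)


# Hom-ref, het, hom-alt, a call with two different alternates, and a missing call.
calls = [(0, 0), (0, 1), (1, 1), (1, 2), None]
counts = [n_alt_alleles(c) for c in calls]
```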

Sorry, I meant: does GT have to be the x variable for linear regression?

Linear regression is totally flexible: the x variable can be any numeric expression. You can regress on the missingness of a field with hl.is_missing(dataset.GT), or on the depth, or GQ, or anything you want!
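
As a rough plain-Python illustration of that flexibility (not Hail code; the records, the field names `DP` and `GQ` as dictionary keys, and the helper `slope` are all hypothetical), the same fit can be run with x built from a missingness indicator or from depth instead of the genotype:

```python
def slope(x, y):
    """Least-squares slope of y on x, with an intercept in the model."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical per-sample entries for one variant; None marks a missing call.
entries = [
    {"GT": (0, 1), "DP": 30, "GQ": 99},
    {"GT": None, "DP": 4, "GQ": 10},
    {"GT": (0, 0), "DP": 28, "GQ": 95},
    {"GT": None, "DP": 6, "GQ": 12},
]
y = [1.0, 3.0, 1.0, 3.0]  # made-up phenotype values, one per sample

# x as a 0/1 indicator of whether the call is missing.
x_missing = [1.0 if e["GT"] is None else 0.0 for e in entries]
# x as the read depth.
x_depth = [float(e["DP"]) for e in entries]

slope_missing = slope(x_missing, y)
slope_depth = slope(x_depth, y)
```

Only the construction of x changes between the two fits; the regression itself is the same.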
