If I got things right, in regressions the effect allele, for which beta is calculated, is the “alternate allele”, but which allele is considered the alternate allele in Hail?
Is it the second allele (as in ds.alleles[1]) regardless of the allele frequency, or is it the minor allele as calculated in variant_qc.AF (which can be allele 1 or allele 2)?
In the example above, are the effect alleles A and G, respectively, or A and A?
Related to that question, how would I create columns called EA and NEA (Effect Allele and Non-Effect Allele) I could then use to match variants for meta-analysis?
@tpoterba, this is the natural way to flip the encoding:
hl.linear_regression_rows(y=mt.pheno, x = 2 - mt.GT.n_alt_alleles())
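To spell out what the flip means for EA/NEA labels, here is a small plain-Python sketch (not Hail code). It assumes biallelic, diploid variants, where `n_alt_alleles()` counts copies of `alleles[1]`, so `2 - n_alt_alleles()` counts copies of `alleles[0]`. The helper names `n_alt` and `effect_alleles` are made up for illustration.

```python
def n_alt(call):
    """Count non-reference alleles in a diploid call such as (0, 1)."""
    return sum(1 for allele_index in call if allele_index != 0)


def effect_alleles(alleles, flipped=False):
    """Return (EA, NEA) for a biallelic variant.

    flipped=False: x counts alleles[1], so the alternate allele is the effect allele.
    flipped=True:  x is 2 - n_alt, so the reference allele is the effect allele.
    """
    ref, alt = alleles
    return (ref, alt) if flipped else (alt, ref)


alleles = ("A", "G")                 # (reference, alternate), as in ds.alleles
calls = [(0, 0), (0, 1), (1, 1)]     # hom-ref, het, hom-alt

for call in calls:
    n_ref = sum(1 for allele_index in call if allele_index == 0)
    # For diploid biallelic calls, 2 - n_alt is exactly the reference allele count.
    assert 2 - n_alt(call) == n_ref

print(effect_alleles(alleles))                # default encoding
print(effect_alleles(alleles, flipped=True))  # flipped encoding
```

In this sketch the effect allele depends only on the encoding of x, not on allele frequency.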
Also, users, don’t forget to include the intercept if you want one!
hl.linear_regression_rows(y=mt.pheno, x = 2 - mt.GT.n_alt_alleles(), covariates=[1])
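Without an intercept, negating y flips the sign of the effect size of x, but negating and shifting y has a stranger influence. With an intercept, negating (and possibly shifting) y flips the sign of the effect size of x, because the intercept absorbs the shift. The same holds for transforming x, as in 2 - n_alt_alleles() above.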
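A quick way to convince yourself of this is a toy single-predictor least-squares fit in plain Python. This is only a sketch with made-up data, not Hail's implementation; the functions `slope_with_intercept` and `slope_without_intercept` are illustrative helpers.

```python
def slope_with_intercept(x, y):
    """OLS slope for y ~ 1 + x (centered cross-product over centered sum of squares)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def slope_without_intercept(x, y):
    """OLS slope for y ~ x with the line forced through the origin."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Made-up genotype dosages (alt allele counts) and phenotypes.
x = [0, 1, 2, 1, 0, 2, 1, 1]
y = [0.4, 0.9, 2.2, 1.1, -0.2, 1.8, 1.0, 1.3]

x_flipped = [2 - xi for xi in x]   # negate and shift x, like 2 - n_alt_alleles()
y_neg = [-yi for yi in y]          # negate y only
y_shift = [3 - yi for yi in y]     # negate and shift y

print("with intercept:   ", slope_with_intercept(x, y),
      slope_with_intercept(x_flipped, y), slope_with_intercept(x, y_shift))
print("without intercept:", slope_without_intercept(x, y),
      slope_without_intercept(x_flipped, y), slope_without_intercept(x, y_neg),
      slope_without_intercept(x, y_shift))
```

With the intercept, the flipped and shifted fits give the original slope with the opposite sign. Without it, only the pure negation of y does; the shifted versions give a slope that is not simply the negative of the original.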