Does Hail's linear regression method account for additivity or dominance?

Hi folks,

I’m pretty new to statistical genetics, and I’m wondering whether Hail’s linear regression outputs separate beta values for additivity/dominance.

When I learned about GWAS, I was taught to define random variables (X_a, X_d) = (-1, -1), (0, 1), and (1, -1) corresponding to genotypes homozygous ref, heterozygous, and homozygous alt, respectively. This would then give you two betas, beta_a and beta_d, whose values tell you whether the trait is additive, dominant, neither, or some mixture.

My guess from the documentation is that it only outputs a single beta corresponding to the genotype (I’m running it the same way as in the documentation, feeding in GT.n_alt_alleles()).

If this is the case, could Hail’s method miss some associations because a trait isn’t strictly linear?


That’s right: Hail runs only one test per phenotype per variant. If you’re using mt.GT.n_alt_alleles() as the predictor, you’re testing the additive model. Regarding the -1,0,1 model vs the 0,1,2 model, I think you should be able to convert between the betas from these two representations, but I don’t know the math off the top of my head.
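For the additive term on its own, the conversion can at least be checked numerically: the -1,0,1 coding is just the 0,1,2 coding shifted by one, so the slope should stay the same and only the intercept should move (by one slope). Here is a minimal sketch in plain Python, not Hail; the toy genotypes and phenotypes are made up, and ols_fit is a hypothetical helper written for this illustration. It says nothing about the joint additive-plus-dominance fit.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up toy data: alt-allele counts and a phenotype for ten samples.
n_alt = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
pheno = [1.0, 1.2, 2.1, 1.9, 2.4, 3.2, 2.8, 0.9, 2.0, 3.1]

# Fit with the 0,1,2 coding.
a012, b012 = ols_fit(n_alt, pheno)

# Fit with the -1,0,1 coding (same genotypes, shifted by one).
shifted = [g - 1 for g in n_alt]
a101, b101 = ols_fit(shifted, pheno)

print("slope, 0/1/2 coding: ", b012)
print("slope, -1/0/1 coding:", b101)
print("intercept difference:", a101 - a012)
```

The two slopes agree, and the intercept difference equals the slope, since y = a + b*x can be rewritten as y = (a + b) + b*(x - 1).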

You can run a dominance model by manually producing a dominance encoding which is orthogonal to the 0,1,2 model.
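One possible way to build such an orthogonal encoding, as a sketch only: take the heterozygote indicator and remove the part of it that is explained by an intercept and the 0,1,2 count. The example below does this for a single variant in plain Python with made-up genotypes. The function orthogonal_dominance is a hypothetical helper, not part of Hail, and other orthogonal codings exist.

```python
def orthogonal_dominance(n_alt):
    """Residualize the het indicator on an intercept and the 0,1,2 count.

    Assumes the variant is not monomorphic in the sample (otherwise the
    variance of n_alt is zero and the slope is undefined).
    """
    n = len(n_alt)
    het = [1.0 if g == 1 else 0.0 for g in n_alt]
    mean_g = sum(n_alt) / n
    mean_h = sum(het) / n
    sgg = sum((g - mean_g) ** 2 for g in n_alt)
    sgh = sum((g - mean_g) * (h - mean_h) for g, h in zip(n_alt, het))
    slope = sgh / sgg
    intercept = mean_h - slope * mean_g
    # What is left of the het indicator after removing the additive fit.
    return [h - (intercept + slope * g) for g, h in zip(n_alt, het)]


# Made-up genotypes (alt-allele counts) for ten samples at one variant.
n_alt = [0, 0, 0, 1, 1, 2, 0, 1, 0, 2]
x_dom = orthogonal_dominance(n_alt)

# Both should be zero up to floating-point error.
print("sum of x_dom:        ", sum(x_dom))
print("dot(x_dom, n_alt):   ", sum(d * g for d, g in zip(x_dom, n_alt)))
```

Because the residuals depend on the genotype frequencies, the encoding has to be recomputed for each variant.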

Alternatively, you could just write the dominance and additive models using your representation:

import hail as hl

mt = mt.annotate_entries(
    mt_add = mt.GT.n_alt_alleles() - 1,  # assuming biallelic: -1, 0, 1
    mt_dom = 2 * hl.int32(mt.GT.is_het()) - 1,  # 1 if het, -1 otherwise
)

NB: mt_dom is 1 when is_het and -1 when !is_het.
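As a quick sanity check of the arithmetic, here is the same mapping in plain Python (no Hail needed), with the three biallelic genotypes run through it. The function encode is a hypothetical helper for illustration only. It follows the (X_a, X_d) definition from the question: heterozygotes get X_d = 1 and both homozygotes get X_d = -1.

```python
def encode(n_alt):
    """Map an alt-allele count (0, 1, or 2) to the (X_a, X_d) coding."""
    is_het = n_alt == 1
    x_a = n_alt - 1            # -1, 0, 1
    x_d = 2 * int(is_het) - 1  # 1 if het, -1 otherwise
    return x_a, x_d


for label, n_alt in [("hom ref", 0), ("het", 1), ("hom alt", 2)]:
    print(label, encode(n_alt))
```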