LinearMixedModel

Now that I have my basic GWAS workflow sorted out, I have been looking more into the LinearMixedModel functionality. I know the documentation notes that this functionality is experimental.

I was wondering whether other users have gotten a LinearMixedModel to work, and if not, whether this is something the devs are looking into. It would be great to have this functionality so I can compare more directly with Bolt-LMM, for example if a collaborator is using an LMM and we want to do a meta-analysis!

Right now I am running:

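import hail as hl

# Fixed effects: an intercept (1.0) plus three covariates.
# Random effects: alternate-allele counts for every variant in mt.
lmm, _ = hl.linear_mixed_model(
    y=mt.pheno.trait,
    x=[1.0, mt.pheno.cov1, mt.pheno.cov2, mt.pheno.cov3],
    z_t=mt.GT.n_alt_alleles(),
    p_path=bucket + p_fname)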

It just keeps running with no end in sight; I have let it run for more than 4 hours.

Any tips? Or is this functionality not a current priority in Hail’s feature set?

Thank you for your question.

To get us moving, what is the size of your dataset? Is trait a binary trait? And have you tried running with one covariate at a time to see whether the runtime shortens?

As for whether this is a priority, I am tagging @johnc1231, who is working on linear algebra applications in Hail. He may be able to shed more light on this too!

The linear mixed model should work, but it is not widely used and not very performant. From what I understand, people have mostly been using SAIGE to run their LMMs. We will revisit this functionality at some point, but it is currently not on our near-term roadmap.

@kumarveerapen The dataset is a large imputed genotype dataset with roughly 30k samples and 6 million variants. I have not tried one covariate at a time yet, but will do so.
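To put that size in perspective, here is a rough back-of-envelope estimate. It is a sketch under stated assumptions, not a measurement of Hail: it assumes every matrix is stored densely as float64 (8 bytes per entry) and that, because there are fewer samples than variants, the model forms the 30k × 30k kernel K = Z Zᵀ from the z_t entries and then eigendecomposes it. The variable names are illustrative only.

```python
# Back-of-envelope sizes for ~30k samples x ~6M variants.
# ASSUMPTIONS: dense float64 storage; K = Z Z^T is formed and then
# eigendecomposed. Constant factors for the eigendecomposition are omitted.
N_SAMPLES = 30_000       # "roughly 30k samples"
N_VARIANTS = 6_000_000   # "6 million variants"
BYTES_PER_FLOAT64 = 8


def dense_bytes(rows: int, cols: int) -> int:
    """Bytes needed to hold a dense rows x cols float64 matrix."""
    return rows * cols * BYTES_PER_FLOAT64


# Z^T: one row per variant, one column per sample (what z_t describes).
z_bytes = dense_bytes(N_VARIANTS, N_SAMPLES)
# K = Z Z^T: a samples x samples kernel.
k_bytes = dense_bytes(N_SAMPLES, N_SAMPLES)
# Forming K is an (n x m) by (m x n) product: about 2 * n^2 * m flops.
gram_flops = 2 * N_SAMPLES**2 * N_VARIANTS
# A dense symmetric eigendecomposition scales as O(n^3).
eig_flops_order = N_SAMPLES**3

print(f"Z:      {z_bytes / 1e12:.2f} TB")
print(f"K:      {k_bytes / 1e9:.1f} GB")
print(f"Z Z^T:  {gram_flops:.1e} flops")
print(f"eig(K): ~{eig_flops_order:.1e} x a constant")
```

Under these assumptions, the size of Z and the cost of forming K both grow linearly with the number of variants passed as z_t, while the eigendecomposition cost depends only on the number of samples.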

@johnc1231 Indeed, many of my colleagues use SAIGE or Bolt-LMM. I think making this functionality performant in Hail would be a major benefit for anyone looking to move their whole workflow to Hail, but I understand there may be other priorities to address first. Thanks for your response!