I have been happily running linear regressions on entry data as such:
mt = mt.annotate_rows(GWAS=hl.agg.linreg(mt.pheno, [1.0, mt.entry1, mt.entry2, covariates]))
I now have several binary traits and am wondering if there’s a comparable way to run logistic regression on entries rather than rows?
Thanks so much!
Just checking in again - any advice for handling this? Thanks!
Ack, sorry, I thought I’d responded. Unfortunately logistic regression doesn’t have a one-pass algorithm, so it doesn’t fit as naturally into the aggregator system. We’ll have a way to do this in the next 6-9 months, I think, but before then the rather inflexible interface of logistic_regression_rows is the only option.
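To illustrate the one-pass point: the pure-Python sketch below fits a one-predictor linear regression from running sums in a single sweep over the data, while the logistic fit needs a fresh sweep for every Newton step because the weights depend on the current coefficients. This is not Hail's implementation; the function names and the toy data are made up for the example.

```python
import math

def linreg_one_pass(rows):
    """Fit y ~ a + b*x from running sums; each row is visited exactly once."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def logreg_newton(rows, tol=1e-8, max_iter=25):
    """Fit logit(p) ~ a + b*x by Newton's method; every step re-reads all rows."""
    a = b = 0.0
    passes = 0
    for _ in range(max_iter):
        passes += 1
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in rows:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)          # weight depends on the current a, b
            g0 += y - p                # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                   # negative Hessian (2x2, symmetric)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b, passes

# Toy (x, y) pairs with a binary outcome; invented for illustration only.
toy = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1),
       (2, 0), (2, 1), (2, 1), (2, 1)]

lin_a, lin_b = linreg_one_pass(toy)
log_a, log_b, n_passes = logreg_newton(toy)
print("linear:", lin_a, lin_b)
print("logistic:", log_a, log_b, "passes over the data:", n_passes)
```

The linear fit only needs sums that can be accumulated and merged in any order, which is the shape an aggregator expects. The logistic fit has to revisit the data until the updates become small.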
OK, good to know. Thanks, and please keep me in the loop when this is added! It would be great to package this feature into the Tractor framework for case/control phenotypes.
Hey again! I am hoping to run this pipeline for a consortium working group, and we will have some binary phenotypes. Is there a fix for this yet, or do you have suggestions for workarounds?
Similar question to the one I just answered here: Hail implementation of RUTH
Unfortunately, it’s hard to build this infrastructure and it’s not done yet. Hopefully soon!
Hi again! I have a workaround for this that I wanted to sanity check with you. I am thinking the best option for the short term would be to just transform the results from linreg on binary traits. The specific equation for this in the literature (e.g. https://www.nature.com/articles/ejhg2016150#Sec2) is
effect_logistic = effect_linear / (intercept_linear * (1 - intercept_linear)). I am currently including the intercept term explicitly in hl.agg.linreg, so I guess I could just take the beta from that?
Specifically, the regression is run with
hl.agg.linreg(mt.TC, [1.0, mt.hapcounts0.x, mt.anc0dos.x, mt.anc1dos.x, covariates...]), so I am thinking I could just get the intercept with mt.TC.beta[0]? I could then transform the betas for ancestry 0 and 1 separately.
The betas are indeed in the same order as the independent variable array, so beta[0] should be the beta for 1.0 (the intercept).
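A minimal sketch of that post-hoc transformation, assuming the first-order relationship between the two models: if p = 1 / (1 + exp(-(a + b*x))), then dp/dx = b * p * (1 - p), so a linear slope is roughly the log-odds effect scaled by p * (1 - p) at the baseline probability p. Here the linear intercept stands in for p. The function name linear_to_log_odds and the beta values are hypothetical, and the approximation is only reasonable for small effects.

```python
def linear_to_log_odds(beta_linear, baseline):
    """Approximate a log-odds effect from a linear-regression slope.

    baseline is the baseline case probability; here it is assumed to be
    the intercept of the linear fit (beta[0] when 1.0 is the first covariate).
    """
    if not 0.0 < baseline < 1.0:
        raise ValueError("baseline must lie strictly between 0 and 1")
    return beta_linear / (baseline * (1.0 - baseline))

# Made-up betas in the same order as the covariate array:
# [intercept, hapcounts0, anc0dos, anc1dos]
betas = [0.2, 0.01, 0.03, -0.02]

intercept = betas[0]
log_odds = [linear_to_log_odds(b, intercept) for b in betas[1:]]
print(log_odds)
```

With additional covariates in the model the fitted intercept is not guaranteed to equal the case fraction, or even to fall between 0 and 1, which is why the sketch checks the range before dividing.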
Hello again! It’s my annual check on the status of logistic regression on entries. Are there any updates on implementation by any chance?
Thanks so much!