Logistic regression on entries

eatkinson · April 18, 2020, 10:35pm

Hi Hail!

I have been happily running linear regressions on entry data as such:
mt = mt.annotate_rows(GWAS=hl.agg.linreg(mt.pheno, [1.0, mt.entry1, mt.entry2, covariates]))

I now have several binary traits and am wondering if there’s a comparable way to run logistic regression on entries rather than rows?

Thanks so much!
Elizabeth

eatkinson · April 24, 2020, 5:32pm

Just checking in again - any advice for handling this? Thanks!

tpoterba · April 24, 2020, 5:35pm

Ack, sorry, I thought I’d responded. Unfortunately logistic regression doesn’t have a one-pass algorithm, so it doesn’t fit as naturally into the aggregator system. We’ll have a way to do this in the next 6-9 months, I think, but before then the rather inflexible interface of logistic_regression_rows is the only option.
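
To illustrate the one-pass point: linear regression can accumulate its sufficient statistics (X'X and X'y) in a single sweep over the data, whereas the usual logistic fit is iterative and needs a fresh sweep for every update of the coefficients. Below is a minimal plain-Python sketch of Newton-Raphson for a single predictor plus intercept. It is not Hail code and not how Hail implements anything; the function name and the toy data are made up for illustration.

```python
import math

def fit_logistic_newton(x, y, max_iter=25, tol=1e-10):
    """Fit y ~ sigmoid(b0 + b1 * x) by Newton-Raphson.

    Returns (b0, b1, n_passes), where n_passes counts full sweeps over the data.
    Hypothetical helper for illustration only.
    """
    b0, b1 = 0.0, 0.0
    n_passes = 0
    for _ in range(max_iter):
        n_passes += 1
        # One full pass: the gradient and Hessian depend on the current
        # (b0, b1), so they cannot be accumulated once and reused.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system H * step = g by Cramer's rule.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1, n_passes

# Toy, non-separable data: case fractions 1/4, 2/4, 3/4 at x = 0, 1, 2.
x = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1, n_passes = fit_logistic_newton(x, y)
print(b0, b1, n_passes)
```

Each Newton step re-reads every observation, which is why this kind of fit does not map onto a single aggregation the way hl.agg.linreg does.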

eatkinson · April 24, 2020, 6:08pm

OK, good to know. Thanks, and please keep me in the loop when this is added! It would be great to package this feature into the Tractor framework for case/control phenotypes.

tpoterba · April 24, 2020, 6:10pm

will do!

eatkinson · July 2, 2020, 7:32pm

Hey again! I am hoping to run this pipeline for a consortium working group, and we will have some binary phenotypes. Is there a fix for this yet, or do you have suggestions for workarounds?

tpoterba · July 2, 2020, 7:34pm

Similar question to the one I just answered here: Hail implementation of RUTH

Unfortunately, it’s hard to build this infrastructure and it’s not done yet. Hopefully soon!

eatkinson · November 20, 2020, 12:09am

Hi again! I have a workaround for this that I wanted to sanity check with you. I am thinking the best option for the short term would be to just transform the results from linreg on binary traits. The specific equation for this in the literature (e.g. https://www.nature.com/articles/ejhg2016150#Sec2) is effect logistic = effect linear / ( (intercept linear) / (1- intercept linear)). I am currently including the intercept term explicitly in hl.agg.linreg, so I guess I could just take the beta from that?

eatkinson · November 20, 2020, 12:13am

Specifically, the regression is run with hl.agg.linreg(mt.TC, [1.0, mt.hapcounts0.x, mt.anc0dos.x, mt.anc1dos.x, covariates...]), so I am thinking I could just get the intercept with mt.TC.beta[0]? I could then transform the betas for ancestry 0 and 1 separately.

danking · November 20, 2020, 6:03pm

The betas are indeed in the same order as the independent variable array, so beta[0] should be the beta for 1.0 (the intercept).
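
For anyone trying the linear-to-logistic conversion discussed above, here is a small hedged sketch in plain Python. It uses a common first-order approximation that divides the linear-model effect by k(1 - k), where k is the case fraction; this follows from the slope of the logistic curve, dp/dx = beta * p(1 - p). That is not identical to the expression as typed earlier in the thread, so please check it against the cited paper before relying on it. The function name and the example numbers are hypothetical. The sketch takes the case fraction as an explicit input because the fitted intercept (beta[0]) equals the case fraction only when the other predictors are mean-centered.

```python
def linear_beta_to_log_odds(beta_linear, case_fraction):
    """Approximate a log-odds effect from a linear-regression effect.

    Assumes the first-order approximation beta_logistic ~ beta_linear / (k * (1 - k)),
    with k the fraction of cases. Hypothetical helper, not part of Hail.
    """
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must be strictly between 0 and 1")
    return beta_linear / (case_fraction * (1.0 - case_fraction))

# Made-up numbers: a linear effect of 0.02 per allele with 25% cases.
approx_log_or = linear_beta_to_log_odds(0.02, 0.25)
print(approx_log_or)
```

The approximation is best for small effects and degrades as the case fraction moves toward 0 or 1, so it is a stopgap and not a substitute for fitting the logistic model.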

eatkinson · December 6, 2021, 10:38pm

Hello again! It’s my annual check on the status of logistic regression on entries. Are there any updates on implementation by any chance?

Thanks so much!
Elizabeth
