Annotation table for logreg

hhx037 · October 16, 2018, 1:34pm

Hi,

I was wondering what the best way is to import a table in which some of the phenotypes are coded as 0 (control) and 1 (case), in order to run a logistic regression. At the moment I use hl.cond to create bool columns:

import hail as hl

def case(pheno):
    # map 1 (case) to True and 0 (control) to False
    return hl.cond(pheno == 1, True, False)

table = table.annotate(pheno_case=case(table.pheno))
ds = ds.annotate_cols(**table[ds.s])

and then:

gwas = hl.logistic_regression(test='score',
                              y=ds.pheno_case,
                              x=ds.GT.n_alt_alleles(),
                              covariates=[ds.is_female, ds.age, ds.weight,
                                          ds.PC1, ds.PC2, ds.PC3, ds.PC4, ds.PC5])

I must be doing something wrong, though: when I use logreg, all the p-values are inflated (this also happens if I use 0/1 instead of False/True).

I know they are inflated because I have run logistic regressions on the same dataset with other software. Interestingly, linreg (with 0s and 1s, obviously) gives the expected result, with no inflation.

jbloom · October 16, 2018, 1:51pm

Hi there! You need to include the intercept explicitly, by adding the constant 1 to the list of covariates. See the examples in the documentation of logistic_regression.

hhx037 · October 16, 2018, 2:10pm

Argh… yes, it works now! Thank you.

To get back to the table question: is using hl.cond the best way to get a bool, or is there an automatic way to specify, during table import, which 0/1 columns should be treated as bool?

tpoterba · October 16, 2018, 2:21pm

What does the hl.cond look like? There may be an easier way. For example, hl.cond(mt.foo == 1, True, False) is the same as just mt.foo == 1.

You can also pass 0/1 numeric phenotypes directly to logistic regression; we check internally that the values are 0 or 1.

hhx037 · October 16, 2018, 2:38pm

Splendid, no need for the hl.cond then, thanks
