Annotation table for logreg


I was wondering what the best way is to import a table in which some of the phenotypes are coded as 0 (control) and 1 (case), in order to perform a logistic regression. At the moment I use hl.cond to create bool columns:

def case(pheno):
    return hl.cond(pheno == 1, True, False)

table = table.annotate(pheno_case = case(table.pheno))
ds = ds.annotate_cols(**table[ds.s])

and then:

gwas = hl.logistic_regression(test='score',y=ds.pheno_case,x=ds.GT.n_alt_alleles(),covariates=[ds.is_female, ds.age, ds.weight, ds.PC1, ds.PC2, ds.PC3, ds.PC4, ds.PC5])

I must be doing something wrong though, as when I use logreg all the p-values are inflated* (and that’s also the case even if I use 0/1 instead of False/True)

* I know they are inflated because I have performed logistic regressions on that same set using other software. Also, interestingly, using linreg (with 0s and 1s, obviously) does give the expected result, with no inflation.

Hi there! You need to include the intercept explicitly as a covariate, i.e. put the constant 1.0 in the covariates list (covariates=[1.0, ds.is_female, ds.age, ...]). See the examples in the documentation of logistic_regression.
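A rough illustration of why a missing intercept can inflate the results. This is a simplified plain-Python sketch of a score test with no other covariates, not Hail's implementation. Without an intercept the null model fixes every sample's case probability at 0.5; with an intercept it fits the observed case fraction. The toy data and the helper names (score_stat_no_intercept, score_stat_with_intercept) are made up for the illustration, and the genotype is constructed to be independent of case status.

```python
# Toy score-test comparison; a simplified sketch, not Hail's implementation.

def score_stat_no_intercept(x, y):
    """Score statistic for x when the null model has no parameters,
    so every fitted case probability is 0.5."""
    u = sum(xi * (yi - 0.5) for xi, yi in zip(x, y))
    info = 0.25 * sum(xi * xi for xi in x)
    return u * u / info


def score_stat_with_intercept(x, y):
    """Score statistic for x when the null model is intercept-only,
    so the fitted case probability is the observed case fraction."""
    n = len(y)
    p = sum(y) / n
    x_mean = sum(x) / n
    u = sum(xi * (yi - p) for xi, yi in zip(x, y))
    info = p * (1 - p) * sum((xi - x_mean) ** 2 for xi in x)
    return u * u / info


# Hypothetical data: 900 samples, 10% cases, genotype (0/1/2 alt alleles)
# distributed identically in cases and controls, i.e. no true association.
n = 900
y = [1 if i % 10 == 0 else 0 for i in range(n)]
x = [(i // 10) % 3 for i in range(n)]

stat_without = score_stat_no_intercept(x, y)
stat_with = score_stat_with_intercept(x, y)
print(f"chi-square without intercept: {stat_without:.1f}")
print(f"chi-square with intercept:    {stat_with:.1f}")
```

With an unbalanced case fraction, the no-intercept statistic comes out large even though there is no association in the toy data, while the intercept-only null gives a statistic of essentially zero.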

Argh… yes, it works now! Thank you.

To get back to the table question, is using hl.cond the best way to get bool, or is there an automatic way to specify 0/1 columns that should be treated as bool during table import?

What does the hl.cond look like? There may be an easier way. For example, hl.cond(x == 1, True, False) is the same as just x == 1.
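The same redundancy exists in ordinary Python, which makes for a quick sanity check. This is plain Python rather than Hail expressions, and to_case_verbose and to_case are hypothetical names used only here:

```python
def to_case_verbose(pheno):
    # Mirrors the shape of hl.cond(pheno == 1, True, False)
    return True if pheno == 1 else False


def to_case(pheno):
    # The comparison already evaluates to a boolean
    return pheno == 1


phenos = [0, 1, 1, 0]
verbose = [to_case_verbose(p) for p in phenos]
direct = [to_case(p) for p in phenos]
print(verbose == direct)
```

In Hail the analogous one-liner would be table.annotate(pheno_case = table.pheno == 1).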

You can also use 0/1 numeric phenotypes for logistic regression – we check internally.

Splendid, no need for the hl.cond then, thanks!