Annotation table for logreg


#1

Hi,

I was wondering what the best way is to import a table in which some of the phenotypes are coded as 0 (control) and 1 (case), in order to run a logistic regression. At the moment I use hl.cond to create boolean columns:

def case(pheno):
    return hl.cond(pheno == 1, True, False)

table = table.annotate(pheno_case = case(table.pheno))
ds = ds.annotate_cols(**table[ds.s])

and then:

gwas = hl.logistic_regression(test='score',
                              y=ds.pheno_case,
                              x=ds.GT.n_alt_alleles(),
                              covariates=[ds.is_female, ds.age, ds.weight,
                                          ds.PC1, ds.PC2, ds.PC3, ds.PC4, ds.PC5])

I must be doing something wrong, though, because with logreg all the p-values are inflated* (and that's also the case if I use 0/1 instead of False/True).

* I know they are inflated because I have run logistic regressions on the same set with other software. Interestingly, linreg (with 0s and 1s, obviously) does give the expected result, with no inflation.

#2

Hi there! You need to include the intercept explicitly, by adding the constant 1 to the list of covariates (e.g. covariates=[1, ds.is_female, ds.age, ...]). See the examples in the documentation of logistic_regression.


#3

Argh… yes, it works now! Thank you.

To get back to the table question: is using hl.cond the best way to get booleans, or is there an automatic way to specify 0/1 columns that should be treated as bool during table import?


#4

What does the hl.cond look like? There may be an easier way. For example, hl.cond(mt.foo == 1, True, False) is the same as just mt.foo == 1.

You can also use numeric 0/1 phenotypes directly in logistic regression; we check internally that the non-missing values are 0 or 1.


#5

Splendid, no need for the hl.cond then, thanks :sweat_smile: