I was wondering what the best way is to import a table in which some of the phenotypes are coded as 0 (control) and 1 (case), in order to perform a logistic regression. At the moment I use hl.cond to create boolean columns:
def case(pheno):
    return hl.cond(pheno == 1, True, False)

table = table.annotate(pheno_case=case(table.pheno))
ds = ds.annotate_cols(**table[ds.s])

gwas = hl.logistic_regression(
    test='score',
    y=ds.pheno_case,
    x=ds.GT.n_alt_alleles(),
    covariates=[ds.is_female, ds.age, ds.weight,
                ds.PC1, ds.PC2, ds.PC3, ds.PC4, ds.PC5])
I must be doing something wrong though, because when I use logreg all the p-values are inflated* (and that’s also the case even if I use 0/1 instead of False/True).
* I know they are inflated because I have performed logistic regressions on that same set using other software. Also, interestingly, using linreg (with 0s and 1s, obviously) does give the expected result, with no inflation.
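
To put a number on "inflated", here is a minimal sketch of how the genomic inflation factor (lambda GC) could be computed from a plain list of p-values, so the logreg and linreg runs can be compared directly. It uses only the Python standard library. The function name genomic_inflation is my own and not part of Hail, and the sketch assumes the p-values come from 1-degree-of-freedom tests.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-squared distribution with 1 degree of freedom,
# derived from the normal quantile at 0.75 (roughly 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda GC: median observed chi2 / expected median chi2 (1 df).

    Hypothetical helper, not a Hail function. Assumes two-sided p-values
    from 1-df tests; values outside (0, 1] are skipped.
    """
    # A two-sided p-value p corresponds to z = inv_cdf(p / 2), and chi2 = z**2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values if 0 < p <= 1]
    if not chi2:
        raise ValueError("no valid p-values supplied")
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated null distribution.
    uniform = [(i + 0.5) / 1001 for i in range(1001)]
    print(genomic_inflation(uniform))
    # Squaring shrinks every p-value, which mimics an inflated test.
    print(genomic_inflation([p ** 2 for p in uniform]))
```

A value close to 1 suggests well-calibrated p-values, while values clearly above 1 indicate inflation. The p-values would first have to be collected from the results table into a Python list.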