Hello!
I am running a simple burden test using `hl.logistic_regression_rows()` on a matrix table, `mt_burden`, containing my genomic and covariate data. `mt_burden` contains the following:
Row fields:
- `gene_symbol`: str

Column fields:
- `pheno.is_case`: int32 (coded 0/1)
- `pheno.age_at_enrollment`: int32
- `pheno.pca_features`: array (list of 10 PCs)

Entry fields:
- `ind_het`: bool (carrier status)
The following call runs as expected for a simple burden test without an interaction term, returning the effect-size estimate for `ind_het` for each gene in `gene_symbol`:
```python
import hail as hl

# Intercept, age, and PCs 1-10 (pca_features is 0-indexed, so indices 0..9)
covariates = [1.0, mt_burden.pheno.age_at_enrollment] + [
    mt_burden.pheno.pca_features[i] for i in range(10)
]
log_reg = hl.logistic_regression_rows(
    test='wald',
    y=mt_burden.pheno.is_case,
    x=mt_burden.ind_het,
    covariates=covariates,
    max_iterations=50,
)
```
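For reference, the call above fits the following model separately for each gene $g$ (row), with $i$ indexing samples; the `1.0` in `covariates` supplies the intercept $\beta_0$. With `test='wald'`, the output reports the estimate, standard error, z statistic and p-value only for the coefficient on `x` ($\beta_{\text{het}}$); the covariate coefficients are fitted but not returned.

$$
\operatorname{logit} \Pr(\text{is\_case}_i = 1) = \beta_0 + \beta_{\text{het}}\,\text{ind\_het}_{gi} + \beta_{\text{age}}\,\text{age}_i + \sum_{k=1}^{10} \gamma_k\,\text{PC}_{ik}
$$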
I would like to run a similar analysis, but including an `age_at_enrollment * ind_het` interaction term as a covariate. I can form this with the following…
```python
mt_burden = mt_burden.annotate_entries(
    het_times_age=mt_burden.ind_het * mt_burden.pheno.age_at_enrollment
)
```
…but I am then running into a few problems:
- Since `het_times_age` is an entry field, `hl.logistic_regression_rows` throws an error when it is included in the list of covariates. I see this thread, but was wondering if there was any update on this functionality!
- More generally, I would sometimes like the logistic regression to return beta estimates for more than one covariate. I see that `hl.logistic_regression_rows` only accepts a single float64 expression for `x`, and it throws an error when I try to pass a list of covariates. Is there a way of accomplishing this?
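If `hl.logistic_regression_rows` cannot take these terms in your Hail version, one workaround is to fit each gene's model outside of it. The sketch below is plain NumPy, not a Hail API. It assumes one gene's data has already been pulled into arrays (for example by filtering `mt_burden` to that `gene_symbol` and converting `mt.entries()` with `Table.to_pandas()`), and it fits $\operatorname{logit}\Pr(\text{case}) = \beta_0 + \beta_1\,\text{het} + \beta_2\,\text{age} + \beta_3\,(\text{het}\cdot\text{age}) + \sum_k \gamma_k\,\text{PC}_k$ by Newton-Raphson, returning coefficients and Wald standard errors for every term. `fit_logistic` and `interaction_design` are hypothetical helper names.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    X: (n_samples, n_params) design matrix that already contains an intercept column.
    y: (n_samples,) array of 0/1 outcomes.
    Returns (beta, se, converged); beta and se have one entry per column of X.
    """
    beta = np.zeros(X.shape[1])
    converged = False
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))         # fitted case probabilities
        score = X.T @ (y - p)                          # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information matrix
        step = np.linalg.solve(info, score)            # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            converged = True
            break
    # Wald standard errors from the information matrix at the final estimate
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se, converged


def interaction_design(ind_het, age, pcs):
    """Hypothetical helper: columns [1, het, age, het*age, PC_1..PC_k] for one gene."""
    het = np.asarray(ind_het, dtype=float)
    age = np.asarray(age, dtype=float)
    return np.column_stack([np.ones_like(het), het, age, het * age, pcs])
```

The entries of `beta` follow the column order of the design matrix, so `beta[3]` is the interaction coefficient. Centering age before forming the product keeps `beta[1]` interpretable as the carrier effect at the mean age. For very rare variants the fit may not converge (e.g. under complete separation), so check `converged` much as you would check Hail's `fit` field.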
Sorry for the trouble and thanks so much for your help!
Best,
John