We’ve added a new capability to linear_regression_rows: the ability to run multiple regressions with different missingness patterns in one call to the function.
The old (and preserved!) behavior is that passing a list of phenotypes to linear_regression_rows fits the phenotypes in parallel. The caveat is that it uses only the samples for which all phenotypes and covariates are non-missing, i.e. the “intersection” of samples.
For example, if pheno1 and pheno2 are such that no sample has both pheno1 and pheno2 defined, then the following code will drop all samples and fail:
```python
result = hl.linear_regression_rows(
    y=[mt.pheno1, mt.pheno2],
    x=mt.GT.n_alt_alleles(),
    covariates=[1, mt.cov1])
```
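To make the intersection rule concrete, here is a plain-Python model of the sample filter. It is a sketch, not Hail's implementation. The toy data and the helper `intersection_samples` are hypothetical, `None` stands in for a missing value, and the constant intercept covariate `1` is omitted because it is never missing.

```python
from typing import List, Optional, Sequence

# Hypothetical toy data: one entry per sample; None marks a missing value.
pheno1 = [1.2, None, 0.7, None]
pheno2 = [None, 3.1, None, 2.4]
cov1 = [0.5, 0.1, 0.9, 0.3]


def intersection_samples(
    phenos: Sequence[Sequence[Optional[float]]],
    covariates: Sequence[Sequence[Optional[float]]],
) -> List[int]:
    """Hypothetical helper: indices of samples where every phenotype
    and every covariate is defined (the "intersection" of samples)."""
    n_samples = len(phenos[0])
    columns = [*phenos, *covariates]
    return [i for i in range(n_samples) if all(col[i] is not None for col in columns)]


kept = intersection_samples([pheno1, pheno2], [cov1])
print(kept)  # [] : pheno1 and pheno2 are never defined on the same sample
```

With no samples left, there is nothing to fit. This is the failure described above.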
The new behavior is that you can now pass a list of lists (i.e., groups of phenotypes) as the y parameter. Each group of phenotypes is run on its own intersection of samples as above, but distinct groups are treated independently with respect to sample missingness. For example, the following code instead regresses each phenotype using only the samples for which that phenotype (alone) and all covariates are defined:
```python
result = hl.linear_regression_rows(
    y=[[mt.pheno1], [mt.pheno2]],
    x=mt.GT.n_alt_alleles(),
    covariates=[1, mt.cov1])
```
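Here is the grouped rule in the same plain-Python model. Each group gets its own sample set, and a single group reproduces the old intersection behavior. The toy data and the helper `samples_per_group` are hypothetical. The sketch models only which samples each group keeps, not the regression Hail then fits on them.

```python
from typing import List, Optional, Sequence

Column = Sequence[Optional[float]]

# Hypothetical toy data: None marks a missing value.
pheno1 = [1.2, None, 0.7, None]
pheno2 = [None, 3.1, None, 2.4]
cov1 = [0.5, 0.1, 0.9, 0.3]


def samples_per_group(
    groups: Sequence[Sequence[Column]], covariates: Sequence[Column]
) -> List[List[int]]:
    """Hypothetical helper: for each group, the samples where all of that
    group's phenotypes and all covariates are defined."""
    n_samples = len(groups[0][0])
    result = []
    for group in groups:
        columns = [*group, *covariates]
        result.append([i for i in range(n_samples) if all(c[i] is not None for c in columns)])
    return result


# Mirrors y=[[pheno1], [pheno2]]: each group keeps its own samples.
grouped = samples_per_group([[pheno1], [pheno2]], [cov1])
print(grouped)  # [[0, 2], [1, 3]]

# Mirrors y=[pheno1, pheno2]: one group, so the old intersection rule applies.
single = samples_per_group([[pheno1, pheno2]], [cov1])
print(single)  # [[]]
```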
Here’s a more interesting example. For a group of phenotypes, it computes linear regression results for each phenotype on the overall intersection of samples, as well as results stratified by sex:
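```python
phenos = [mt.pheno1, mt.pheno2, mt.pheno3, ...]

# Keep each phenotype only for male (resp. female) samples; others become missing.
male_only = [hl.case().when(~mt.is_female, pheno).or_missing() for pheno in phenos]
female_only = [hl.case().when(mt.is_female, pheno).or_missing() for pheno in phenos]

# Three groups, each with its own sample set: overall, male-only, female-only.
result = hl.linear_regression_rows(
    y=[phenos, male_only, female_only],
    x=mt.GT.n_alt_alleles(),
    covariates=[1, mt.cov1])
```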
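The stratification trick works because masking a phenotype to missing removes that sample from the phenotype's group. A plain-Python sketch of the effect follows. The toy data and the helpers `keep_if` and `samples_per_group` are hypothetical. `keep_if` only mimics what `hl.case().when(cond, pheno).or_missing()` does to a value, and the sketch models sample selection rather than the fitted regressions.

```python
from typing import List, Optional, Sequence

# Hypothetical toy data: two phenotypes, a sex indicator, and one covariate.
pheno1 = [1.0, 2.0, None, 4.0, 5.0]
pheno2 = [0.5, None, 1.5, 2.5, 3.5]
is_female = [True, False, True, False, True]
cov1 = [0.1, 0.2, 0.3, 0.4, 0.5]


def keep_if(mask: Sequence[bool], col: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Hypothetical analogue of case().when(cond, pheno).or_missing():
    keep the value where mask is True, otherwise mark it missing (None)."""
    return [v if m else None for m, v in zip(mask, col)]


def samples_per_group(groups, covariates) -> List[List[int]]:
    """Hypothetical helper: per group, samples with all phenotypes and covariates defined."""
    n_samples = len(groups[0][0])
    return [
        [i for i in range(n_samples) if all(c[i] is not None for c in [*group, *covariates])]
        for group in groups
    ]


phenos = [pheno1, pheno2]
male_only = [keep_if([not f for f in is_female], p) for p in phenos]
female_only = [keep_if(is_female, p) for p in phenos]

overall, males, females = samples_per_group([phenos, male_only, female_only], [cov1])
print(overall, males, females)  # [0, 3, 4] [3] [0, 4]
```

Sample 2 is female but is missing pheno1, so it is excluded from the female group as well as from the overall group.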