[Feature] Chained linear regression

tpoterba · October 26, 2018, 1:43am

We’ve added a new capability to linear_regression_rows: it can now run multiple regressions with different missingness patterns in a single call.

The old (and preserved!) behavior is that passing a list of phenotypes to linear_regression_rows fits the phenotypes in parallel, with one caveat: the only samples used are those for which all phenotypes and all covariates are non-missing, i.e. the “intersection” of samples.

For example, if pheno1 and pheno2 are such that no sample has both pheno1 and pheno2 defined, then the following code will drop all samples and fail.

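import hail as hl

# Assumes mt is a MatrixTable with a GT entry field and pheno1, pheno2, cov1 column fields.
result = hl.linear_regression_rows(
    y=[mt.pheno1, mt.pheno2],
    x=mt.GT.n_alt_alleles(),
    covariates=[1, mt.cov1])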
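
To see why the call above ends up with no samples, here is a minimal pure-Python sketch of the intersection rule. It does not use Hail at all: the dictionaries, the sample IDs, and the helper complete_samples are made up for illustration, and None stands in for a missing value.

```python
# Toy data: None marks a missing value. All names and values are hypothetical.
pheno1 = {"s1": 1.2, "s2": 0.7, "s3": None, "s4": None}
pheno2 = {"s1": None, "s2": None, "s3": 3.1, "s4": 2.4}
cov1 = {"s1": 30, "s2": 41, "s3": 25, "s4": 58}


def complete_samples(phenotypes, covariates):
    """Return the samples that are non-missing for every phenotype and covariate."""
    fields = list(phenotypes) + list(covariates)
    return {s for s in fields[0] if all(f[s] is not None for f in fields)}


# A flat list y=[pheno1, pheno2] corresponds to one joint intersection.
kept = complete_samples([pheno1, pheno2], [cov1])
print(sorted(kept))  # prints []
```

No sample has both phenotypes defined, so the intersection is empty and there is nothing left to regress on.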

The new behavior is that you can now pass a list of lists (i.e., groups) as the y parameter. Each group of phenotypes is run on the intersection of samples as above, but distinct groups are considered independently with respect to sample missingness. For example, the following code will instead regress each phenotype on the subset of samples for which that phenotype (alone) and all covariates are defined:

result = hl.linear_regression_rows(
    y=[[mt.pheno1], [mt.pheno2]],
    x=mt.GT.n_alt_alleles(),
    covariates=[1, mt.cov1])
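
As a rough illustration of what the grouped call changes, the sketch below computes the sample set separately for each group. Again this is plain Python rather than Hail, with None standing in for missing; the data and the helper samples_per_group are hypothetical.

```python
# Toy data: None marks a missing value. All names and values are hypothetical.
pheno1 = {"s1": 1.2, "s2": 0.7, "s3": None, "s4": None}
pheno2 = {"s1": None, "s2": None, "s3": 3.1, "s4": 2.4}
cov1 = {"s1": 30, "s2": 41, "s3": 25, "s4": None}  # covariate missing for s4


def samples_per_group(groups, covariates):
    """For each group, list the samples with every phenotype in that group
    and every covariate defined."""
    used = []
    for group in groups:
        fields = list(group) + list(covariates)
        used.append(sorted(s for s in fields[0]
                           if all(f[s] is not None for f in fields)))
    return used


joint = samples_per_group([[pheno1, pheno2]], [cov1])      # one group of two
separate = samples_per_group([[pheno1], [pheno2]], [cov1])  # two groups of one
print(joint)     # prints [[]]
print(separate)  # prints [['s1', 's2'], ['s3']]
```

Note that s4 is dropped from the pheno2 group because its covariate is missing: covariates still have to be defined for every sample used, whichever group it falls in.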

Here’s a more interesting example that computes, for a group of phenotypes, the result of linear regression for each phenotype on the intersection of samples overall, as well as results stratified by sex:

phenos = [mt.pheno1, mt.pheno2, mt.pheno3, ...]
male_only = [hl.case().when(~mt.is_female, pheno).or_missing() for pheno in phenos]
female_only = [hl.case().when(mt.is_female, pheno).or_missing() for pheno in phenos]

result = hl.linear_regression_rows(
    y=[phenos, male_only, female_only],
    x=mt.GT.n_alt_alleles(),
    covariates=[1, mt.cov1])
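
The masking trick in male_only and female_only can be sketched in plain Python as follows. This is only an illustration of the idea, not Hail code: the data and the helpers mask and n_defined are made up, None stands in for missing, and the sketch assumes that a sample with missing sex ends up missing in both strata.

```python
# Toy data: None marks a missing value. All names and values are hypothetical.
is_female = {"s1": True, "s2": False, "s3": True, "s4": False, "s5": None}
pheno1 = {"s1": 1.2, "s2": 0.7, "s3": None, "s4": 2.4, "s5": 1.9}


def mask(pheno, keep):
    """Keep a value only where keep[s] is True; otherwise mark it missing."""
    return {s: (v if keep[s] is True else None) for s, v in pheno.items()}


def n_defined(pheno):
    """Count the samples with a non-missing value."""
    return sum(v is not None for v in pheno.values())


# Negate the sex flag, propagating missingness.
is_male = {s: (None if f is None else not f) for s, f in is_female.items()}

male_only = mask(pheno1, is_male)
female_only = mask(pheno1, is_female)

print(n_defined(pheno1), n_defined(male_only), n_defined(female_only))
# prints 4 2 1
```

Because each masked phenotype is missing outside its stratum, placing the masked lists in their own groups restricts those regressions to one sex. As far as I understand the linear_regression_rows documentation, with a list of lists the per-phenotype output fields such as p_value come back as nested arrays indexed first by group and then by phenotype within the group, so in the example above p_value[1][0] would be pheno1 in males; it is worth confirming this against the docs for your Hail version.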
