Regression with multiple phenotypes with varying degrees of missingness

danking · March 20, 2020, 6:00pm

First, let me clarify that if two phenotypes appear in the same inner list, e.g. A and B in [[A, B], [C]], they are still treated as independent regressions (the same as if we had written [[A], [B], [C]]); they are just computed together more efficiently. The only requirement for appearing in the same inner list is a shared missingness pattern, i.e. both phenotypes are defined for exactly the same samples. Essentially, this expression should evaluate to true:

hl.is_defined(mt.pheno.A).collect() == hl.is_defined(mt.pheno.B).collect()
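
If you have more than a couple of phenotypes, you can extend that pairwise check to group them all by missingness pattern. Here is a minimal sketch in plain Python. The `group_by_missingness` helper and the toy masks are made up for illustration; each mask stands in for what `hl.is_defined(mt.pheno[x]).collect()` would return for one phenotype.

```python
from collections import defaultdict


def group_by_missingness(defined):
    """Group phenotype names whose defined/missing masks are identical.

    `defined` maps a phenotype name to a list of booleans, one per sample
    (hypothetical input, e.g. collected from hl.is_defined on each phenotype).
    """
    groups = defaultdict(list)
    for name, mask in defined.items():
        # Lists are not hashable, so use a tuple of the mask as the key.
        groups[tuple(mask)].append(name)
    return list(groups.values())


# Toy masks standing in for collected Hail results (made-up data).
defined = {
    "A": [True, True, False, True],
    "B": [True, True, False, True],
    "C": [True, False, True, True],
}
groups = group_by_missingness(defined)
print(groups)  # [['A', 'B'], ['C']]
```

With a real dataset you could then build the nested list as `[[mt.pheno[x] for x in g] for g in groups]`. Note that collecting one mask per phenotype runs one query per phenotype, so this may be slow if you have very many of them.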

If every phenotype has a different missingness pattern, you can easily produce a list of single-phenotype lists with this Python expression:

[[mt.pheno[x]] for x in mt.pheno]
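
To show where that list goes, here is a rough sketch of passing it to `hl.linear_regression_rows`. I am assuming a `GT` entry field, a column struct field named `pheno` with numeric phenotypes, and an intercept-only model; adjust these to your dataset. The function names `one_group_per_phenotype` and `run_regressions` are just for illustration.

```python
def one_group_per_phenotype(pheno):
    """Return one single-element inner list per field of `pheno`."""
    return [[pheno[x]] for x in pheno]


def run_regressions(mt):
    """Sketch: regress each phenotype separately on the alternate allele count.

    Assumes `mt` has an entry field `GT` and a column struct field `pheno`;
    both names are assumptions, not something every dataset has.
    """
    import hail as hl  # imported here so the helper above runs without Hail

    return hl.linear_regression_rows(
        y=one_group_per_phenotype(mt.pheno),
        x=mt.GT.n_alt_alleles(),
        covariates=[1.0],  # intercept only
    )


# The helper works on anything that iterates over its keys and supports
# lookup by key, so a plain dict can stand in for mt.pheno here.
print(one_group_per_phenotype({"A": "expr_A", "B": "expr_B"}))
# [['expr_A'], ['expr_B']]
```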

This works because any “struct” (aka nested) annotation in Hail can be iterated over as if it were a list of the names of its children:

In [11]: mt = hl.balding_nichols_model(3,10,10)
    ...: mt.bn.describe()
2020-03-20 13:59:59 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 10 variants...
--------------------------------------------------------
Type:
        struct {
        n_populations: int32, 
        n_samples: int32, 
        n_variants: int32, 
        n_partitions: int32, 
        pop_dist: array<int32>, 
        fst: array<float64>, 
        mixture: bool
    }
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x120b2c4d0>
Index:
    []
--------------------------------------------------------

In [12]: list(mt.bn)
Out[12]: 
['n_populations',
 'n_samples',
 'n_variants',
 'n_partitions',
 'pop_dist',
 'fst',
 'mixture']
