Regression with multiple phenotypes with varying degrees of missingness

First, let me clarify: if two phenotypes appear in the same inner list, e.g. A and B in [[A, B], [C]], they are still treated as independent regressions (the same as if we had written [[A], [B], [C]]), but they are computed together, which is more efficient. The only requirement for sharing an inner list is that the phenotypes have the same missingness pattern, i.e. each is defined for exactly the same samples. In other words, this expression should evaluate to True:

hl.is_defined(mt.pheno.A).collect() == hl.is_defined(mt.pheno.B).collect()

If every phenotype has a different missingness pattern, you can easily produce a list of single-phenotype lists with this Python expression:

[[mt.pheno[x]] for x in mt.pheno]
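
Between those two extremes, phenotypes can be grouped by their missingness pattern. The following is only a minimal sketch in plain Python, not Hail code: the `phenos` dict and the `group_by_missingness` helper are hypothetical stand-ins, with `None` marking a missing value.

```python
from collections import defaultdict

# Hypothetical phenotype values per sample; None marks a missing value.
phenos = {
    "A": [1.2, None, 0.7, 3.1],
    "B": [0.4, None, 2.2, 1.9],
    "C": [None, 5.0, 1.1, 0.3],
}


def group_by_missingness(phenos):
    """Group phenotype names whose defined/missing pattern is identical."""
    groups = defaultdict(list)
    for name, values in phenos.items():
        # The pattern is one boolean per sample: True where a value is defined.
        pattern = tuple(v is not None for v in values)
        groups[pattern].append(name)
    return list(groups.values())


groups = group_by_missingness(phenos)
print(groups)  # [['A', 'B'], ['C']]
```

With a real MatrixTable, the patterns would presumably come from hl.is_defined(mt.pheno[name]).collect() as above, and each name in the result would be replaced by mt.pheno[name] before passing the nested list to the regression.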

This works because iterating over any “struct” (aka nested) annotation in Hail yields the names of its fields, and each name can then be used to index back into the struct:

In [11]: mt = hl.balding_nichols_model(3,10,10)
    ...: mt.bn.describe()
2020-03-20 13:59:59 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 10 variants...
--------------------------------------------------------
Type:
        struct {
        n_populations: int32, 
        n_samples: int32, 
        n_variants: int32, 
        n_partitions: int32, 
        pop_dist: array<int32>, 
        fst: array<float64>, 
        mixture: bool
    }
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x120b2c4d0>
Index:
    []
--------------------------------------------------------

In [12]: list(mt.bn)
Out[12]: 
['n_populations',
 'n_samples',
 'n_variants',
 'n_partitions',
 'pop_dist',
 'fst',
 'mixture']
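
The same pattern can be tried without Hail. In this rough sketch a plain dict stands in for mt.pheno (the keys and placeholder values are made up for illustration); like list(mt.bn) above, iterating over it yields names, and indexing by name returns the child.

```python
# A plain dict stands in for the struct field mt.pheno (hypothetical values).
pheno = {"A": "expr_A", "B": "expr_B", "C": "expr_C"}

# Iterating over a mapping yields its keys, i.e. the field names.
names = list(pheno)

# Index by name to get each child, and wrap each one in its own inner list.
independent = [[pheno[x]] for x in pheno]

print(names)        # ['A', 'B', 'C']
print(independent)  # [['expr_A'], ['expr_B'], ['expr_C']]
```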