First, let me clarify that if two phenotypes appear in the same inner list, e.g. A and B in [[A, B], [C]], they are treated as independent regressions (the same as if we had written [[A], [B], [C]]), but they are computed together, more efficiently. The only requirement for appearing in the same inner list is a shared missingness pattern: A and B must be missing for exactly the same samples. In other words, this expression should evaluate to True:
hl.is_defined(mt.pheno.A).collect() == hl.is_defined(mt.pheno.B).collect()
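To make the "same results, less work" point concrete, here is a toy pure-Python sketch. It is not Hail's implementation: it fits only the slope of each phenotype on a single predictor x plus an intercept, and the function names are hypothetical. Because every phenotype in a group is missing for the same samples, the sample filtering and the x-side sums are computed once and reused, so each extra phenotype costs only one dot product, and the answers match fitting each phenotype on its own.

```python
from typing import Dict, Optional, Sequence


def simple_regression(x: Sequence[float], y: Sequence[Optional[float]]) -> float:
    """Slope of y ~ 1 + x for one phenotype, dropping samples where y is missing."""
    keep = [i for i, v in enumerate(y) if v is not None]
    xs = [x[i] for i in keep]
    x_mean = sum(xs) / len(xs)
    x_centered = [xi - x_mean for xi in xs]
    sxx = sum(c * c for c in x_centered)
    # Centered x sums to zero, so y does not need to be centered as well.
    return sum(c * y[i] for c, i in zip(x_centered, keep)) / sxx


def grouped_simple_regression(
    x: Sequence[float], ys: Dict[str, Sequence[Optional[float]]]
) -> Dict[str, float]:
    """Same slopes as calling simple_regression once per phenotype, but the
    x-side work (sample filtering, centering, sum of squares) is done once."""
    names = list(ys)
    mask = [v is not None for v in ys[names[0]]]
    for name in names[1:]:
        # Grouping is only valid if every phenotype is missing for the same samples.
        if [v is not None for v in ys[name]] != mask:
            raise ValueError(f"{name!r} has a different missingness pattern")
    keep = [i for i, ok in enumerate(mask) if ok]
    # Shared across the whole group: computed once.
    xs = [x[i] for i in keep]
    x_mean = sum(xs) / len(xs)
    x_centered = [xi - x_mean for xi in xs]
    sxx = sum(c * c for c in x_centered)
    # Per phenotype: only a single dot product remains.
    return {
        name: sum(c * ys[name][i] for c, i in zip(x_centered, keep)) / sxx
        for name in names
    }


x = [0, 1, 2, 1, 0]
ys = {"A": [1.0, 3.0, None, 3.0, 1.0], "B": [2.0, 2.0, None, 4.0, 6.0]}
print(grouped_simple_regression(x, ys))  # {'A': 2.0, 'B': -1.0}
```

With covariates the shared part grows (everything that depends only on the retained samples and the covariates), so the saving from grouping is larger than in this toy; the sketch only shows the principle.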
If every phenotype has a different missingness pattern, you can put each one in its own single-element list with this Python expression:
[[mt.pheno[x]] for x in mt.pheno]
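When only some phenotypes share a pattern, the same idea generalizes: group together the phenotypes whose is-defined masks match. A minimal sketch, assuming you have already collected each phenotype's mask into Python (for example with hl.is_defined(mt.pheno[x]).collect() for each x in mt.pheno); group_by_missingness is a hypothetical helper, not part of Hail.

```python
from typing import Dict, List, Sequence, Tuple


def group_by_missingness(defined: Dict[str, Sequence[bool]]) -> List[List[str]]:
    """Group phenotype names whose is-defined masks are identical."""
    groups: Dict[Tuple[bool, ...], List[str]] = {}
    for name, mask in defined.items():
        # Identical masks map to the same key; dicts keep first-seen order.
        groups.setdefault(tuple(mask), []).append(name)
    return list(groups.values())


defined = {
    "A": [True, True, False],
    "B": [True, True, False],
    "C": [True, False, True],
}
print(group_by_missingness(defined))  # [['A', 'B'], ['C']]
```

The grouped names can then be turned back into Hail expressions with something like [[mt.pheno[n] for n in group] for group in groups], which reduces to the single-element lists above when no two patterns match.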
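This works because any struct (i.e. nested) annotation in Hail can be iterated over like a collection: iterating yields its field names, and indexing with a name, as in mt.pheno[x], returns that child field. For example: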
In [11]: mt = hl.balding_nichols_model(3,10,10)
...: mt.bn.describe()
2020-03-20 13:59:59 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 10 variants...
--------------------------------------------------------
Type:
struct {
    n_populations: int32,
    n_samples: int32,
    n_variants: int32,
    n_partitions: int32,
    pop_dist: array<int32>,
    fst: array<float64>,
    mixture: bool
}
--------------------------------------------------------
Source:
<hail.matrixtable.MatrixTable object at 0x120b2c4d0>
Index:
[]
--------------------------------------------------------
In [12]: list(mt.bn)
Out[12]:
['n_populations',
'n_samples',
'n_variants',
'n_partitions',
'pop_dist',
'fst',
'mixture']
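This is the ordinary Python mapping protocol: iterating gives keys, and indexing by a key gives the value. The toy class below is hypothetical and is not Hail's struct class (in Hail the children are expressions rather than plain Python values), but it shows why [[mt.pheno[x]] for x in mt.pheno] visits every phenotype field.

```python
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class ToyStruct(Mapping):
    """Hypothetical stand-in for a struct: iterating yields field names,
    and indexing by a name yields that field's value."""

    def __init__(self, **fields: Any) -> None:
        self._fields: Dict[str, Any] = dict(fields)

    def __getitem__(self, name: str) -> Any:
        return self._fields[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._fields)

    def __len__(self) -> int:
        return len(self._fields)


pheno = ToyStruct(A=1.5, B=2.0, C=None)
print(list(pheno))                  # ['A', 'B', 'C']
print([[pheno[x]] for x in pheno])  # [[1.5], [2.0], [None]]
```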