Genotypic Phase


I need to work with the phased genotype field from a VCF, so I am wondering how it is handled in Hail 0.2. Could you give me some directions on that? For instance, is there a way to see which variants have a phased mt.GT, and in which samples? When I print out mt.GT I don't see any phasing, i.e. no genotypes written with the '|' separator. Basically I am wondering what the mechanism for handling them is in Hail 0.2, and I would be interested in seeing several examples of how it could be approached.

What do you want to do with the phased genotypes? It’s possible we need to fix our print to accommodate phase, but that’s probably a distinct issue.

I want to store phased VCF data in the appropriate data structure (ideally already inside the MatrixTable as GT) and just write it to the index. As I understand it, the phase information will be lost if I stick to MatrixTable without using anything else, right? It would also be nice to have a tutorial on how to work with phased data in Hail and what we could do with it. Currently it's just the easy task: read in phased data → write it to ES. We may need to do some calculations based on phased data in Hail too, just not now.

Hail preserves phased genotypes from VCFs (not from BGEN currently), so this should be fine.
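
To make "phased" concrete: in a VCF, the GT field separates alleles with '|' when the call is phased and with '/' when it is not. In Hail 0.2 the call expression exposes a `phased` attribute (`mt.GT.phased`), which should let you check individual entries. The snippet below is not Hail code. It is a minimal plain-Python sketch of the same distinction on raw GT strings, and the names `Call`, `parse_gt`, and `phased_samples` are made up for illustration. It assumes a GT string uses only one kind of separator and rejects mixed ones.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class Call(NamedTuple):
    """Illustrative container (not a Hail type): allele indices plus a phase flag."""
    alleles: Tuple[Optional[int], ...]
    phased: bool


def parse_gt(gt: str) -> Call:
    """Parse a VCF GT string such as '0|1' (phased) or '0/1' (unphased).

    Assumption: a single GT string does not mix '|' and '/'.
    A missing allele ('.') is represented as None.
    """
    has_pipe = "|" in gt
    has_slash = "/" in gt
    if has_pipe and has_slash:
        raise ValueError(f"mixed phase separators are not handled here: {gt!r}")
    parts = re.split(r"[|/]", gt)
    alleles = tuple(None if p == "." else int(p) for p in parts)
    return Call(alleles=alleles, phased=has_pipe)


def phased_samples(gt_by_sample: Dict[str, str]) -> List[str]:
    """Return the sample IDs whose GT at one variant is phased, in input order."""
    return [s for s, gt in gt_by_sample.items() if parse_gt(gt).phased]


if __name__ == "__main__":
    # One variant, three hypothetical samples.
    row = {"s1": "0|1", "s2": "0/1", "s3": "1|1"}
    print(phased_samples(row))
```

Note that '0|1' and '1|0' are different calls once phase is kept, so any downstream index that stores only the allele pair in sorted order would lose that distinction.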
