Splitting a phased GT field into haplotypes?

Hello all! If I have an entry field of phased genotypes (GT), is it possible to split it into two separate entry fields, one per haplotype, within a MatrixTable? Thank you!

Yes. mt.GT[0] and mt.GT[1] refer to the two alleles of a call, and if that call is phased these will refer to the two phased haplotypes.
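To make the indexing concrete outside of Hail, here is a minimal plain-Python sketch of the same idea applied to a VCF-style GT string. The names `parse_gt` and `split_haplotypes` are hypothetical helpers made up for illustration; they are not part of Hail's API, and the sketch assumes diploid calls written as `0|1` (phased) or `0/1` (unphased).

```python
from typing import NamedTuple, Optional, Tuple


class Call(NamedTuple):
    alleles: Tuple[Optional[int], ...]  # allele indices, None for a missing allele "."
    phased: bool                        # True when the separator is "|"


def parse_gt(gt: str) -> Call:
    """Parse a VCF-style GT string such as '0|1' or '0/1' (hypothetical helper)."""
    phased = "|" in gt
    sep = "|" if phased else "/"
    alleles = tuple(None if a == "." else int(a) for a in gt.split(sep))
    return Call(alleles, phased)


def split_haplotypes(gt: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (first haplotype allele, second haplotype allele) for a phased diploid call."""
    call = parse_gt(gt)
    if not call.phased:
        # For an unphased call the allele order carries no haplotype meaning.
        raise ValueError(f"GT {gt!r} is not phased")
    if len(call.alleles) != 2:
        raise ValueError(f"GT {gt!r} is not diploid")
    return call.alleles[0], call.alleles[1]


hap_one, hap_two = split_haplotypes("0|1")
```

Here `hap_one` and `hap_two` play the role of mt.GT[0] and mt.GT[1]: the position in the call is what identifies the haplotype, which is why this only makes sense when the call is phased.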

Oh ok that makes sense! Is there a way to split it into two different fields?

# annotate_entries returns a new MatrixTable, so keep the result
mt = mt.annotate_entries(fieldOne = mt.GT[0], fieldTwo = mt.GT[1])

Where you can change fieldOne and fieldTwo to any names you like.

Thank you for your response @johnc1231. I have tried that command and got this response:


The GT is within the entries. Is there a way to resolve this?

You don’t need to call .entries() to access the entry fields. .entries() is an operation: it converts from a dense, efficient MatrixTable format to a huge, inefficient Table format. It’s useful in a very limited set of circumstances.

Try this instead:

# Entry fields are accessed directly on the MatrixTable; assign the result to keep it
new_mt = new_mt.annotate_entries(
    fieldOne = new_mt.mother_entry.PBT_GT[0],
    fieldTwo = new_mt.mother_entry.PBT_GT[1]
)

Oh okay, that makes sense! Sorry for this, a lot of Hail’s intricacies do not come naturally to me yet. Is there a way to export the haplotype information (plus the associated variant) to a CSV file so that I can work with it in Python?

Thank you sooo much for all your help.
