I have a question about the “alleles” field of the rows in the VCF file. When all row fields are displayed (mt.row.show(5)), alleles contains ['G', 'T']; however, when you zoom in on the alleles field alone (mt.row.alleles.show(5)), the content seems to be different. How is the content of the one translated into the other?
I thought that mt.row.show(5) shows the reference and the variants, and that mt.row.alleles.show(5) shows the reads of all samples per locus. But why is the ref then A (mt.row.alleles.show(5)) instead of G (mt.row.show(5))? And why does the former show only a single variant for the locus (mt.row.show(5)), whereas the latter shows multiple for the same locus (mt.row.alleles.show(5))?
This definitely looks like a bug. The info message plus the output make me think that we’re sorting the result needlessly.
I think this probably happens when you show() a key field that is not the first key.
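To make that hypothesis concrete, here is a minimal plain-Python sketch of the suspected effect. It does not use Hail and is not Hail's actual implementation; the rows, the field layout, and the two helper functions are all made up for illustration. It assumes rows are keyed by (locus, alleles) and that showing alleles on its own triggers a re-sort by alleles only, so a different set of rows ends up at the top.

```python
# Hypothetical rows keyed by (locus, alleles). All values are invented
# for illustration; locus is modelled as a (contig, position) tuple.
rows = [
    {"locus": ("1", 100), "alleles": ["G", "T"]},
    {"locus": ("1", 200), "alleles": ["C", "A"]},
    {"locus": ("1", 300), "alleles": ["A", "G"]},
    {"locus": ("1", 400), "alleles": ["A", "C"]},
    {"locus": ("1", 500), "alleles": ["T", "C"]},
    {"locus": ("1", 600), "alleles": ["A", "T"]},
]


def head_in_key_order(rows, n):
    """First n alleles values when rows stay in (locus, alleles) order."""
    ordered = sorted(rows, key=lambda r: (r["locus"], r["alleles"]))
    return [r["alleles"] for r in ordered[:n]]


def head_after_resort(rows, n):
    """First n alleles values if the rows are (needlessly) re-sorted by alleles alone."""
    ordered = sorted(rows, key=lambda r: r["alleles"])
    return [r["alleles"] for r in ordered[:n]]


alleles_in_key_order = head_in_key_order(rows, 3)
alleles_resorted = head_after_resort(rows, 3)

print("key order:      ", alleles_in_key_order)
print("sorted by field:", alleles_resorted)
```

Under this assumption, the first call starts with ['G', 'T'], while the second starts with alleles whose reference is A. The second output is not a per-sample view of the first locus; it is simply a different set of rows pulled to the top by the extra sort.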
Tracking issue is here: https://github.com/hail-is/hail/issues/5449
Should be fixed by the end of day.