Hello,
I just want to make sure that I understand one thing correctly. I load a few gVCF files into a VDS using the combiner, then split the multiallelic sites, and then combine the variant data and reference data into a dense MatrixTable. I use something similar to this:
import hail as hl
# Combine the gVCFs into a VDS
combiner = hl.vds.new_combiner(
    output_path=combined_vds_path,
    temp_path=tmp_dir,
    gvcf_paths=gvcf_paths,
    use_genome_default_intervals=True,
    reference_genome=ref_genome,
)
combiner.run()
# Read the VDS back, split multiallelic sites, then densify
vds = hl.vds.read_vds(combined_vds_path)
split_vds = hl.vds.split_multi(vds=vds)
dense_mt = hl.vds.to_dense_mt(split_vds)
and I am trying to understand how missing/unknown values are handled, first in the gVCF and then in Hail. Is it true that:
- If the locus is entirely missing from the gVCF, all entry fields for that sample (column) at that locus are set to NA?
- If the locus is covered by a reference block in the gVCF and the GT for the block is unknown (e.g. ./.), all entry fields for that sample (column) are also set to NA, even though values such as GQ or MIN_DP are known in the gVCF file?
I just wanted to make sure I can rely on that behavior, because I could not find any description in the documentation. Thanks in advance!
And if you have read this far, I have one more silly question: how can one use mt.show() with selected row fields and entry fields? Selecting entry fields works like a charm, e.g.
mt.select_entries(mt.GT, mt.GQ, mt.AD).show(100, n_cols=2)
But what if I want to see those entries together with a row field other than locus and alleles, e.g. AF?
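To make the question concrete, here is a rough sketch of what I am after. It assumes that show() accepts an include_row_fields flag, which I believe exists but have not verified against my Hail version. The helper name show_with_row_fields is my own invention, and AF stands in for whatever row field the table actually has.

```python
def show_with_row_fields(mt, row_fields, entry_fields, n_rows=100, n_cols=2):
    """Show selected entry fields next to selected non-key row fields.

    `mt` is expected to be a Hail MatrixTable; field names are passed as strings.
    ASSUMPTION: MatrixTable.show() supports `include_row_fields=True`.
    """
    # Keep only the requested non-key row fields (row keys are always kept).
    narrowed = mt.select_rows(*row_fields)
    # Keep only the requested entry fields.
    narrowed = narrowed.select_entries(*entry_fields)
    # Ask show() to print the non-key row fields alongside the entries.
    return narrowed.show(n_rows, n_cols=n_cols, include_row_fields=True)
```

I would then call it as show_with_row_fields(mt, ["AF"], ["GT", "GQ", "AD"]), if that is indeed the intended way to do this.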
Thank you for your amazing work!