Experimental.run_combiner produces null genotypes

I don’t think this is a bug, but after merging two gVCFs with run_combiner, I obtain an output with null genotypes for some samples and loci:

chr1:10001	0/0,10004	0/0,10019
chr1:10005	0/0,10005	NA,NA
chr1:10006	0/0,10024	NA,NA

I think I understand why this is happening - run_combiner is splitting the loci where the samples have different genotypes. But I wonder if there is a way to produce output such that every sample has a genotype at every locus (the following output was obtained with GATK CombineGVCFs):


Any help anyone could offer would be appreciated. Hail is very convenient in other ways, so I am hoping I can find a way around this problem.

The experimental.run_combiner functionality is deprecated in favor of the new VariantDataset (VDS) functionality, which has the same data model and is documented here: Hail | Variant Dataset

We’re working on fleshing out the docs for this.

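The basic answer is that the matrix tables / VDSes returned by the combiner are sparse: a reference block is stored only at the locus where it starts, so a sample has no entry at later loci that fall inside that block. The sparse output can be converted into a dense, VCF-like representation. I think this can be done with hl.experimental.densify() on the old representation, and hl.vds.to_dense_mt(vds) on the new VDS functionality.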
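To make the sparse-versus-dense distinction concrete, here is a small pure-Python sketch of what densifying does conceptually. It is only an illustration, not Hail's implementation. It assumes the second value in each cell of the output above is the END of a reference block, and it treats every entry as a reference block; the names densify, sparse_rows and dense_rows are made up for this example.

```python
from typing import List, Optional, Tuple

# One cell per sample: (genotype, END of the reference block), or None when the
# sparse representation has no entry for that sample at that locus.
Entry = Optional[Tuple[str, int]]


def densify(rows: List[Tuple[int, List[Entry]]]) -> List[Tuple[int, List[Optional[str]]]]:
    """Fill missing cells from the last reference block that still covers the locus.

    `rows` must be sorted by position and belong to a single contig.
    """
    n_samples = len(rows[0][1]) if rows else 0
    last_block: List[Entry] = [None] * n_samples  # most recent block seen per sample
    dense = []
    for pos, entries in rows:
        genotypes: List[Optional[str]] = []
        for i, entry in enumerate(entries):
            if entry is not None:
                # The sample has its own entry here: use it and remember the block.
                last_block[i] = entry
                genotypes.append(entry[0])
            else:
                block = last_block[i]
                # Carry the genotype forward only while pos is within the block's END.
                if block is not None and block[1] >= pos:
                    genotypes.append(block[0])
                else:
                    genotypes.append(None)
        dense.append((pos, genotypes))
    return dense


# The three loci from the question, read as (GT, END) per sample (an assumption).
sparse_rows = [
    (10001, [("0/0", 10004), ("0/0", 10019)]),
    (10005, [("0/0", 10005), None]),
    (10006, [("0/0", 10024), None]),
]

dense_rows = densify(sparse_rows)
for pos, gts in dense_rows:
    print(f"chr1:{pos}\t" + "\t".join(gt if gt is not None else "NA" for gt in gts))
```

Under that reading, the second sample's block starting at chr1:10001 runs to 10019, so it covers 10005 and 10006 and both NA cells become 0/0. In Hail itself the corresponding step would be something like hl.experimental.densify(mt) on the old combiner output, or hl.vds.to_dense_mt(hl.vds.read_vds(path)) for a VDS, where mt and path stand in for your own data.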