Getting low call rate when converting from VDS to Sparse MatrixTable

Hi!

I’m trying to perform QC on a Hail VDS of ~40K samples, but I’m getting a very low average call rate if I convert the VDS into a sparse MatrixTable instead of a dense MatrixTable.

From what I understand, hl.vds.sample_qc does not return a call rate because the SVCR format has no GT field. So I converted the VDS to a MatrixTable and added a GT field with hl.vds.lgt_to_gt. Below is a snippet of what I tested.

vds = hl.vds.read_vds('/path/to/vds')
mt = hl.vds.to_merged_sparse_mt(vds)
mt = mt.annotate_entries(GT = hl.vds.lgt_to_gt(mt.LGT, mt.LA))
mt = hl.sample_qc(mt)
call_rate_stats = mt.aggregate_cols(hl.agg.stats(mt.sample_qc.call_rate))
print(call_rate_stats)

But the call rate stats come back as:

Struct(mean=0.02902165238671222, stdev=0.012164182297370613, min=0.0073442539129184746, max=0.09326385727999727, n=42105, sum=1221.956673742518)

Conversely, if I replace line 2 (the hl.vds.to_merged_sparse_mt call) with hl.vds.to_dense_mt(vds), the call rate stats look more reasonable:

Struct(mean=0.9465120866668351, stdev=0.009011747423607979, min=0.9220336485184929, max=0.9535544189899148, n=42105, sum=39852.89140910709)
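
If I understand the sparse layout correctly, a reference block is stored only once, at its start position, and every other site it covers has a missing entry until the data is densified. That alone could explain the gap between the two results. Here is a toy sketch in plain Python (not Hail code) of what I think is happening. The site count, the `END` handling, and the `call_rate` and `densify` helpers are all made up for illustration, and they simplify what Hail actually does.

```python
# Toy model in plain Python, not Hail. All names and numbers are hypothetical.
# One sample, 10 sites. A reference block starting at site 0 covers sites 0-7
# (END = 7); variant calls are stored at sites 8 and 9.

N_SITES = 10

# Sparse layout: one entry per stored record, keyed by its start position.
sparse_entries = {
    0: {"GT": "0/0", "END": 7},  # start of the reference block
    8: {"GT": "0/1"},
    9: {"GT": "1/1"},
}


def call_rate(entries, n_sites):
    """Fraction of sites with a non-missing GT; absent entries count as not called."""
    called = sum(
        1 for pos in range(n_sites)
        if entries.get(pos, {}).get("GT") is not None
    )
    return called / n_sites


def densify(entries):
    """Copy each reference block's GT onto every site the block covers."""
    dense = {}
    for pos, entry in entries.items():
        end = entry.get("END", pos)  # records without END cover a single site
        for p in range(pos, end + 1):
            dense[p] = {"GT": entry["GT"]}
    return dense


sparse_rate = call_rate(sparse_entries, N_SITES)
dense_rate = call_rate(densify(sparse_entries), N_SITES)
print(sparse_rate, dense_rate)
```

In this toy case the sparse layout has a GT at only 3 of the 10 sites, while the densified layout has one at all 10, even though the underlying data is the same.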

Is there a way to get around this issue? Ideally, I’d like to use the sparse MatrixTable for QC so that I can convert it back to a VDS for further analysis.