Hi!
I’m trying to perform QC on a Hail VDS of ~40K samples, but I get a very low average call rate when I convert the VDS into a sparse MatrixTable instead of a dense one.
From what I understand, `hail.vds.sample_qc` does not return the call rate because the SVCR format has no `GT` field. So I’ve tried converting the VDS to a MatrixTable and adding the `GT` field with `hl.vds.lgt_to_gt`. Below is a snippet of what I’ve tested:
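```python
vds = hl.vds.read_vds('/path/to/vds')  # load the VariantDataset
mt = hl.vds.to_merged_sparse_mt(vds)  # merge reference and variant data into one sparse MatrixTable
mt = mt.annotate_entries(GT=hl.vds.lgt_to_gt(mt.LGT, mt.LA))  # convert local genotypes (LGT + LA) to global GT
mt = hl.sample_qc(mt)  # adds a sample_qc struct (including call_rate) to each column
call_rate_stats = mt.aggregate_cols(hl.agg.stats(mt.sample_qc.call_rate))  # summary of per-sample call rates
print(call_rate_stats)
```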
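For reference, here is a minimal plain-Python sketch of what the `lgt_to_gt` step is meant to do, assuming the usual SVCR convention that `LGT` holds indices into the entry’s local-allele array `LA`. The helper `local_to_global_gt` is hypothetical and only illustrates the mapping; it is not Hail’s implementation.

```python
from typing import Optional, Sequence, Tuple


def local_to_global_gt(
    lgt: Optional[Tuple[int, ...]], la: Optional[Sequence[int]]
) -> Optional[Tuple[int, ...]]:
    """Hypothetical helper: map local allele indices (LGT) to global ones via LA.

    Returns None when either input is missing, so a missing LGT or LA
    yields a missing (uncalled) genotype.
    """
    if lgt is None or la is None:
        return None
    # Each local index picks out a global allele index from LA.
    return tuple(la[i] for i in lgt)


# Local alleles are [ref, global allele 2], so local 0/1 becomes global 0/2.
print(local_to_global_gt((0, 1), [0, 2]))  # (0, 2)
# A missing LGT stays missing after conversion.
print(local_to_global_gt(None, [0]))  # None
```

Because a missing input propagates to a missing genotype, any entry without `LGT` or `LA` would be counted as not called.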
But the call rate stats return:
```
Struct(mean=0.02902165238671222, stdev=0.012164182297370613, min=0.0073442539129184746, max=0.09326385727999727, n=42105, sum=1221.956673742518)
```
Conversely, if I replace `hl.vds.to_merged_sparse_mt(vds)` with `hl.vds.to_dense_mt(vds)`, the call rate stats look more reasonable:
```
Struct(mean=0.9465120866668351, stdev=0.009011747423607979, min=0.9220336485184929, max=0.9535544189899148, n=42105, sum=39852.89140910709)
```
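A toy model may help show how the same genotype data can produce such different call rates. It is a simplification, not Hail’s implementation: it assumes rows are indexed 0..n-1, that a reference block is stored only at its start row with an END row marking where it stops, that every stored entry has a called genotype, and that call rate is simply called rows divided by total rows. The encoding and the helpers `sparse_call_rate` and `densified_call_rate` are hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical toy encoding of one sample's column in a sparse MatrixTable:
#   row index -> ("var", genotype) for a variant call, or
#   row index -> ("ref", end_row) for a reference block that starts at this row
#                and (by assumption) covers every row up to and including end_row.
SparseEntry = Tuple[str, Any]


def sparse_call_rate(entries: Dict[int, SparseEntry], n_rows: int) -> float:
    """Fraction of rows with a stored entry; each reference block counts once, at its start row."""
    return len(entries) / n_rows


def densified_call_rate(entries: Dict[int, SparseEntry], n_rows: int) -> float:
    """Fraction of rows covered after expanding every reference block across its span."""
    called = set()
    for row, (kind, payload) in entries.items():
        if kind == "ref":
            # Reference block: mark every row from its start to its END row (inclusive).
            called.update(range(row, min(payload, n_rows - 1) + 1))
        else:
            # Variant call: only this row.
            called.add(row)
    return len(called) / n_rows


# 10 rows: a reference block spanning rows 0-6, a variant call at row 7,
# and rows 8-9 with no data at all (truly missing).
sample = {0: ("ref", 6), 7: ("var", (0, 1))}
print(sparse_call_rate(sample, 10))     # 0.2
print(densified_call_rate(sample, 10))  # 0.8
```

In this toy, the sparse count sees only the block start and the variant (2 of 10 rows), while expanding the block covers 8 of 10. Whether `hl.sample_qc` treats the rows inside a reference block exactly this way on a merged sparse MatrixTable is an assumption of the sketch.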
Is there a way to get around this issue? Ideally, I’d like to use the sparse MatrixTable for QC so that I can convert it back to a VDS for further analysis.