Getting low call rate when converting from VDS to Sparse MatrixTable

sk4 · April 8, 2023, 5:01pm

Hi!

I’m trying to perform QC on a Hail VDS of ~40K samples, but I’m getting a very low average call rate if I convert the VDS into a sparse MatrixTable instead of a dense MatrixTable.

From what I understand, it appears that hail.vds.sample_qc does not return the call rate due to the lack of a GT field in the SVCR format. As such, I’ve tried to convert the VDS to a MatrixTable and add the GT field using hl.vds.lgt_to_gt. Below is a snippet of what I’ve tested.

vds = hl.vds.read_vds('/path/to/vds')
mt = hl.vds.to_merged_sparse_mt(vds)
mt = mt.annotate_entries(GT = hl.vds.lgt_to_gt(mt.LGT, mt.LA))
mt = hl.sample_qc(mt)
call_rate_stats = mt.aggregate_cols(hl.agg.stats(mt.sample_qc.call_rate))
print(call_rate_stats)

But the call rate stats come back as:

Struct(mean=0.02902165238671222, stdev=0.012164182297370613, min=0.0073442539129184746, max=0.09326385727999727, n=42105, sum=1221.956673742518)

Conversely, if I replace line 2 of the snippet (the hl.vds.to_merged_sparse_mt call) with hl.vds.to_dense_mt(vds), the call rate stats look more reasonable:

Struct(mean=0.9465120866668351, stdev=0.009011747423607979, min=0.9220336485184929, max=0.9535544189899148, n=42105, sum=39852.89140910709)

Is there a way to get around this issue? Ideally, I’d like to use the sparse MatrixTable for QC so that I can convert it back to a VDS for further analysis.
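
One plausible reading of the two results, offered as an assumption rather than a confirmed diagnosis: in the sparse representation a reference block is stored only at its start position, so every other position the block covers has no entry for that sample, and those missing entries would still count in the call rate denominator. The toy model below is plain Python, not Hail code; the positions, blocks, and function names are made up for illustration, and it treats the row set as identical in both representations, which is a simplification.

```python
# Toy model (not Hail): how the same sample can look poorly called when
# reference blocks are only recorded at their start position.
# All positions and blocks below are hypothetical.

rows = list(range(1, 11))        # loci present as rows: positions 1..10
ref_blocks = [(1, 4), (6, 9)]    # inclusive (start, end) reference blocks
variant_calls = {5}              # positions with a variant call
# Position 10 has no coverage at all for this sample.


def sparse_call_rate(rows, ref_blocks, variant_calls):
    """Count an entry as called only at block starts and variant sites."""
    block_starts = {start for start, _ in ref_blocks}
    called = sum(1 for pos in rows if pos in block_starts or pos in variant_calls)
    return called / len(rows)


def dense_call_rate(rows, ref_blocks, variant_calls):
    """Count an entry as called wherever a block covers it or a variant exists."""
    def covered(pos):
        return any(start <= pos <= end for start, end in ref_blocks)

    called = sum(1 for pos in rows if pos in variant_calls or covered(pos))
    return called / len(rows)


sparse_rate = sparse_call_rate(rows, ref_blocks, variant_calls)
dense_rate = dense_call_rate(rows, ref_blocks, variant_calls)
print(f"sparse: {sparse_rate:.2f}, dense: {dense_rate:.2f}")
```

If this is what is happening, one possible workaround would be to compute the sample QC metrics on the dense MatrixTable and then apply the resulting list of passing samples back to the original VDS, for example with hl.vds.filter_samples, rather than converting the sparse MatrixTable back to a VDS.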
