Calculation of mean depth

inninun · August 4, 2021, 6:23am

Hi Hail team,

I used sample_qc.dp_stats.mean to calculate the mean depth of each sample.

I expected most samples to be at 30-40X, but the values came out at only 20-30X.

Comparing sample by sample, the mean depth from sample_qc (Hail) differs from the one reported by Depth of coverage (GATK).

I wonder why this difference appears. Could you tell me how to get mean depth from hail sample_qc?

Thanks,

Minyoung

tpoterba · August 4, 2021, 11:42am

I’m assuming you imported a project VCF to a Hail MatrixTable before running hl.sample_qc. The mean DP produced by GATK and Hail in this case cannot be the same, because a project VCF is a lossy format: it discards information about the loci between sites where an individual in your dataset has a polymorphism.

Hail’s dp_stats.mean is defined as, for each sample, the sum of DP values for entries observed in your VCF, divided by the number of non-missing values of DP. I would expect this to be slightly lower than the true read coverage, especially if there are low-complexity (telomeric, centromeric) regions in your VCF which bias the inclusion of a locus toward low-depth, badly-covered positions where lots of indel variants appear.
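To make that definition concrete, here is a minimal plain-Python sketch (not Hail code) of the calculation for a single sample. The helper name dp_mean and all of the depth values are invented for illustration; the only thing taken from the definition above is "sum of non-missing DP divided by the number of non-missing DP".

```python
from typing import Optional, Sequence


def dp_mean(dp_values: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the non-missing DP values for one sample (None if all are missing)."""
    observed = [dp for dp in dp_values if dp is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


# Toy, invented per-base depths for one sample across a tiny region.
# A per-base coverage tool would see every one of these positions.
depth_at_every_position = [35, 36, 34, 12, 10, 37, 35, 33]

# Hypothetical project VCF: only three sites in the region are present,
# two of them at badly covered positions, and DP is missing at the third.
dp_at_vcf_sites = [12, 10, None]

per_base_mean = dp_mean(depth_at_every_position)  # averages all 8 positions
vcf_site_mean = dp_mean(dp_at_vcf_sites)          # averages the 2 non-missing entries

print(per_base_mean, vcf_site_mean)
```

With these made-up numbers the mean over VCF sites is far lower than the per-base mean, simply because the two averages run over different sets of positions. The size of the gap here is exaggerated for illustration and says nothing about what to expect in real data.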

inninun · August 5, 2021, 5:38am

Thank you for the explanation! I understood it well!
