Missing read depth mean after filter_entries()

barioux · December 18, 2023, 2:38pm

Hi all,

Just wondered if you had a way to calculate the distribution of the mean read depth across all loci. After applying filter_entries, I get ‘nan’ for both the Mean and the Std Dev of dp_stats.mean, as shown below.

mt_filtered = mt.filter_entries(
    ((hl.is_snp(mt.alleles[0], mt.alleles[1]) & (mt.DP >= snv_min_coverage)) |
     (hl.is_indel(mt.alleles[0], mt.alleles[1]) & (mt.DP >= indel_min_coverage)))
)

mt_filtered.variant_qc.dp_stats.summarize()

dp_stats.mean (float64 ):

Non-missing	63103 (100.00%)
Missing	0
Minimum	7.58
Maximum	91.90
Mean	nan
Std Dev	nan

Thanks in advance!

danking · December 18, 2023, 7:29pm

Hi @barioux !

This means that dp_stats.mean is NaN for at least one variant. You can use hl.is_nan with filter_rows to find the variants whose dp_stats.mean is NaN.

dp_stats is defined as hl.agg.stats(mt.DP), so dp_stats.mean is hl.agg.stats(mt.DP).mean. That mean is usually NaN when every entry in the row is filtered or has a missing DP field. The MatrixTable method compute_entry_filter_stats can provide insight into which rows contain only filtered entries.

In particular, you probably have a SNP or INDEL where none of the entries have the minimum coverage.
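
To make the mechanism concrete, here is a minimal plain-Python sketch (not Hail code). It assumes, as described above, that the per-variant mean comes out as NaN when no entries remain, and it models the summary mean as a plain average of the per-variant means. The names dp_by_variant and mean_or_nan and all the depth values are hypothetical, made up for illustration.

```python
import math

def mean_or_nan(values):
    """Mean of the values, or NaN when there are none (models 0/0)."""
    return sum(values) / len(values) if values else float("nan")

# Hypothetical DP values that survive the entry filter, per variant.
dp_by_variant = {
    "var1": [12, 15, 9],
    "var2": [],          # every entry was filtered out
    "var3": [30, 42],
}

# Per-variant mean depth, analogous to dp_stats.mean on each row.
per_variant_mean = {v: mean_or_nan(dp) for v, dp in dp_by_variant.items()}

# A single NaN row makes the mean over all rows NaN.
overall = mean_or_nan(list(per_variant_mean.values()))

# Dropping the NaN rows first gives a finite summary again.
finite = [m for m in per_variant_mean.values() if not math.isnan(m)]
overall_finite = mean_or_nan(finite)

print(per_variant_mean)
print(overall, overall_finite)
```

Note that a NaN is still a value, which would be consistent with the summary above reporting 100% non-missing. In Hail, the last step would presumably look something like mt_filtered.filter_rows(~hl.is_nan(mt_filtered.variant_qc.dp_stats.mean)) before calling summarize(), though that line is untested here.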
