Missing read depth mean after filter_entries()

Hi all,

Just wondering whether there is a way to calculate the distribution of the mean read depth across all loci. After applying filter_entries, the summary of dp_stats.mean shows ‘nan’ for both the mean and the std dev, as follows.

mt_filtered = mt.filter_entries(
    (hl.is_snp(mt.alleles[0], mt.alleles[1]) & (mt.DP >= snv_min_coverage)) |
    (hl.is_indel(mt.alleles[0], mt.alleles[1]) & (mt.DP >= indel_min_coverage))
)

mt_filtered.variant_qc.dp_stats.summarize()

dp_stats.mean (float64):

Non-missing 63103 (100.00%)
Missing 0
Minimum 7.58
Maximum 91.90
Mean nan
Std Dev nan

Thanks in advance!

Hi @barioux !

This means that dp_stats.mean is NaN for at least one variant, and a single NaN is enough to make the summary mean and standard deviation NaN as well. You can use hl.is_nan with filter_rows to keep only the variants whose dp_stats.mean is NaN.

dp_stats is defined as hl.agg.stats(mt.DP), so dp_stats.mean is hl.agg.stats(mt.DP).mean. That mean is usually NaN when every entry in the row is filtered or has a missing DP field. The MatrixTable method compute_entry_filter_stats can provide insight into which rows contain only filtered entries.

In particular, you probably have a SNP or INDEL where none of the entries meets the minimum coverage.