Ways to speed up QC plots computation

Hi Abhishek,

If you don’t pass a range, histogram will do the following computation:

start, end = mt.aggregate_entries((hl.agg.min(mt.DP), hl.agg.max(mt.DP)))
dp_hist = mt.aggregate_entries(hl.agg.hist(mt.DP, start, end, bins))

If you do pass a range, it will only run the second step. So if you run these same aggregations yourself and pass the result to histogram, it will make no performance difference. However, one benefit of doing the aggregation yourself is that you can save dp_hist and regenerate the plot later without rerunning the aggregation.
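To make the two steps concrete, here is a minimal pure-Python sketch of the same idea on an in-memory list. It is not Hail's implementation: the function names (fixed_range_hist, two_pass_hist) and the JSON round trip are my own illustration, and the result keys only mimic the fields I'd expect on a hist result. I'm also assuming bins are half-open except for the last one, which includes the upper end.

```python
import json


def fixed_range_hist(values, start, end, bins):
    """One pass: count values into `bins` equal-width bins over [start, end]."""
    if bins < 1 or end <= start:
        raise ValueError("need bins >= 1 and end > start")
    width = (end - start) / bins
    freq = [0] * bins
    n_smaller = n_larger = 0
    for v in values:
        if v < start:
            n_smaller += 1
        elif v > end:
            n_larger += 1
        else:
            # Bins are half-open; the last bin also includes `end`.
            i = min(int((v - start) / width), bins - 1)
            freq[i] += 1
    edges = [start + i * width for i in range(bins + 1)]
    return {"bin_edges": edges, "bin_freq": freq,
            "n_smaller": n_smaller, "n_larger": n_larger}


def two_pass_hist(values, bins):
    """No range given: one pass for min/max, then one pass for the counts."""
    start, end = min(values), max(values)
    return fixed_range_hist(values, start, end, bins)


# Hypothetical DP values, just for illustration.
dp = [0, 5, 10, 10, 20, 40]
dp_hist = two_pass_hist(dp, bins=4)

# Saving the small result means the plot can be redrawn later
# without touching the data again.
saved = json.dumps(dp_hist)
restored = json.loads(saved)
```

The point of the sketch is that the saved result is tiny compared with the data, so the expensive part only has to happen once.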

If you’re doing exploratory analysis and might want to try plotting with different numbers of bins, histogram can also take the results of the approx_cdf aggregator, which is a more sophisticated summarization of a distribution of values (see [Feature] Approximate quantiles, cdf and pdf plots for more details). With the interactive=True flag, you can interactively modify the number of bins in the histogram. The tradeoff is that it won’t be as accurate as a hist aggregator with a predetermined number of bins.
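As a rough illustration of why a CDF-style summary lets you change the bin count for free, here is a pure-Python sketch. It assumes the summary is a sorted list of sample values plus cumulative ranks (one more rank than values), which is my simplified picture of what approx_cdf returns; rebin_cdf is a hypothetical helper, not a Hail function.

```python
def rebin_cdf(values, ranks, bins):
    """Turn a CDF summary into a histogram with any number of bins.

    values: sorted sample points kept by the summary
    ranks:  cumulative counts, len(ranks) == len(values) + 1
    """
    if len(ranks) != len(values) + 1:
        raise ValueError("ranks must have one more element than values")
    lo, hi = values[0], values[-1]
    if bins < 1 or hi <= lo:
        raise ValueError("need bins >= 1 and more than one distinct value")
    # Each sample point stands in for this many original observations.
    weights = [b - a for a, b in zip(ranks, ranks[1:])]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v, w in zip(values, weights):
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += w
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical summary: 4 sample points standing in for 100 observations.
values = [1, 2, 3, 4]
ranks = [0, 10, 30, 60, 100]

# Re-binning only touches the small summary, never the original data.
edges3, counts3 = rebin_cdf(values, ranks, bins=3)
edges2, counts2 = rebin_cdf(values, ranks, bins=2)
```

Since each sample point only approximates where its observations actually fall, counts near bin boundaries can be off, which is the accuracy tradeoff mentioned above.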
