Hi Abhishek,
If you don’t pass a range, histogram will do the following computation:
start, end = mt.aggregate_entries((hl.agg.min(mt.DP), hl.agg.max(mt.DP)))
dp_hist = mt.aggregate_entries(hl.agg.hist(mt.DP, start, end, bins))
If you do pass a range, it will only compute the second step. So if you run these same two steps yourself and pass the result to histogram, it will make no performance difference. However, one benefit of doing the aggregation yourself is that you can save dp_hist and regenerate the plot without rerunning the aggregation.
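To make the two steps concrete, here is a minimal plain-Python sketch of the same idea. It is not Hail code and not how Hail implements it; histogram_counts and the depths list are made up for illustration. The point is only that the min/max pass is skipped when a range is supplied, and that the result is identical either way.

```python
def histogram_counts(values, bins, value_range=None):
    """Count values into `bins` equal-width bins (hypothetical helper, not a Hail API)."""
    if value_range is None:
        # Step 1: an extra pass over the data, only needed when no range is given.
        start, end = min(values), max(values)
    else:
        start, end = value_range
    width = (end - start) / bins
    counts = [0] * bins
    # Step 2: the binning pass, which runs in both cases.
    for v in values:
        if start <= v <= end:
            # A value equal to `end` is folded into the last bin.
            idx = min(int((v - start) / width), bins - 1) if width > 0 else 0
            counts[idx] += 1
    return start, end, counts


depths = [3, 8, 12, 15, 15, 21, 30]  # made-up DP values

# Let the function find the range itself...
auto = histogram_counts(depths, bins=3)
# ...or compute the range up front and pass it in.
manual = histogram_counts(depths, bins=3, value_range=(min(depths), max(depths)))
```

Both calls do the same total work and return the same counts. Keeping the returned counts around plays the role of saving dp_hist: you can redraw from them without touching the data again.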
If you’re doing exploratory analysis and might want to try plotting with different numbers of bins, histogram can also take the results of the approx_cdf aggregator, which is a more sophisticated summarization of a distribution of values (see [Feature] Approximate quantiles, cdf and pdf plots for more details). With the interactive=True flag, you can interactively modify the number of bins in the histogram. The tradeoff is that it won’t be as accurate as a hist aggregator with a predetermined number of bins.
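As a rough illustration of that tradeoff, here is a plain-Python sketch that keeps a small summary of the data and derives histograms with any number of bins from it. This is an assumption-laden toy, not the algorithm behind approx_cdf: summarize and estimated_counts are hypothetical helpers, and the summary is just a few evenly spaced order statistics with their ranks.

```python
from bisect import bisect_right


def summarize(values, k):
    """Keep roughly k evenly spaced order statistics with their ranks (toy summary)."""
    ordered = sorted(values)
    n = len(ordered)
    step = max(1, n // k)
    # Each point is (value, rank), where rank counts the values up to that position.
    points = [(ordered[i], i + 1) for i in range(step - 1, n, step)]
    if points[-1][1] != n:
        points.append((ordered[-1], n))
    return ordered[0], points


def estimated_counts(summary, bins):
    """Estimate per-bin counts from the summary alone, for any number of bins."""
    low, points = summary
    high = points[-1][0]
    xs = [value for value, _ in points]
    ranks = [rank for _, rank in points]

    def rank_at(x):
        # Step-function CDF: rank of the last kept point at or below x.
        i = bisect_right(xs, x)
        return ranks[i - 1] if i else 0

    edges = [low + (high - low) * b / bins for b in range(bins + 1)]
    edges[-1] = high  # avoid floating-point drift at the top edge
    cumulative = [rank_at(e) for e in edges]
    cumulative[0] = 0  # everything at or above `low` belongs to the first bin
    return [cumulative[b + 1] - cumulative[b] for b in range(bins)]


values = list(range(1, 101))         # made-up data: 1..100
summary = summarize(values, k=10)    # computed once
four_bins = estimated_counts(summary, bins=4)
two_bins = estimated_counts(summary, bins=2)
```

The summary is built once and rebinned as often as you like, but the per-bin counts are only estimates. For this data an exact four-bin histogram would put 25 values in each bin, while the estimate from ten summary points does not.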