Hello! I am new to Hail and I would like some advice. I can access the DP from my MatrixTable with this: mt.rows().info.DP.show()
And I will get this:
How can I extract the DP values into a single python list, or any other format that I can use? Thank you so much.
You can do:
all_dp_values = mt.aggregate_rows(hl.agg.collect(mt.info.DP))
Note this is going to take a while and put a LOT of data in memory in Python. Collecting data into Python lists is not generally recommended – what in particular are you trying to do with the DP values? There’s probably a way to do that in Hail.
Thank you for your response! I am trying to plot a histogram of the values. I am not too familiar with all the Hail functions, so I tend to extract the values and work with them in Python or R.
Ah, I see. Collecting will work fine with small data, but what if you want to plot a histogram of all GQ values (an entry/genotype field)? Collecting these as a list in Python is not feasible beyond tiny datasets.
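For an entry field like GQ, one option is to let Hail compute the bin counts, so that only a small summary comes back to Python. This is a minimal sketch, assuming the field is stored as mt.GQ and that 0 to 100 is a sensible range for your data; the helper name gq_histogram_counts is made up for illustration.

```python
def gq_histogram_counts(mt, start=0, end=100, bins=20):
    """Count GQ values per bin inside Hail instead of collecting them."""
    # Imported here so the helper can be defined before Hail is set up.
    import hail as hl

    # hl.agg.hist returns a struct with bin_edges, bin_freq, n_smaller
    # and n_larger. Its size depends on `bins`, not on the dataset size.
    # Assumption: GQ is an entry field named `GQ` on this MatrixTable.
    return mt.aggregate_entries(hl.agg.hist(mt.GQ, start, end, bins))
```

Calling gq_histogram_counts(mt) should give back a struct whose bin_freq field is a short list of counts, which is easy to hand to any plotting tool in Python or R.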
Hail has some plotting functions in hl.plot, like hl.plot.histogram, which make plots for you in ways that can scale to large datasets. These plots are generated using the bokeh library.
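As a rough sketch of what that could look like for the DP field from the question: the helper name plot_dp_histogram, the default bin count, and the title and legend text are all assumptions for illustration, and mt is assumed to be the MatrixTable from above.

```python
def plot_dp_histogram(mt, value_range=None, bins=50):
    """Plot a histogram of info.DP without collecting the values."""
    # Imported here so the helper can be defined before Hail is set up.
    import hail as hl
    from bokeh.io import show

    # hl.plot.histogram aggregates inside Hail and returns a bokeh figure.
    # `value_range` is an optional (start, end) tuple; when it is None,
    # Hail works out the range from the data.
    p = hl.plot.histogram(
        mt.info.DP,
        range=value_range,
        bins=bins,
        title="DP histogram",
        legend="DP",
    )
    show(p)  # in a Jupyter notebook, call hl.plot.output_notebook() first
    return p
```

For example, plot_dp_histogram(mt, value_range=(0, 100)) would restrict the plot to DP values between 0 and 100, if that range suits your data.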