I’m new to Hail, and I was hoping you could help me with the following:
I have installed Hail on the cluster, but I cannot create or view plots there. Ideally, I would like to create plots for all the metrics that come up when I use the mt.summarize() command. How can I save this output so that I can then read it in R?
Thanks for reaching out to us!
If you would like to visualize QC metrics such as call rates, I would highly recommend using the Hail plotting tools.
You can either use the save button in the interactive notebook or use this discussion as a guide to saving them from code.
It would then be more cost-effective to summarize what you need in a smaller dataset or table, export it as a TSV, and import it into your R environment for further plotting with the admittedly amazing
Thank you very much for your prompt response. This is very helpful. I don’t have Jupyter Notebook installed; instead I use IPython. I followed the tutorial and used these instructions:
```python
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot
```
It then looks like I created the plots, but I can’t find a way to save them. I looked at the discussion link you provided, but it seems that I need to install extra dependencies (Selenium, etc.). Hail has been installed as a Singularity image on the cluster, so I have asked our technician whether he can add these to the image. In the meantime, is there another way to save the plots? Also, you mentioned that I could summarize what I need in a smaller dataset/table and export it. Could you please provide some example commands showing how to do this?
Thank you very much!
Sorry you’re having trouble with Hail’s plotting libraries.
You’re right, there’s no easy way to save plots generated using Hail’s plotting libraries. My apologies. This is a significant known problem for us. I recommend not using Hail’s plotting libraries if you need to save the plots for later use.
Kumar is suggesting something like this:
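```python
import hail as hl

mt = hl.balding_nichols_model(3, 10, 10)  # simulated data: 3 populations, 10 samples, 10 variants
mt = hl.sample_qc(mt, name='sample_qc')  # adds a per-sample (column) struct of QC metrics
# Export the per-sample QC table as a TSV; the file name here is only a placeholder
mt.cols().select('sample_qc').export('sample_qc.tsv')
```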
No problem at all. Thank you for clarifying! I used the commands you suggested and the table is now saved on the cluster. Woohoo!
Do you know how I could also get the variant_qc metrics in a table?
I have one more question. I read the file in R using this command:
And now I have a file with two columns, one has the sample ID and the other one has the sample_qc statistics per sample ID in the following format:
Do you know how to ‘destring’ the sample_qc column so that I can create the plots?
Thank you very much for your help!
For variant QC, as with sample QC, you would use
To split strings, I tend to use the R function strsplit; usage instructions are found here.
Alternatively, add .flatten() before the .export(...) and you’ll get many columns instead of a single sample_qc column.
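For readers working in Python rather than R, the same ‘destringing’ can be done after the fact. As far as I know, Hail writes non-primitive fields such as structs to a TSV as JSON strings, so a JSON parser recovers them. Below is a minimal stdlib sketch under that assumption, producing dot-separated column names similar to what .flatten() gives. The helper names flatten_struct and destring_qc_tsv are hypothetical, the key column name s is assumed, and the values in the example rows are made up for illustration.

```python
import csv
import io
import json


def flatten_struct(prefix, value, out):
    """Recursively copy nested dicts into `out` under dot-joined keys,
    e.g. {"dp_stats": {"mean": 30.5}} becomes "sample_qc.dp_stats.mean"."""
    if isinstance(value, dict):
        for key, sub in value.items():
            flatten_struct(f"{prefix}.{key}", sub, out)
    else:
        out[prefix] = value
    return out


def destring_qc_tsv(text, struct_col="sample_qc"):
    """Parse TSV text whose `struct_col` holds JSON; return one flat dict per row."""
    rows = []
    # QUOTE_NONE: treat the double quotes inside the JSON as ordinary characters
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        flat = {k: v for k, v in record.items() if k != struct_col}
        raw = record[struct_col]
        # Assumption: a missing struct is written as NA rather than as JSON
        parsed = {} if raw == "NA" else json.loads(raw)
        flatten_struct(struct_col, parsed, flat)
        rows.append(flat)
    return rows


# Hypothetical two-row export; a real sample_qc struct has many more fields.
example = (
    "s\tsample_qc\n"
    'S1\t{"call_rate":1.0,"dp_stats":{"mean":30.5,"stdev":4.1}}\n'
    'S2\t{"call_rate":0.9,"dp_stats":{"mean":null,"stdev":null}}\n'
)

for row in destring_qc_tsv(example):
    print(row)
```

JSON null values come back as None, which is usually what you want before plotting.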
Thank you @danking and @kumarveerapen!
One last question, regarding variant_qc. I use these commands, as with the sample_qc above:
```python
mt = hl.variant_qc(mt, name='variant_qc')
```
But I get an error at the last command: LookupError: Table instance has no field 'variant_qc'
Do you know why I get this error?
Thanks again, I really appreciate your help and prompt responses.
Use .rows() for variant_qc and .cols() for sample_qc. Variants are the row axis in Hail and samples are the column axis.
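To make the row/column distinction concrete, here is a toy, dependency-free Python illustration (not Hail code). It assumes a genotype matrix with variants as rows and samples as columns, where None marks a missing call. Call rate is the fraction of non-missing calls, so the per-variant value is computed along each row (the kind of metric variant_qc stores on the rows) and the per-sample value down each column (the kind sample_qc stores on the columns). The matrix values and function names are made up for illustration.

```python
from typing import List, Optional

# Toy genotype matrix: rows are variants, columns are samples.
# Each entry is an alternate-allele count (0, 1, 2) or None if the call is missing.
Genotypes = List[List[Optional[int]]]

genotypes: Genotypes = [
    [0, 1, None],      # variant 1
    [2, None, None],   # variant 2
    [1, 1, 0],         # variant 3
    [None, 0, 2],      # variant 4
]


def variant_call_rates(matrix: Genotypes) -> List[float]:
    """Per-row call rate: one value per variant, like a row field."""
    return [sum(g is not None for g in row) / len(row) for row in matrix]


def sample_call_rates(matrix: Genotypes) -> List[float]:
    """Per-column call rate: one value per sample, like a column field."""
    n_variants = len(matrix)
    return [
        sum(row[j] is not None for row in matrix) / n_variants
        for j in range(len(matrix[0]))
    ]


print("per-variant:", variant_call_rates(genotypes))
print("per-sample:", sample_call_rates(genotypes))
```

Each result has exactly one value per index of its axis, which is why a per-variant annotation can only be selected from the row table and a per-sample annotation only from the column table.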