I’m new to Hail, and I was hoping you could help me with the following:
I have installed Hail on the cluster, but I cannot create or view plots there. Ideally, I would like to create plots for all the metrics that come up when I use the mt.summarize() command. How can I save this output so that I can then read it in R?
If you would like to visualize QC metrics such as call rates, I would highly recommend using the Hail plotting tools.
You can either use the save button on the interactive notebook or use this discussion to guide you to it.
It would be more cost-effective, then, to summarize what you need in a smaller dataset or table, export it as a TSV, and import it into your R environment for further plotting with the admittedly amazing ggplot2.
Thank you very much for your prompt response. This is very helpful. I don’t have Jupyter Notebook installed; instead I use IPython. I followed the tutorial and used these instructions:
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot
output_notebook()
It looks like this created the plots, but I can’t find a way to save them. I looked at the discussion link you provided, but it seems that I need to install dependencies (selenium etc.). Hail has been installed as a Singularity image on the cluster, so I have now asked our technician to see if he can add these to the image. In the meantime, is there another way to save the plots? Also, you mentioned that I could summarize what I need in a smaller dataset/table and export it. Could you please provide some example commands showing how I could do this?
Thank you very much!
Sorry you’re having trouble with Hail’s plotting libraries.
You’re right, there’s no easy way to save plots generated using Hail’s plotting libraries. My apologies. This is a significant known problem for us. I recommend not using Hail’s plotting libraries if you need to save the plots for later use.
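A minimal sketch of the export route mentioned earlier, using the per-sample QC metrics as the example. It assumes the matrix table path and output path that appear elsewhere in this thread; the function name export_sample_qc is only illustrative, and this may not match the commands originally suggested.

```python
def export_sample_qc(mt_path, out_path):
    """Compute per-sample QC metrics and export them as a TSV (sketch)."""
    # Imported inside the function so this file can be loaded on a machine
    # without Hail installed (e.g. outside the cluster image).
    import hail as hl

    mt = hl.read_matrix_table(mt_path)

    # sample_qc adds a column (per-sample) struct field with the given name.
    mt = hl.sample_qc(mt, name="sample_qc")

    # cols() is the per-sample table; select keeps the key plus 'sample_qc'.
    # A '.bgz' suffix makes Hail write block-gzipped output.
    mt.cols().select("sample_qc").export(out_path)


# Example call (paths are assumptions taken from this thread):
# export_sample_qc("/hail/chr22.mt", "output/table1.tsv.bgz")
```

The exported file has one row per sample and can be copied off the cluster and read in R.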
No problem at all. Thank you for clarifying! I used the commands you suggested and the table is saved on the cluster now! Woohoo!
Do you know how I could also get the variant_qc metrics in a table?
I have one more question. I read the file in R using this command:
t1<-read.table(gzfile("/Volumes/output/table1.tsv.bgz"))
And now I have a file with two columns, one has the sample ID and the other one has the sample_qc statistics per sample ID in the following format:
{"dp_stats":{"mean":35.87442442237584,"stdev":22.4342,"min":0.0, etc
Do you know how to ‘destring’ the sample_qc column, so that I can then create the plots?
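One possible way to ‘destring’ the column is to expand the JSON before loading the file in R. The sketch below uses only the Python standard library and assumes the export has a header row with a column named sample_qc (Hail writes a header by default); the helper names flatten and destring are made up for illustration.

```python
import csv
import gzip
import json


def flatten(record, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def destring(in_path, out_path, json_column="sample_qc"):
    """Expand the JSON column of a (b)gzipped TSV into plain columns."""
    rows = []
    # .bgz is block gzip, which the standard gzip module can read.
    with gzip.open(in_path, "rt", newline="") as src:
        reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            stats = flatten(json.loads(row.pop(json_column)))
            rows.append({**row, **stats})
    if not rows:
        return
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


# Example call (input path from this thread, output path is an assumption):
# destring("/Volumes/output/table1.tsv.bgz", "table1_flat.tsv")
```

The result has one column per statistic (for example dp_stats.mean) and can be read in R with read.table(..., header=TRUE, sep="\t"). Alternatively, calling .flatten() on the Hail table before .export(...) should write the nested fields as separate columns in the first place.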
One last question, re the variant_qc. I use these commands as in the sample_qc above:
mt = hl.read_matrix_table('/hail/chr22.mt')
mt = hl.variant_qc(mt, name='variant_qc')
mt.cols().select('variant_qc').export('output/tablevc.tsv.bgz')
But I get an error at the last command: LookupError: Table instance has no field 'variant_qc'
Do you maybe know why I get this error?
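A likely explanation for the LookupError: hl.variant_qc annotates the rows (variants) of the matrix table, while mt.cols() returns the column (sample) table, which has no variant_qc field. A minimal sketch of the row-based version follows, reusing the paths from the commands above; the function name export_variant_qc is only illustrative.

```python
def export_variant_qc(mt_path, out_path):
    """Compute per-variant QC metrics and export them as a TSV (sketch)."""
    # Imported inside the function so this file can be loaded on a machine
    # without Hail installed (e.g. outside the cluster image).
    import hail as hl

    mt = hl.read_matrix_table(mt_path)

    # variant_qc adds a row (per-variant) struct field with the given name.
    mt = hl.variant_qc(mt, name="variant_qc")

    # rows() is the per-variant table, keyed by locus and alleles;
    # cols() would be the per-sample table and has no 'variant_qc' field.
    mt.rows().select("variant_qc").export(out_path)


# Example call (paths taken from the commands above):
# export_variant_qc("/hail/chr22.mt", "output/tablevc.tsv.bgz")
```

The output has one row per variant, so it can be much larger than the per-sample table.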
Thanks again, I really appreciate your help and prompt responses.
Maria