Create plots in R based on the hail metrics

Hi there!

I’m new to Hail, and I was hoping you could help me with the following:

I have installed Hail on the cluster, but I cannot create/view plots there. Ideally, I would like to create plots for all the metrics that come up when I use the mt.summarize() command. How can I save this output, so that I can then read it in R?

Thank you!
Maria

Hi Maria

Thanks for reaching out to us!

If you would like to visualize QC metrics such as call rates, I would highly recommend using the Hail plotting tools.
You can either use the save button in the interactive notebook or follow this discussion to guide you through saving the plots.

It would then be more cost-effective to summarize what you need in a smaller dataset/table, export it as a TSV, and import it into your R environment for further plotting with the admittedly amazing ggplot2.


Hi Kumar,

Thank you very much for your prompt response. This is very helpful. I don’t have Jupyter Notebook installed; instead I use IPython. I followed the tutorial and used these instructions:
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot
output_notebook()

And then it looks like I created the plots, but I can’t find a way to save them. I looked at the discussion link you provided, but it seems that I need to install dependencies (selenium etc.). Hail has been installed as a Singularity image on the cluster, so I have now asked our technician to see if he can add these to the image. In the meantime, is there another way to save the plots? Also, you mentioned that I could summarize what I will need in a smaller dataset/table and export it. Could you please provide me with some example commands of how I could do this?
Thank you very much!

Hey @Maria!

Sorry you’re having trouble with Hail’s plotting libraries.

You’re right, there’s no easy way to save plots generated using Hail’s plotting libraries. My apologies. This is a significant known problem for us. I recommend not using Hail’s plotting libraries if you need to save the plots for later use.

Kumar is suggesting something like this:

import hail as hl

mt = hl.balding_nichols_model(3, 10, 10)
mt = hl.sample_qc(mt, name='sample_qc')
mt.cols().select('sample_qc').export('output/table1.tsv.bgz')

Hi @danking!

No problem at all. Thank you for clarifying! I used the commands you suggested and the table is saved on the cluster now! Woohoo!
Do you know how I could also get the variant_qc metrics in a table?

I have one more question. I read the file in R using this command:
t1<-read.table(gzfile("/Volumes/output/table1.tsv.bgz"))
And now I have a file with two columns, one has the sample ID and the other one has the sample_qc statistics per sample ID in the following format:
{"dp_stats":{"mean":35.87442442237584,"stdev":22.4342,"min":0.0, etc.

Do you know how to 'destring' the sample_qc column, so that I can then create the plots?

Thank you very much for your help!
Maria

For variant QC, similarly to the sample QC, you would use variant_qc.

To split strings, I tend to use the R function strsplit; usage instructions are in its R documentation.

Add a .flatten() before the .export(...) and you’ll get many columns.
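
If the table has already been exported without .flatten(), the JSON column can also be unpacked afterwards. Here is a minimal sketch in plain Python (standard library only). The helper names flatten_struct and expand_json_column are made up for illustration, and the sample row is invented data in the shape described above, not real output:

```python
import csv
import io
import json


def flatten_struct(value, prefix=""):
    """Turn nested dicts into one flat dict with dot-separated keys."""
    flat = {}
    for key, item in value.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(item, dict):
            flat.update(flatten_struct(item, name))
        else:
            flat[name] = item
    return flat


def expand_json_column(lines, column):
    """Replace a JSON-valued TSV column with one column per nested field."""
    rows = []
    # QUOTE_NONE keeps the double quotes inside the JSON text untouched.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        struct = json.loads(record.pop(column))
        record.update(flatten_struct(struct, column))
        rows.append(record)
    return rows


# Invented one-row sample; the values are illustrative only.
sample = io.StringIO(
    's\tsample_qc\n'
    '0\t{"dp_stats":{"mean":35.9,"stdev":22.4,"min":0.0},"call_rate":0.98}\n'
)
rows = expand_json_column(sample, "sample_qc")
print(sorted(rows[0]))
```

For a real file, something like gzip.open(path, "rt") could be passed in place of the in-memory sample, since .bgz files are gzip-compatible. The sketch assumes every row holds a valid JSON object in that column.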

Thank you @danking and @kumarveerapen!

One last question, re the variant_qc. I use these commands as in the sample_qc above:
mt = hl.read_matrix_table('/hail/chr22.mt')
mt = hl.variant_qc(mt, name='variant_qc')
mt.cols().select('variant_qc').export('output/tablevc.tsv.bgz')

But I get an error at the last command: LookupError: Table instance has no field 'variant_qc'

Do you maybe know why I get this error?

Thanks again, I really appreciate your help and prompt responses.
Maria

You’ll want .rows() for variant_qc and .cols() for sample_qc. The variants are the row axis in Hail, samples are the column axis.
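
Putting that together, the variant QC export could look roughly like this. It is a sketch, not tested against your data: export_variant_qc is a hypothetical helper name, and the .flatten() call is the same trick suggested earlier in the thread for sample_qc.

```python
def export_variant_qc(mt, path):
    """Annotate variants with QC metrics and export them as a flat TSV."""
    # Imported inside the function so the sketch can be loaded without starting Hail.
    import hail as hl

    mt = hl.variant_qc(mt, name='variant_qc')
    # variant_qc is a row field, so take the rows table rather than cols().
    mt.rows().select('variant_qc').flatten().export(path)


# Example call, reusing the paths from the question:
# import hail as hl
# export_variant_qc(hl.read_matrix_table('/hail/chr22.mt'), 'output/tablevc.tsv.bgz')
```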

Thank you!!!