Create plots in R based on the hail metrics

Maria · September 1, 2020, 4:25pm

Hi there!

I’m new to hail, and I was hoping you could help me with the following:

I have installed Hail on the cluster, but I cannot create or view plots there. Ideally, I would like to create plots for all the metrics that come up when I use the mt.summarize() command. How can I save this output so that I can then read it in R?

Thank you!
Maria

kumarveerapen · September 1, 2020, 5:21pm

Hi Maria

Thanks for reaching out to us!

If you would like to visualize QC metrics such as call rates, I would highly recommend using the Hail plotting tools.
To save a plot, you can either use the save button in the interactive notebook or follow this discussion.

It would then be more cost-effective to summarize what you need in a smaller dataset/table, export it as a TSV, and import it into your R environment for further plotting with the admittedly amazing ggplot2.

Maria · September 1, 2020, 9:17pm

Hi Kumar,

Thank you very much for your prompt response. This is very helpful. I don’t have Jupyter Notebook installed; instead I use IPython. I followed the tutorial and used these instructions:
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot
output_notebook()

And then it looks like I created the plots, but I can’t find a way to save them. I looked at the discussion link you provided, but it seems that I need to install dependencies (selenium etc). Hail has been installed as a singularity image at the cluster, so I have now asked our technician to see if he can add these to the image. In the meantime, is there another way to save the plots? Also, you mentioned that I could summarize what I will need in a smaller dataset/table and export it. Could you please provide me with some example commands of how I could do this?
Thank you very much!

danking · September 2, 2020, 3:40pm

Hey @Maria!

Sorry you’re having trouble with Hail’s plotting libraries.

You’re right, there’s no easy way to save plots generated using Hail’s plotting libraries. My apologies. This is a significant known problem for us. I recommend not using Hail’s plotting libraries if you need to save the plots for later use.

Kumar is suggesting something like this:

import hail as hl

mt = hl.balding_nichols_model(3, 10, 10)
mt = hl.sample_qc(mt, name='sample_qc')
mt.cols().select('sample_qc').export('output/table1.tsv.bgz')

Maria · September 2, 2020, 4:41pm

Hi @danking!

No problem at all. Thank you for clarifying! I used the commands you suggested and the table is saved on the cluster now! woohoo!
Do you know how I could also get the variant_qc metrics in a table?

I have one more question. I read the file in R using this command:
t1<-read.table(gzfile("/Volumes/output/table1.tsv.bgz"))
And now I have a file with two columns, one has the sample ID and the other one has the sample_qc statistics per sample ID in the following format:
{"dp_stats":{"mean":35.87442442237584,"stdev":22.4342,"min":0.0, etc

Do you know how to ‘destring’ the sample_qc column, so that I can then create the plots?

Thank you very much for your help!
Maria

kumarveerapen · September 2, 2020, 4:47pm

For variant QC, as with sample QC, you would use variant_qc.

To split strings, I tend to use the R function strsplit; usage instructions are in its documentation.

danking · September 2, 2020, 4:51pm

Add a .flatten() before the .export(...) and you’ll get many columns.
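For a table that has already been exported without .flatten(), the struct column can also be unpacked after the fact. The following is a minimal sketch in Python using only the standard library. It assumes the export has a header row, tab-separated columns, and struct cells written as JSON text like the excerpt quoted above. The helper names flatten_struct and flatten_tsv and the file paths are hypothetical, not part of Hail.

```python
import csv
import gzip
import json


def flatten_struct(prefix, value):
    """Turn a nested dict into {'prefix.a.b': leaf} pairs (dotted column names)."""
    if isinstance(value, dict):
        flat = {}
        for key, inner in value.items():
            flat.update(flatten_struct(f"{prefix}.{key}", inner))
        return flat
    return {prefix: value}


def flatten_tsv(in_path, out_path):
    """Read a gzip/bgzip TSV, expand JSON struct cells, write a plain TSV."""
    rows = []
    # Block-gzipped (.bgz) files can be read with Python's ordinary gzip module.
    with gzip.open(in_path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for record in reader:
            flat = {}
            for column, cell in record.items():
                try:
                    parsed = json.loads(cell)
                except (TypeError, ValueError):
                    parsed = None
                if isinstance(parsed, dict):
                    # Struct cell: one output column per nested leaf.
                    flat.update(flatten_struct(column, parsed))
                else:
                    # Anything else (e.g. a sample ID) is kept as the raw text.
                    flat[column] = cell
            rows.append(flat)

    # Collect output columns in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


# Example call (hypothetical paths):
# flatten_tsv("output/table1.tsv.bgz", "output/table1_flat.tsv")
```

The resulting file is an uncompressed TSV with one column per metric (for example sample_qc.dp_stats.mean), which should be readable in R with read.delim. Re-exporting from Hail with .flatten() is still the simpler route when the matrix table is at hand.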

Maria · September 2, 2020, 7:22pm

Thank you @danking and @kumarveerapen!

One last question, re the variant_qc. I use these commands as in the sample_qc above:
mt = hl.read_matrix_table('/hail/chr22.mt')
mt = hl.variant_qc(mt, name='variant_qc')
mt.cols().select('variant_qc').export('output/tablevc.tsv.bgz')

But I get an error at the last command: LookupError: Table instance has no field 'variant_qc'

Do you maybe know why I get this error?

Thanks again, I really appreciate your help and prompt responses.
Maria

tpoterba · September 2, 2020, 7:23pm

You’ll want .rows() for variant_qc and .cols() for sample_qc. In Hail, variants are the row axis and samples are the column axis. variant_qc adds a row field, so it does not exist on the table returned by .cols().

Maria · September 2, 2020, 8:42pm

Thank you!!!
