Static plotting or dataframe extraction

jakewendt · June 12, 2019, 6:43pm

I have just started investigating the use of Hail for GWAS analysis. I created a Jupyter notebook of your GWAS demo and everything seemed to go well locally. However, when I saved this notebook to my GitHub repository, I noticed that the plots did not render. This seems to be because Bokeh uses JavaScript to load the images and GitHub disables JavaScript. I’ve seen some workarounds that suggest loading INLINE, but have not been able to implement them successfully. In addition, I see no way mentioned to programmatically save or export these plots as PNGs when Hail is used outside of a notebook. I do see a save icon, along with others, next to each plot in the notebook. I also see that Bokeh has some ways to save or export plots as PNGs, but they are all failing in their own way.

The interactivity of the Bokeh plots is nice, and I’d like to continue using GitHub to share my notebooks, but I’d also like to have access to the raw plots as images to include in other documents and potential publications.

Is there a way to render and/or save these plots that I’m not seeing?
Can hail have the option to use matplotlib to plot instead of bokeh?
Can these data be extracted from the hail objects and into a dataframe or similar, then plotted with matplotlib?

I’m guessing that the latter is likely the simplest, but I still haven’t figured it out.

Thanks in advance,
Jake

tpoterba · June 12, 2019, 7:01pm

Is there a way to render and/or save these plots that I’m not seeing?

These bokeh functions should work to export plots, I think. How are these failing?

I’m not totally sure how GitHub’s renderer works – the GWAS tutorial in our documentation is generated using nbconvert to export HTML, which seems to work fine.

Can hail have the option to use matplotlib to plot instead of bokeh?

We do intend to build multiple plotting backends, but it’s not a high priority right now. It feels like we really need to build our own plotting dialect which the Hail plotting functionality rests on, then build backends for that in bokeh, plotly, matplotlib, etc.

Can these data be extracted from the hail objects and into a dataframe or similar, then plotted with matplotlib?

That’s probably the first step to building support for multiple backends. We generally do this internally:

hail-is/hail/blob/b874f6ad163ee9a8a2056707363f5d4aa08679f3/hail/python/hail/plot/plots.py#L920

          hover_fields = {} if hover_fields is None else hover_fields
          label = {} if label is None else {'label': label} if isinstance(label, Expression) else label
          colors = {'label': colors} if isinstance(colors, ColorMapper) else colors
          label_cols = list(label.keys())
          if isinstance(x, NumericExpression):
              x = ('x', x)
          
          if isinstance(y, NumericExpression):
              y = ('y', y)
          
          source_pd = _collect_scatter_plot_data(x, y, fields={**hover_fields, **label}, n_divisions=None if collect_all else n_divisions, missing_label=missing_label)
          sp = figure(title=title, x_axis_label=xlabel, y_axis_label=ylabel, height=height, width=width)
          sp, sp_legend_items, sp_legend, sp_color_bar, sp_color_mappers, sp_scatter_renderers = _get_scatter_plot_elements(sp, source_pd, x[0], y[0], label_cols, colors, size)
          
          if not legend:
              sp_legend.visible = False
              sp_color_bar.visible = False
          
          # If multiple labels, create JS call back selector
          if len(label_cols) > 1:
              callback_args=dict(

Again, not a super high priority to address right now, but pull requests are welcome!
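
A minimal sketch of the extract-then-plot route, assuming the association results are already in a pandas DataFrame. The column names (`contig`, `position`, `p_value`) and the helper `manhattan_png` are hypothetical and not part of Hail; the commented lines show one way the DataFrame might be produced with Hail's `Table.to_pandas()`, and are untested against any particular Hail version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the script also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed way to get the DataFrame out of Hail (untested sketch):
#   ht = gwas.key_by()
#   ht = ht.select(contig=ht.locus.contig, position=ht.locus.position, p_value=ht.p_value)
#   df = ht.to_pandas()


def manhattan_png(df, filename, contig_col="contig", pos_col="position", pval_col="p_value"):
    """Draw a basic Manhattan plot from a DataFrame and save it as a static PNG."""
    # Drop rows without a p-value and compute -log10(p), guarding against p == 0.
    df = df.dropna(subset=[pval_col]).copy()
    df["neg_log10_p"] = -np.log10(df[pval_col].clip(lower=1e-300))

    fig, ax = plt.subplots(figsize=(10, 4))
    offset = 0
    ticks, labels = [], []
    # Lay contigs side by side in order of first appearance, alternating colors.
    for i, (contig, grp) in enumerate(df.groupby(contig_col, sort=False)):
        span = grp[pos_col].max()
        ax.scatter(grp[pos_col] + offset, grp["neg_log10_p"], s=4,
                   color="C0" if i % 2 == 0 else "C1")
        ticks.append(offset + span / 2)
        labels.append(str(contig))
        offset += span

    ax.set_xticks(ticks)
    ax.set_xticklabels(labels)
    ax.set_xlabel("Chromosome")
    ax.set_ylabel("-log10(p)")
    fig.savefig(filename, dpi=150)
    plt.close(fig)
    return df
```

Because matplotlib writes the PNG directly, no JavaScript or browser driver is involved, and the same figure shows up inline in a notebook rendered on GitHub.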

jakewendt · June 12, 2019, 9:29pm

Thank you for the quick response. It is my understanding that GitHub strips all JavaScript from the file prior to rendering, so the Bokeh JavaScript never runs and the images never load. Plots that I’ve previously generated with matplotlib are included inline as PNGs in the notebook, so no JavaScript is required to render them. Of course, they are not interactive the way the Bokeh plots are.

“Failing” is not really a fair assessment.

Let’s say

p = hl.plot.manhattan(gwas.p_value)

from the tutorial.

The following is interesting: it produces an HTML page complete with all of the JavaScript, with the images encoded in the HTML text. Not terribly helpful for me, though.

import bokeh.plotting
bokeh.plotting.output_file("all_of_my_plots.HTML")
bokeh.plotting.save(p)

Both of the following styles of image access fail with respect to phantomjs.

from bokeh.io.export import get_screenshot_as_png
from selenium import webdriver
image = get_screenshot_as_png(p, height=100, width=300, driver=webdriver)

yields …

AttributeError: module 'selenium.webdriver' has no attribute 'get'

The above is likely my fault as I need to better understand what to use as “webdriver”
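
For what it's worth, the error above suggests that the `driver` argument wants a live browser driver instance rather than the `selenium.webdriver` module itself. A minimal sketch under those assumptions: Chrome and a matching chromedriver are installed, and the installed Bokeh version's `export_png` accepts a `webdriver=` keyword. The helper name `export_plot_png` is made up for illustration.

```python
def export_plot_png(plot, filename, driver=None):
    """Save a Bokeh plot as a PNG using a headless browser driver.

    Assumes Bokeh's export_png accepts a `webdriver=` keyword and that
    Chrome plus chromedriver are available; neither is checked here.
    """
    # Imported lazily so defining this helper does not require bokeh/selenium.
    from bokeh.io import export_png

    own_driver = driver is None
    if own_driver:
        from selenium import webdriver

        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        # This is a driver *instance*, which has the .get() method that the
        # selenium.webdriver module lacks.
        driver = webdriver.Chrome(options=options)
    try:
        return export_png(plot, filename=filename, webdriver=driver)
    finally:
        # Only shut down a driver that this function created.
        if own_driver:
            driver.quit()
```

With the tutorial plot, the call would be `export_plot_png(p, "manhattan_plot.png")`.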

And finally, after …

pip install --upgrade --user pillow selenium
sudo port install npm6
sudo npm install -g phantomjs-prebuilt --ignore-scripts

from bokeh.io import export_png
export_png(p, filename="manhattan_plot.png")

it fails with …

RuntimeError: Error encountered in PhantomJS detection: 'internal/validators.js:125\n throw new ERR_INVALID_ARG_TYPE(name, 'string', value);\n ^\n\nTypeError [ERR_INVALID_ARG_TYPE]: The "file" argument must be of type string. Received type object\n at validateString (internal/validators.js:125:11)\n at normalizeSpawnArguments (child_process.js:411:3)\n at spawn (child_process.js:545:16)\n at Object.<anonymous> (/opt/local/lib/node_modules/phantomjs-prebuilt/bin/phantomjs:22:10)\n at Module._compile (internal/modules/cjs/loader.js:776:30)\n at Object.Module._extensions..js (internal/modules/cjs/loader.js:787:10)\n at Module.load (internal/modules/cjs/loader.js:653:32)\n at tryModuleLoad (internal/modules/cjs/loader.js:593:12)\n at Function.Module._load (internal/modules/cjs/loader.js:585:3)\n at Function.Module.runMain (internal/modules/cjs/loader.js:829:12)\n'

Again, “fail” isn’t really accurate. This is more of a lack of understanding in the usage at this point.

Eventually, I’ll put down my hammer and figure it out.

Thanks again,
Jake

tpoterba · June 12, 2019, 9:44pm

I think part of the problem is that we pin our bokeh dependency to an ancient version. Hail 0.2.15 (to be released in the next day or two) updates to the latest.

tpoterba · June 12, 2019, 9:45pm

For reference, I also now remember hitting those same issues with bokeh, though things are working now.

jakewendt · June 13, 2019, 10:37pm

Understood. Thank you. I’ll wait until this release before I dig any deeper.

tpoterba · June 14, 2019, 8:19pm

(it’s out)
