Hi,
I managed to configure Hail to run on an AWS cluster, with Zeppelin and the pyspark kernel.
I am able to plot graphs using the hl.plot methods,
but the show method still outputs the "text-based" table.
I wonder how it is possible to output a pretty table like the "bootstrap-style" one seen in the Hail docs.
Here is an example of my config:
# Import bokeh
from bokeh.io import show, output_notebook
from bokeh.models.mappers import CategoricalColorMapper, LinearColorMapper
from bokeh.palettes import RdYlBu
# Import bokeh-zeppelin
import bkzep
# Activate output notebook
output_notebook(notebook_type='zeppelin')
# Import hail
import hail as hl
hl.init(sc)
# Load mt
mt = hl.read_matrix_table("s3://bucket/data.vcf.mt")
# Sample QC
mt = hl.sample_qc(mt)
# Add column index
mt = mt.add_col_index()
# Check for n_snp outliers
p = hl.plot.scatter(
    mt.col_idx,
    mt.sample_qc.n_snp,
    label={'count': mt.sample_qc.n_snp},
    title='Number of SNPs by sample',
    xlabel='Sample',
    ylabel='Number of SNPs',
    size=10,
    legend=False,
    hover_fields={'sample': mt.s},
    colors={'count': LinearColorMapper(palette=['#1f77b4'])},
    width=800,
    height=400
)
show(p)
# Here I get a nice interactive scatter plot
#
# Check values
# mt.sample_qc.n_snp.show(2)
# +-----------+---------+
# | s         |  <expr> |
# +-----------+---------+
# | str       |   int64 |
# +-----------+---------+
# | "WHB3854" | 5192969 |
# | "WHB3855" | 5217428 |
# +-----------+---------+
# Here show() prints a text-based table
How can I output a pretty bootstrap-style table with show()?
show() hooks into the Jupyter rich display system. You can pass an argument called handler, which should be a one-parameter function that receives a Hail _Show object. This has a _repr_html_ method that generates an HTML table. If that doesn't work with Zeppelin, you might want to use take, which show uses, and then you can use the array of structs to construct a Zeppelin table.
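The handler approach described above could be sketched roughly as follows. This is only a sketch under assumptions: the function names are made up for illustration, and it assumes your Zeppelin interpreter renders output that starts with the %html display prefix. The only thing it relies on from Hail is that the object passed to the handler has a _repr_html_ method, as mentioned above.

```python
def to_zeppelin_html(shown):
    """Build Zeppelin %html output from any object exposing _repr_html_().

    `shown` is assumed to be the object Hail passes to the show() handler.
    """
    return "%html " + shown._repr_html_()


def zeppelin_html_handler(shown):
    """One-parameter handler: print the HTML table for Zeppelin to render.

    Assumption: the interpreter treats printed output beginning with
    "%html" as HTML rather than plain text.
    """
    print(to_zeppelin_html(shown))


# Hypothetical usage in a Zeppelin paragraph (not run here):
# mt.sample_qc.n_snp.show(2, handler=zeppelin_html_handler)
```

If the table still shows up as text, other output printed earlier in the same paragraph may be pushing the %html prefix away from the start of the output.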
Hmm, if I use take, I get an array of the field's values:
mt.DP.take(2)
# [38, 38]
whereas show() prints a full table of values:
mt.DP.show()
# +---------------+------------+-----------+-----------+
# | locus         | alleles    | SSM001.DP | SSM002.DP |
# +---------------+------------+-----------+-----------+
# | locus<GRCh38> | array<str> |     int32 |     int32 |
# +---------------+------------+-----------+-----------+
# | chr1:2087302  | ["C","T"]  |        38 |        38 |
# | chr1:4645854  | ["G","C"]  |        46 |        37 |
# +---------------+------------+-----------+-----------+
Is there a way to take the table generated by show() and, instead of printing it as text, pass the data structure to another function?
Grr. Sorry, it looks like take uses the old-style behavior that treats entry fields as one-dimensional.
Something like this should solve your problem:
In [28]: mt = hl.balding_nichols_model(3,10,10)
2020-01-10 09:49:34 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 10 variants...
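In [29]: n_rows = 2
    ...: n_cols = 2
    ...: n_total_cols = mt.count_cols()  # 10 in this example
    ...: # The loop below assumes take() returns entries row by row
    ...: flat = mt.GT.take(n_rows * n_total_cols)
    ...: entries = []
    ...: for i, x in enumerate(flat):
    ...:     if i % n_total_cols == 0:
    ...:         entries.append([])  # start a new row
    ...:     if i % n_total_cols < n_cols:
    ...:         entries[-1].append(x)  # keep only the first n_cols columns
    ...: keys = mt.row_key.take(n_rows)
    ...: result = [hl.Struct(**key, entries=row) for key, row in zip(keys, entries)]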
2020-01-10 09:49:36 Hail: INFO: Coerced sorted dataset
2020-01-10 09:49:36 Hail: INFO: Coerced sorted dataset
In [30]: print('\n'.join(str(x) for x in result))
Struct(locus=Locus(contig=1, position=1, reference_genome=GRCh37), alleles=['A', 'C'], entries=[Call(alleles=[0, 0], phased=False), Call(alleles=[0, 0], phased=False)])
Struct(locus=Locus(contig=1, position=2, reference_genome=GRCh37), alleles=['A', 'C'], entries=[Call(alleles=[1, 1], phased=False), Call(alleles=[1, 1], phased=False)])
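To get from a list of structs like result to a Zeppelin table, one option is Zeppelin's %table display format (a header line, then tab-separated cells, one row per line). The helper below is a hypothetical sketch, not part of Hail. It only assumes each row behaves like a mapping, which the `**key` unpacking above already relies on, and that your interpreter renders printed output starting with %table.

```python
def to_zeppelin_table(rows, column_names=None):
    """Format mapping-like rows as Zeppelin %table text.

    rows: list of dicts (or mapping-like structs) sharing the same keys.
    column_names: optional column order; defaults to the first row's keys.
    """
    rows = list(rows)
    if column_names is None:
        column_names = list(rows[0].keys()) if rows else []

    def cell(value):
        # Tabs and newlines are the delimiters, so strip them from cell text.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(column_names)]
    for row in rows:
        lines.append("\t".join(cell(row[name]) for name in column_names))
    return "%table " + "\n".join(lines)


# Hypothetical usage in a Zeppelin paragraph (not run here):
# print(to_zeppelin_table(result))
```

Each cell is rendered with str(), so calls and loci appear as their Python repr unless you convert them to something more readable first.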