Pretty tables on Zeppelin

Hi,

I managed to configure Hail to run on an AWS cluster, with Zeppelin and a pyspark kernel.

I am able to plot graphs using the hl.plot methods,
but the show method still outputs the “text-based” table.
I wonder how to output a pretty table like the “bootstrap-style” table seen in the Hail docs.

Here is an example of my config:

# Import bokeh
from bokeh.io import show, output_notebook
from bokeh.models.mappers import CategoricalColorMapper, LinearColorMapper
from bokeh.palettes import RdYlBu
# Import bokeh-zeppelin
import bkzep
# Activate output notebook
output_notebook(notebook_type='zeppelin')
# Import hail
import hail as hl
hl.init(sc)
# Load mt
mt = hl.read_matrix_table("s3://bucket/data.vcf.mt")
# Sample QC
mt = hl.sample_qc(mt)
# Add column index
mt = mt.add_col_index()
# Check for n_snp outliers
p = hl.plot.scatter(
    mt.col_idx,
    mt.sample_qc.n_snp,
    label={'count': mt.sample_qc.n_snp},
    title='Number of SNPs by sample',
    xlabel='Sample',
    ylabel='Number of SNPs',
    size=10,
    legend=False,
    hover_fields={'sample': mt.s},
    colors={'count': LinearColorMapper(palette=['#1f77b4'])},
    width=800,
    height=400
    )
show(p)
# Here I get a nice interactive scatter plot
#
# Check values
# mt.sample_qc.n_snp.show(2)
# +-----------+---------+
# | s         |  <expr> |
# +-----------+---------+
# | str       |   int64 |
# +-----------+---------+
# | "WHB3854" | 5192969 |
# | "WHB3855" | 5217428 |
# +-----------+---------+
# Here show() prints a text-based table

How can I output a pretty bootstrap-style table with show()?

show() hooks into the Jupyter rich display system. You can pass an argument called handler, which should be a one-parameter function that receives a Hail _Show object. That object has a _repr_html_ method that generates an HTML table. If that doesn't work with Zeppelin, you might want to use take, which show uses; you can then use the array of structs to construct a Zeppelin table.
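A minimal sketch of such a handler, in case it helps. It assumes Zeppelin's display system renders any paragraph output that starts with %html, and it relies only on the _repr_html_ method mentioned above. The names zeppelin_html and zeppelin_handler are made up for this example, and I have not tested this against a real Zeppelin/Hail setup.

```python
def zeppelin_html(shown):
    """Build a Zeppelin %html paragraph from any object exposing _repr_html_().

    `shown` is assumed to be the object that show() passes to its handler.
    """
    return "%html " + shown._repr_html_()


def zeppelin_handler(shown):
    """One-parameter handler: print the HTML so Zeppelin can render it."""
    print(zeppelin_html(shown))


# Hypothetical usage in a Zeppelin pyspark paragraph (needs hail and a loaded mt):
# mt.sample_qc.n_snp.show(2, handler=zeppelin_handler)
```

The formatting is kept in a separate function from the printing so the string can be inspected without a notebook.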

Hmm, if I use take, I get an array of the values of the field:

mt.DP.take(2)
# [38, 38]

show() prints a full table of values:

mt.DP.show()
# +---------------+------------+-----------+-----------+
# | locus         | alleles    | SSM001.DP | SSM002.DP |
# +---------------+------------+-----------+-----------+
# | locus<GRCh38> | array<str> |     int32 |     int32 |
# +---------------+------------+-----------+-----------+
# | chr1:2087302  | ["C","T"]  |        38 |        38 |
# | chr1:4645854  | ["G","C"]  |        46 |        37 |
# +---------------+------------+-----------+-----------+

Is there a way to take the table generated by show() and, instead of printing it as text, pass the data structure to another function?

Grr. Sorry, it looks like take uses the old-style behavior that treats entry fields as one-dimensional.

Something like this should solve your problem:

In [28]: mt = hl.balding_nichols_model(3,10,10)
2020-01-10 09:49:34 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 10 variants...

In [29]: n_rows = 2
    ...: n_cols = 2
    ...: n_samples = mt.count_cols()  # entries per row in the flattened take() result
    ...: flat = mt.GT.take(n_rows * n_samples)
    ...: entries = []
    ...: for i, gt in enumerate(flat):
    ...:     if i % n_samples == 0:  # first entry of a new row
    ...:         entries.append([])
    ...:     if i % n_samples < n_cols:  # keep only the first n_cols columns
    ...:         entries[-1].append(gt)
    ...: keys = mt.row_key.take(n_rows)
    ...: result = [hl.Struct(**key, entries=row) for key, row in zip(keys, entries)]
2020-01-10 09:49:36 Hail: INFO: Coerced sorted dataset
2020-01-10 09:49:36 Hail: INFO: Coerced sorted dataset

In [30]: print('\n'.join(str(x) for x in result))
Struct(locus=Locus(contig=1, position=1, reference_genome=GRCh37), alleles=['A', 'C'], entries=[Call(alleles=[0, 0], phased=False), Call(alleles=[0, 0], phased=False)])
Struct(locus=Locus(contig=1, position=2, reference_genome=GRCh37), alleles=['A', 'C'], entries=[Call(alleles=[1, 1], phased=False), Call(alleles=[1, 1], phased=False)])
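To go from a list like result to a Zeppelin table, something along these lines might work. This is only a sketch: it assumes Zeppelin renders output starting with %table as a tab-separated table with a header row, and that each row behaves like a mapping with an entries list (plain dicts stand in for the structs here). The function name to_zeppelin_table and the column names passed in are hypothetical.

```python
def to_zeppelin_table(rows, entry_names):
    """Format rows as a Zeppelin %table string (tab-separated, header first).

    rows: list of mapping-like objects holding key fields plus an 'entries' list.
    entry_names: column headers for the entries, e.g. sample IDs.
    """
    if not rows:
        return "%table "
    # Every field except 'entries' is treated as a key column.
    key_names = [name for name in rows[0] if name != "entries"]
    lines = ["\t".join(key_names + list(entry_names))]
    for row in rows:
        cells = [str(row[name]) for name in key_names]
        cells += [str(entry) for entry in row["entries"]]
        lines.append("\t".join(cells))
    return "%table " + "\n".join(lines)


# Stand-in rows; with Hail these would be the structs built above.
demo_rows = [
    {"locus": "1:1", "alleles": ["A", "C"], "entries": ["0/0", "0/0"]},
    {"locus": "1:2", "alleles": ["A", "C"], "entries": ["1/1", "1/1"]},
]
print(to_zeppelin_table(demo_rows, ["sample_0", "sample_1"]))
```

Values are converted with str(), so cells that themselves contain tabs or newlines would break the layout and would need escaping first.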