New Python features; print_schema and show_globals removed

Lots of changes

Time to start rewriting your pipelines! We’ve rolled out a big set of changes to the Hail Python API that make it more user-friendly and more powerful. We have built a connector that translates Hail annotations into Python objects, which you’ll see used in the examples in this post. We also now have first-class Python objects for genetics concepts like Variant, Genotype, and Interval. We’ll be building out new features using these constructs in the coming weeks.

No more print_schema

This was a useful function! But what’s even more useful? Being able to print and manipulate the schemata directly:

>>> vds = vds.split_multi().sample_qc().variant_qc()

>>> print(vds.sample_schema)
>>> print(vds.variant_schema)

No more annotate_global_expr_by_* methods: use query_variants and query_samples

This functionality has been replaced by a much more usable interface. We’ve added two methods, query_samples and query_variants, that let you aggregate over samples and variants and see the results directly, without going through the awkward intermediate step of global annotations.

Here are some toy examples:

>>> vds = vds.split_multi().sample_qc().variant_qc()

>>> low_callrate_variants = vds.query_variants(
...     'variants.filter(v => va.qc.callRate < 0.90).collect()')[0]

>>> print(low_callrate_variants[:3])
[Variant(22, 16050036, A, [AltAllele(A, C)]),
 Variant(22, 16050115, G, [AltAllele(G, A)]),
 Variant(22, 16050159, C, [AltAllele(C, T)])]
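To make the semantics of that query concrete, here is a plain-Python sketch of what `variants.filter(v => va.qc.callRate < 0.90).collect()` computes: keep the variants whose call rate is below 0.90 and gather them into a list. The `Variant` and `AltAllele` dataclasses, the `low_callrate` helper, and the call-rate values are hypothetical stand-ins made up for illustration; they are not Hail's own classes, and the real query runs distributed over the whole dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AltAllele:
    """Hypothetical stand-in for an alternate allele (ref -> alt)."""
    ref: str
    alt: str


@dataclass(frozen=True)
class Variant:
    """Hypothetical stand-in for a variant: contig, position, ref, alt alleles."""
    contig: str
    start: int
    ref: str
    alt_alleles: Tuple[AltAllele, ...]


def low_callrate(call_rates: Dict[Variant, float], threshold: float = 0.90) -> List[Variant]:
    """Mimic variants.filter(v => va.qc.callRate < threshold).collect()."""
    return [v for v, rate in call_rates.items() if rate < threshold]


# Made-up call rates keyed by (hypothetical) Variant objects.
call_rates = {
    Variant("22", 16050036, "A", (AltAllele("A", "C"),)): 0.85,
    Variant("22", 16050075, "A", (AltAllele("A", "G"),)): 0.99,
    Variant("22", 16050115, "G", (AltAllele("G", "A"),)): 0.72,
}

bad = low_callrate(call_rates)
print([(v.contig, v.start) for v in bad])  # → [('22', 16050036), ('22', 16050115)]
```

Frozen dataclasses are hashable and compare by value, which mirrors the kind of equality check used later in this post on the collected variants.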

These results are returned as directly manipulable Python objects. Did you actually want to have them in global annotations to use later, though? We can do that too with the new method annotate_global_py.

>>> from hail.type import *
>>> vds = vds.annotate_global_py(
...     'global.badVariants',
...     low_callrate_variants,
...     TArray(TVariant()))

No more show_globals: use .globals instead

We don’t have show_globals anymore, but we can easily get the values back out:

>>> vds.globals.badVariants
>>> vds.globals.badVariants == low_callrate_variants
True

Some of the VariantDataset methods have become attributes; see the API reference in our Python docs. Notice that you can also get a Python dict of sample annotations with vds.sample_annotations!
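Because vds.sample_annotations is an ordinary Python dict, plain Python is enough for follow-up work. The sketch below assumes the dict maps sample IDs to annotation records carrying a `qc.callRate` field like the one `sample_qc()` adds; it models those records as nested dicts with made-up sample IDs and values, and the `samples_below` helper is hypothetical. The real values are Hail's Python annotation objects, whose exact shape may differ.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for vds.sample_annotations: sample ID -> annotations.
# The nested-dict shape, IDs, and values are made up for illustration.
sample_annotations: Dict[str, Dict[str, Any]] = {
    "sample1": {"qc": {"callRate": 0.98}},
    "sample2": {"qc": {"callRate": 0.91}},
    "sample3": {"qc": {"callRate": 0.97}},
}


def samples_below(annotations: Dict[str, Dict[str, Any]], threshold: float) -> List[str]:
    """Return the sorted IDs of samples whose qc.callRate is below threshold."""
    return sorted(
        sample_id
        for sample_id, ann in annotations.items()
        if ann["qc"]["callRate"] < threshold
    )


print(samples_below(sample_annotations, 0.95))  # → ['sample2']
```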

Be ready for a complete removal of the command line module in the next few days! As always, feel free to stop by to chat on Gitter if you have questions!