Lots of changes
Time to start rewriting your pipelines! We’ve rolled out a big set of changes to the Hail Python API which make it much more user-friendly and powerful. We have built a connector that translates annotations to Python, which you’ll see used in the examples in this post. We also now have first-class Python objects for genetics concepts like Variant, Genotype, and Interval. We’ll be building out new features using these constructs in the coming weeks.
No more print_schema
This was a useful function! But what’s even more useful? Being able to print and manipulate the schemata directly:
>>> vds = vds.split_multi().sample_qc().variant_qc()
>>> print(vds.sample_schema)
>>> print(vds.variant_schema)
No more annotate_global_expr_by_* methods: use query_variants and query_samples
This functionality has been replaced by a much more usable interface. We’ve added two methods, query_samples and query_variants, that let you aggregate over samples and variants and see the results without going through the awkward intermediate step of global annotations.
Here are some toy examples:
>>> vds = vds.split_multi().sample_qc().variant_qc()
>>> low_callrate_variants = vds.query_variants(
...     'variants.filter(v => va.qc.callRate < 0.90).collect()')[0]
>>> print(low_callrate_variants[:3])
[ Variant(22, 16050036, A, [AltAllele(A, C)]),
Variant(22, 16050115, G, [AltAllele(G, A)]),
Variant(22, 16050159, C, [AltAllele(C, T)])]
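If the expression language is new to you, it may help to read the aggregation above as a filter followed by a collect. The sketch below is plain Python, not Hail code: the Row tuple, its field names, and the call rate values are all made up for illustration, and it only mirrors the shape of the computation.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of the dataset. In Hail, `v` is the
# Variant and `va` holds its annotations; the values here are invented.
Row = namedtuple("Row", ["v", "call_rate"])

rows = [
    Row("22:16050036:A:C", 0.85),
    Row("22:16050115:G:A", 0.99),
    Row("22:16050159:C:T", 0.72),
]

# Rough analogue of: variants.filter(v => va.qc.callRate < 0.90).collect()
low_callrate = [row.v for row in rows if row.call_rate < 0.90]
print(low_callrate)
```

The difference in practice is that Hail evaluates the expression across the whole dataset and hands back only the collected result.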
The results of these queries are returned as Python objects that you can manipulate directly. Did you actually want to have them in global annotations to use later, though? We can do that too, with the new method annotate_global_py.
>>> from hail.type import *
>>> vds = vds.annotate_global_py(
...     'global.badVariants',
...     low_callrate_variants,
...     TArray(TVariant()))
No more show_globals: use .globals instead
We don’t have show_globals anymore, but we can easily get the values back out:
>>> vds.globals.badVariants
>>> vds.globals.badVariants == low_callrate_variants
True
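Because the values that come back are ordinary Python objects, follow-up work needs nothing Hail-specific. Here is a minimal sketch of that idea. To keep it runnable without Hail, it uses a hypothetical namedtuple in place of the real Variant class, so the field names below are assumptions rather than the documented attributes.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for Hail's Variant; field names are assumptions.
Variant = namedtuple("Variant", ["contig", "start", "ref", "alt"])

bad_variants = [
    Variant("22", 16050036, "A", "C"),
    Variant("22", 16050115, "G", "A"),
    Variant("22", 16050159, "C", "T"),
]

# Count flagged variants per contig.
per_contig = Counter(v.contig for v in bad_variants)

# Find the lowest flagged position.
first_position = min(v.start for v in bad_variants)

# Keep only single-base substitutions.
snps = [v for v in bad_variants if len(v.ref) == 1 and len(v.alt) == 1]
```

The same pattern should carry over to the list returned by query_variants or read from vds.globals, once the attribute names are swapped for the real ones.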
Some of the VariantDataset methods have become attributes. See the API in our Python docs. Notice that you can also get out a Python dict of sample annotations with vds.sample_annotations!