Announcing Hail 0.2!

We are thrilled to announce the formal release of Hail 0.2! The interface is now stable.

Hail 0.2 reflects over a year of work based on the experiences of our team and our users with Hail 0.1. It’s a huge step forward in terms of generality, flexibility and power.

Here are some of the major changes:

  • The Python interface was completely redesigned. It is now 100% pure Python, and the “Hail expression language” is now gone. Where possible, the Hail types and functions have been made to match the corresponding interfaces in Python. For example, the length of a Hail array expression a is now hl.len(a).
  • Hail 0.1’s KeyTable and VariantDataset are now called Table and MatrixTable, respectively. In 0.1, the VariantDataset class had a wide variety of methods like sample_qc and variant_qc. This functionality is preserved, but is now invoked through top-level functions such as hl.sample_qc(mt) and hl.variant_qc(mt).
  • While a VariantDataset was keyed by variants and samples, MatrixTables are completely generic in terms of data schema.
  • In particular, MatrixTables can be grouped by rows or columns with entries aggregated to form new MatrixTables. For example, in Hail 0.2, a gene burden test is done by simply grouping variants by gene and then applying a regression function.
  • The Variant and Genotype data types have been removed. Instead of Variant, data imported from VCF / BGEN / GEN will be keyed by a locus field (of type locus) and an alleles field (of type array<str>). Instead of Genotype, each of these formats will import entry fields appropriate for the input data: all VCF format fields, GT/dosage/GP for BGEN, etc.
  • Hail 0.2 supports reference genomes, which are tracked as part of the Hail locus type.
  • Aggregator functionality is greatly expanded, including support for grouped aggregations and multivariate, weighted linear regression.
  • We added scans, which are running aggregations. These can be used to, for instance, compute a running sum of a table field.
  • We added dense and block-sparse BlockMatrices that interoperate with NumPy matrices. These can be used to, for instance, compute genome-wide banded linkage disequilibrium.
  • We added a scalable linear mixed model.
  • We added Poisson regression.
  • We added an experimental plotting library which can handle large datasets by intelligent downsampling.
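
To make a few of these changes concrete, here is a minimal sketch of the new interface. It is illustrative rather than canonical: it assumes Hail 0.2 is installed with a working Java/Spark environment, it uses hl.balding_nichols_model only to simulate a small dataset, and the gene field is a made-up label derived from position, not a real annotation. The function name hail_02_sketch is also hypothetical.

```python
def hail_02_sketch():
    """Illustrative only: needs Hail 0.2 plus a working Java/Spark setup."""
    import hail as hl  # imported lazily so this definition loads without Hail

    # Python-style helpers replace the old expression language.
    # This builds a lazy expression; nothing is computed yet.
    length_expr = hl.len(hl.literal([1, 2, 3]))

    # Simulated MatrixTable: rows keyed by locus and alleles, with a GT entry field.
    mt = hl.balding_nichols_model(n_populations=3, n_samples=10, n_variants=100)

    # Former VariantDataset methods are now top-level functions.
    mt = hl.sample_qc(mt)   # adds a sample_qc column field
    mt = hl.variant_qc(mt)  # adds a variant_qc row field

    # Hypothetical gene label (assumption): bucket variants by position.
    mt = mt.annotate_rows(gene=hl.str(mt.locus.position % 5))

    # Group rows by gene and aggregate entries into a new MatrixTable
    # holding one alternate-allele count per (gene, sample).
    burden = mt.group_rows_by(mt.gene).aggregate(
        n_alt=hl.agg.sum(mt.GT.n_alt_alleles())
    )

    # A scan is a running aggregation: here, the sum of n_called
    # over the rows that precede each row.
    rows = mt.rows()
    rows = rows.annotate(running_n_called=hl.scan.sum(rows.variant_qc.n_called))

    return length_expr, burden, rows
```

Calling hail_02_sketch() returns lazy Hail objects; something like burden.describe() or rows.show() would then trigger inspection or computation. A real burden test would go on to apply a regression function to the grouped MatrixTable, which is not shown here.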

Does this mean Hail is finished? No! Here are a few of the exciting things we have planned:

  • We’re building a tool to losslessly import and merge gVCFs that scales linearly with samples and supports incremental sample addition. This will be essential for the massive datasets that are coming down the pipe. Here’s a presentation on the prototype at GA4GH 2018.
  • Hail 0.2 already includes a simple query optimizer, but performance should improve greatly as we improve it. We’re also prototyping a new C++ code generator that has shown >3x improvement on simple pipelines.
  • We’re planning a multi-tenant always-on Hail service. No more spinning up clusters: instant analytics!
  • We plan to greatly expand our linear algebra functionality, adding both local and distributed n-dimensional arrays, integrated with the query optimizer. These primitives will in particular support machine learning algorithms for scalable analyses of (single-cell) RNAseq data.
  • We’re working on fast, approximate methods to summarize distributions, e.g. quantiles.
  • As always, your feedback will factor into development! If you have ideas or requests, we’d love to see them posted to our forum.

n-dimensional (Tensor)Tables will have to wait for 0.3.

Hail 0.2 caveats:

  • Unfortunately, Hail 0.2 has a new file format and cannot read Hail 0.1 VariantDataset and KeyTable files.
  • Pipelines from Hail 0.1 will likewise need to be rewritten for 0.2.
  • If a piece of functionality is marked as experimental, we reserve the right to modify or remove that functionality during the life of the 0.2 stable release. The plotting library is an example of experimental functionality.

Whether you’re familiar with Hail 0.1 or completely new to Hail, we recommend going through the Hail 0.2 tutorials to learn the new interface: https://hail.is/docs/devel/tutorials-landing.html. And, of course, if you have questions you can find us on the user forum (http://discuss.hail.is) or on Hail Zulip chat (http://hail.zulipchat.com).
