How to run Hail from outside of interactive Python

Hi there,

I’m new to Hail and I’m looking for some help.

Currently I have Hail installed locally (using the pre-built distribution from the Getting Started page, for Spark 2.1.0), and I have been able to access Hail by using the file bin/hail. This opens an interactive Python session.

I’m following a pipeline set up by another group, where they access Hail via a bash script. I have not been able to replicate their commands, e.g. hail importvcf or hail annotatevariants.

In fact, I cannot find documentation on these commands. They look similar to the HailContext.import_vcf and VariantDataset.annotate_variants_expr methods.

I was wondering if you could point me in the right direction. Do I need to build from the Hail source for this version of the commands to work?

Thanks for your assistance in advance.
Eddie

Hi Eddie,
Hail had a command line interface (and no Python API) for the first year of development. We removed that interface about 4 months before the release of the stable 0.1 version, so any pipelines using command line Hail are using an extremely old build.

If what you’re looking for is functionality that lets you run Hail non-interactively as a submitted job, you can do that with spark-submit and a Python script:

$ cat pipeline.py
from hail import *
hc = HailContext()

hc.read('data/1kg.vds')\
  .summarize()\
  .report()
$ spark-submit pipeline.py
Running on Apache Spark version 2.0.2
SparkUI available at http://192.168.7.31:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.1-b151c7f
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[Stage 1:>                                                          (0 + 4) / 4]
         Samples: 1000
        Variants: 10961
       Call Rate: 0.983163
         Contigs: ['X', '12', '8', '19', '4', '15', '11', '9', '22', '13', '16', '5', '10', '21', '6', '1', '17', '14', '20', '2', '18', '7', '3']
   Multiallelics: 0
            SNPs: 10961
            MNPs: 0
      Insertions: 0
       Deletions: 0
 Complex Alleles: 0
    Star Alleles: 0
     Max Alleles: 2
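
To connect this back to the commands in your old pipeline: the CLI step hail importvcf corresponds to HailContext.import_vcf in the Python API. A minimal sketch of the same submitted-job approach (the file paths here are placeholders, not from your pipeline):

$ cat import_pipeline.py
from hail import *
hc = HailContext()

# Import a block-gzipped VCF and write it out as a VDS (placeholder paths)
hc.import_vcf('data/example.vcf.bgz')\
  .write('data/example.vds')
$ spark-submit import_pipeline.py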

I’ll just emphasize the stability of Hail’s 0.1 release.

For Hail 0.1, we added a Python interface and removed the command line interface. We also committed to backwards compatibility and bug fixes for several months.

In the long term, Hail will always have at least a Python interface.

We will eventually release Hail 0.2. This will include backwards-incompatible changes to the Python interface. These changes will come with significant performance improvements, but they will require editing old pipelines. The 0.1 release will still be available, although we will eventually stop committing bug fixes to it, likely after 0.2 has been available for a few months (giving everyone time to migrate).


Thanks for the info, guys.

Eddie

Guys,

Do you still have documentation for the original command line interface handy? I’m trying to translate the syntax of the old commands to the new Python API.

For example: annotatevariants expr -c 'va = drop(va, info)'

I believe in the new release this corresponds to VariantDataset.annotate_variants_expr, and va stands for variant annotations. But I cannot find what -c might mean, or what drop does. Is it dropping info from the variant annotations?

Hope you can help again.
Eddie

You can go back to a commit from ~1 year ago on the GitHub repo to see the old docs, but it’ll still be hard to translate. Most of the time when you see a -c, though, it’s supplying a Hail expression, and the expression syntax hasn’t changed nearly as much. See the full documentation here: drop can be found under "functions".

It might help to read -c as "command".
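
For your specific example, a rough translation into the 0.1 Python API would look something like this (the dataset path is a placeholder):

from hail import *
hc = HailContext()

# Old CLI: annotatevariants expr -c 'va = drop(va, info)'
# New API: pass the same expression string to annotate_variants_expr
vds = hc.read('data/example.vds')  # placeholder path
vds = vds.annotate_variants_expr('va = drop(va, info)')

The expression string itself is unchanged; only the way you pass it to Hail has moved from the -c flag to a Python method argument.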