VEP support for Asia Pacific / Sydney Region?

Hi,

I received this message when trying to start a Dataproc cluster with VEP support:

RuntimeError: The --vep argument is not currently provided in your region. Please contact the Hail team on https://discuss.hail.is for support

What would it take to set it up? Unfortunately we cannot move our data to one of the supported regions.

Cheers,

Simon

Won’t take much to set up, I think; I’ll discuss with the team on Monday. We recently moved the VEP data to this region-based system because we got a large data egress bill from some EU users pulling data from the US. I only replicated the data in regions where I knew we had hail/vep/gcp users. So you’re in australia-southeast-1?

Yep - australia-southeast-1 would be perfect.

Completely understand about the egress bill. And I’m sure in the end performance will never be good with people pulling data across regions (including myself).

Thanks for the followup and let me know if there is anything I can do to help (testing, etc).

Cheers,

Simon

There is a pull request fixing this for you now: https://github.com/hail-is/hail/pull/8340. Data is already replicated for Australia. So this will work in 0.2.35 when it releases, and once this PR goes in, you could install hail from source if needed.

Fantastic! Thanks for doing this so quickly. Will see if I can give the PR a try.

Cheers,

Simon

PR is merged, so master should be fine. If it would be helpful / if installing from source proves difficult, I can send you a python wheel to pip install that contains the right change. Not sure when we expect the release of 0.2.35.

Hail 0.2.35 has been released, which contains hailctl support for VEP in the Australian region.

Hmm, only just getting to try this out (apologies!), but even though I’ve installed 0.2.35 I still seem to be getting the error. Is there something config-wise I need to do to make it work?

$ pip install hail==0.2.35
Collecting hail==0.2.35
....
Installing collected packages: hail
Successfully installed hail-0.2.35
$ gcloud config set compute/region australia-southeast-1
$ hailctl dataproc start --pkgs luigi,google-api-python-client --vep GRCh38 --max-idle 30m --num-workers 2 --num-preemptible-workers 12 seqr-loading-cluster
(unset)
Traceback (most recent call last):
  File "/usr/local/bin/hailctl", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 97, in main
    cli.main(args)
  File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 107, in main
    jmp[args.module].main(args, pass_through_args)
  File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 260, in main
    raise RuntimeError("The --vep argument is not currently provided in your region. Please contact the Hail team on https://discuss.hail.is for support.")
RuntimeError: The --vep argument is not currently provided in your region. Please contact the Hail team on https://discuss.hail.is for support.

Not sure if I’ve done something wrong here?

Looks like the region name is off: it’s australia-southeast1, with no dash before the 1. This is a bad error message, though; will fix.

https://cloud.google.com/compute/docs/regions-zones

Thanks!

I also figured out that compute/region is the wrong config setting anyway. I had to set:

gcloud config set dataproc/region australia-southeast1

and now it works - thanks!
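
For anyone who lands here with the same error, here is a rough sketch of the two fixes from this thread combined: check that the region name has the expected shape, set dataproc/region (not compute/region), then start the cluster. The function names is_plausible_region and start_vep_cluster are made up for illustration, and the pattern is only a heuristic for the usual "name-nameN" form of GCP region names; it does not check that the region exists or that VEP data is replicated there. The hailctl flags are the ones from the command earlier in the thread, trimmed down.

```shell
# Heuristic only (an assumption, not an official rule): GCP region names
# look like "australia-southeast1", with no dash before the trailing digits.
is_plausible_region() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]+-[a-z]+[0-9]+$'
}

# Hypothetical helper: validate the region, point gcloud's dataproc/region
# at it, then start a Hail Dataproc cluster with VEP enabled.
start_vep_cluster() {
  region="$1"
  cluster="$2"
  if ! is_plausible_region "$region"; then
    echo "Region '$region' does not look like a GCP region name (expected something like australia-southeast1)" >&2
    return 1
  fi
  # hailctl reads the region from the dataproc/region setting, as found above.
  gcloud config set dataproc/region "$region" || return 1
  hailctl dataproc start --vep GRCh38 --max-idle 30m --num-workers 2 "$cluster"
}
```

With those defined, `start_vep_cluster australia-southeast1 seqr-loading-cluster` should behave like the working setup above, while `start_vep_cluster australia-southeast-1 seqr-loading-cluster` stops early with a message about the region name instead of the less helpful "not currently provided in your region" error.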