I received this message when trying to start a Dataproc cluster with VEP support:
RuntimeError: The --vep argument is not currently provided in your region. Please contact the Hail team on https://discuss.hail.is for support
What would it take to set it up? Unfortunately we cannot move our data to one of the supported regions.
It won’t take much to set up, I think; I’ll discuss it with the team on Monday. We recently moved the VEP data to this region-based system because we got a large data egress bill from some EU users pulling data from the US. I only replicated the data in regions where I knew we had hail/vep/gcp users. So you’re in australia-southeast-1?
Yep - australia-southeast-1 would be perfect.
Completely understand about the egress bill. And I’m sure in the end performance will never be good with people pulling data across regions (including myself).
Thanks for the followup and let me know if there is anything I can do to help (testing, etc).
There is a pull request fixing this for you now: https://github.com/hail-is/hail/pull/8340. Data is already replicated for Australia. So this will work in 0.2.35 when it releases, and once this PR goes in, you could install hail from source if needed.
Fantastic! Thanks for doing this so quickly. Will see if I can give the PR a try.
PR is merged, so master should be fine. If it would be helpful / if installing from source proves difficult, I can send you a python wheel to pip install that contains the right change. Not sure when we expect the release of 0.2.35.
Hail 0.2.35 has been released, and it contains hailctl support for VEP in the Australian region.
Hmm, only just getting to try this out (apologies!), but even though I’ve installed 0.2.35 I still seem to be getting the error - not sure if there’s something config wise I need to do to make it work?
$ pip install hail==0.2.35
Installing collected packages: hail
Successfully installed hail-0.2.35
$ gcloud config set compute/region australia-southeast-1
$ hailctl dataproc start --pkgs luigi,google-api-python-client --vep GRCh38 --max-idle 30m --num-workers 2 --num-preemptible-workers 12 seqr-loading-cluster
Traceback (most recent call last):
  File "/usr/local/bin/hailctl", line 8, in <module>
  File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 97, in main
  File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 107, in main
  File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 260, in main
    raise RuntimeError("The --vep argument is not currently provided in your region. Please contact the Hail team on https://discuss.hail.is for support.")
RuntimeError: The --vep argument is not currently provided in your region. Please contact the Hail team on https://discuss.hail.is for support.
Not sure if I’ve done something wrong here?
Looks like there’s no dash before the 1 in australia-southeast1. This is a bad error message, though; will fix.
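For illustration, a more helpful check could name the region it saw and suggest the closest supported one. The sketch below is not hailctl's actual code: `SUPPORTED_VEP_REGIONS` and `check_vep_region` are hypothetical names, and the region set is only an example, not the real list of regions with replicated VEP data.

```python
import difflib

# Illustrative set only (an assumption); the real list of regions with
# replicated VEP data lives in hailctl and may differ.
SUPPORTED_VEP_REGIONS = {"us-central1", "europe-west1", "australia-southeast1"}


def check_vep_region(region: str) -> str:
    """Return the region if it is in the supported set, else raise with a hint."""
    if region in SUPPORTED_VEP_REGIONS:
        return region

    supported = sorted(SUPPORTED_VEP_REGIONS)
    message = (
        f"The --vep argument is not supported in region {region!r}. "
        f"Supported regions: {', '.join(supported)}."
    )
    # Suggest the nearest supported name to catch typos such as a stray dash.
    close = difflib.get_close_matches(region, supported, n=1)
    if close:
        message += f" Did you mean {close[0]!r}?"
    raise ValueError(message)
```

With a check like this, passing `australia-southeast-1` would produce an error that echoes the misspelled region and points at `australia-southeast1`, rather than implying the region is unsupported.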
I also figured out that compute/region is the wrong config property anyway; I had to set:
gcloud config set dataproc/region australia-southeast1
and now it works - thanks!
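In case it helps anyone else hitting this, here is a small sketch for checking what a gcloud property is set to before starting a cluster. It assumes `gcloud` is on the PATH and that `gcloud config get-value` writes nothing to stdout when a property is unset; `gcloud_property` is a hypothetical helper name, not part of hailctl.

```python
import subprocess


def gcloud_property(name, run=subprocess.run):
    """Return the value of a gcloud config property, or None if it looks unset.

    `name` is a section/property pair such as "dataproc/region".
    `run` is injectable so the helper can be exercised without gcloud installed.
    """
    result = run(
        ["gcloud", "config", "get-value", name],
        capture_output=True,
        text=True,
        check=True,
    )
    value = result.stdout.strip()
    # Assumption: an unset property yields empty stdout.
    return value or None
```

Calling `gcloud_property("dataproc/region")` and `gcloud_property("compute/region")` side by side shows which of the two is actually set, and whether the value has a stray dash in it.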