Error in load_dataset()

Hi Hail team,

I’ve encountered an issue when using the following call to load a dataset in Hail:

mt = hl.experimental.load_dataset(name='dbSNP_rsid',
                                  reference_genome='GRCh38',
                                  version="154",
                                  region='us',
                                  cloud='gcp')

The error message is:
Hail version: 0.2.126-ee77707f4fab
Error summary: HailException: No file or directory found at gs://hail-datasets-us/dbSNP/build_154/GRCh38/full_table.ht

I can load other Hail datasets without any issue, such as the “gnomad_hgdp_1kg_subset_dense” dataset.

Could you please help with some suggestions regarding this issue? Thanks!

Best,
Wen

Hi @Wen_He, please see this post. Unfortunately, due to Google pricing changes, we had to move our datasets to regional buckets. Upgrading to 0.2.128 should fix the issue.
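If it helps to confirm whether an installation predates that release, here is a minimal sketch that compares a version string against 0.2.128. The helper names (parse_hail_version, needs_upgrade) are hypothetical, not part of Hail, and the sketch assumes the version string has the "release-commit" shape shown in the error message above.

```python
# Hypothetical helpers (not part of the Hail API) for checking whether a
# Hail version string is older than the release mentioned in this thread.
FIXED_IN = (0, 2, 128)


def parse_hail_version(version: str) -> tuple:
    # Drop the "-<commit hash>" suffix, then split "0.2.126" into integers.
    release = version.split("-", 1)[0]
    return tuple(int(part) for part in release.split("."))


def needs_upgrade(version: str) -> bool:
    # Tuples compare element by element, so (0, 2, 126) < (0, 2, 128).
    return parse_hail_version(version) < FIXED_IN


# Version string copied from the error message in this thread.
print(needs_upgrade("0.2.126-ee77707f4fab"))  # True

# In a live session the string can be obtained from Hail itself:
# import hail as hl
# print(needs_upgrade(hl.version()))
```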

Thanks for the prompt response, @danielgoldstein Dan!

I’m currently working within the AOU workspace environment, where restrictions prevent me from upgrading Hail to version 0.2.128. Do you have any suggestions for alternative ways to access the dataset? Your input would be greatly appreciated. Many thanks!

Hi @Wen_He, as a temporary measure, we can fix the URL that you’re trying to access and read it directly through hail’s normal methods. The only change that we made is moving the hail-datasets-us bucket to hail-datasets-us-central1. Can you try the following?

ht = hl.read_table('gs://hail-datasets-us-central1/dbSNP/build_154/GRCh38/full_table.ht')

I’ll add that since this is a regional bucket, it is imperative that you create your cluster in us-central1 to avoid expensive egress charges.
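If you have other paths that still point at the old bucket, the same substitution can be sketched as a small helper. This is only an illustration of the rename described above: the function name regional_uri is hypothetical, and it assumes that any other object you need kept the same path inside the new bucket, which is only confirmed here for the dbSNP table.

```python
# Hypothetical helper: rewrite a gs:// URI from the old multi-region bucket
# to the regional bucket named in this thread. Not part of the Hail API.
OLD_BUCKET = "hail-datasets-us"
NEW_BUCKET = "hail-datasets-us-central1"


def regional_uri(uri: str) -> str:
    old_prefix = f"gs://{OLD_BUCKET}/"
    # Matching on the trailing slash leaves URIs that already point at
    # hail-datasets-us-central1 (or any other bucket) unchanged.
    if uri.startswith(old_prefix):
        return f"gs://{NEW_BUCKET}/" + uri[len(old_prefix):]
    return uri


# Path copied from the error message in this thread.
old_uri = "gs://hail-datasets-us/dbSNP/build_154/GRCh38/full_table.ht"
new_uri = regional_uri(old_uri)
print(new_uri)

# In a Hail session, the rewritten URI can then be read directly:
# import hail as hl
# ht = hl.read_table(new_uri)
# ht.describe()
```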

DISCLAIMER FOR FUTURE READERS: The Google Cloud Storage URIs above are not part of the officially supported Hail API. Upgrading your Hail version is still the recommended approach if possible.