Hi, I’m trying to query the Genebass data like this:
import hail as hl
# logfile and tmpdir are defined earlier in the script (not shown here)
hl.init(local='local[2]', log=logfile, tmp_dir=tmpdir)
genebass = hl.read_matrix_table('gs://ukbb-exome-public/500k/results/results.mt')
and I’m getting the following error:
(hail) [basic-dy-t3axlarge-1 hail]$ python3 spark-query-genes.py
2023-01-23 11:08:55 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2023-01-23 11:08:56 WARN Hail:43 - This Hail JAR was compiled for Spark 3.1.1, running with Spark 3.1.2.
Compatibility is not guaranteed.
Running on Apache Spark version 3.1.2
SparkUI available at http://basic-dy-t3axlarge-1.bioinformatics-cro-hpc-slurm.pcluster.:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.77-684f32d73643
LOGGING: writing to ~/tmp/hail//hail-filter.log
Traceback (most recent call last):
File "spark-query-genes.py", line 12, in <module>
genebass = hl.read_matrix_table('gs://ukbb-exome-public/500k/results/results.mt')
File "<decorator-gen-1344>", line 2, in read_matrix_table
File "/apps/users/user2031/mambaforge/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 577, in wrapper
return __original_func(*args_, **kwargs_)
File "/apps/users/user2031/mambaforge/envs/hail/lib/python3.7/site-packages/hail/methods/impex.py", line 2115, in read_matrix_table
for rg_config in Env.backend().load_references_from_dataset(path):
File "/apps/users/user2031/mambaforge/envs/hail/lib/python3.7/site-packages/hail/backend/spark_backend.py", line 326, in load_references_from_dataset
return json.loads(Env.hail().variant.ReferenceGenome.fromHailDataset(self.fs._jfs, path))
File "/apps/users/user2031/mambaforge/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1305, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/apps/users/user2031/mambaforge/envs/hail/lib/python3.7/site-packages/hail/backend/py4j_backend.py", line 32, in deco
'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
hail.utils.java.FatalError: HailException: No file or directory found at gs://ukbb-exome-public/500k/results/results.mt
Java stack trace:
is.hail.utils.HailException: No file or directory found at gs://ukbb-exome-public/500k/results/results.mt
at is.hail.utils.ErrorHandling.fatal(ErrorHandling.scala:11)
at is.hail.utils.ErrorHandling.fatal$(ErrorHandling.scala:11)
at is.hail.utils.package$.fatal(package.scala:78)
at is.hail.expr.ir.RelationalSpec$.readMetadata(AbstractMatrixTableSpec.scala:32)
at is.hail.expr.ir.RelationalSpec$.readReferences(AbstractMatrixTableSpec.scala:73)
at is.hail.variant.ReferenceGenome$.fromHailDataset(ReferenceGenome.scala:581)
at is.hail.variant.ReferenceGenome.fromHailDataset(ReferenceGenome.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Hail version: 0.2.77-684f32d73643
Error summary: HailException: No file or directory found at gs://ukbb-exome-public/500k/results/results.mt
However, the path is definitely there:
~ » gcloud storage ls gs://ukbb-exome-public/500k/results/results.mt
gs://ukbb-exome-public/500k/results/results.mt/
gs://ukbb-exome-public/500k/results/results.mt/README.txt
gs://ukbb-exome-public/500k/results/results.mt/_SUCCESS
gs://ukbb-exome-public/500k/results/results.mt/metadata.json.gz
gs://ukbb-exome-public/500k/results/results.mt/cols/
gs://ukbb-exome-public/500k/results/results.mt/entries/
gs://ukbb-exome-public/500k/results/results.mt/globals/
gs://ukbb-exome-public/500k/results/results.mt/index/
gs://ukbb-exome-public/500k/results/results.mt/references/
gs://ukbb-exome-public/500k/results/results.mt/rows/
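To narrow this down, a check along the following lines could compare what Hail's own filesystem layer sees with the gcloud listing above. This is only a sketch: `missing_components` and `check_with_hail` are hypothetical helper names, the component list is copied from the listing above, and I'm assuming `hl.hadoop_exists` resolves `gs://` paths through the same Hadoop configuration that `read_matrix_table` uses.

```python
# Component names copied from the `gcloud storage ls` output above
# (README.txt is left out because it is not part of the table itself).
EXPECTED_COMPONENTS = [
    "_SUCCESS",
    "metadata.json.gz",
    "cols",
    "entries",
    "globals",
    "index",
    "references",
    "rows",
]


def missing_components(path, exists):
    """Return the expected components under `path` that `exists` cannot see.

    `exists` is any callable taking a full path and returning True/False,
    for example hl.hadoop_exists.
    """
    base = path.rstrip("/")
    return [name for name in EXPECTED_COMPONENTS if not exists(f"{base}/{name}")]


def check_with_hail(path):
    """Hypothetical wrapper: run the check through Hail's filesystem layer."""
    import hail as hl  # imported lazily so the helper above works without Hail

    # Assumes hl.init(...) has already been called, as in the script above.
    return missing_components(path, hl.hadoop_exists)
```

If `check_with_hail('gs://ukbb-exome-public/500k/results/results.mt')` returns every component while gcloud lists them all, that would suggest the problem is in how this Spark/Hail environment reaches `gs://` paths rather than in the dataset itself.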
This used to work, and I can’t see what has changed. Thanks for the help!