I have a driver node and a few separate worker nodes. At first glance it looks like everything is working fine: I can see the nodes in the web UI, and I can load and collect a CSV from S3. I also tried spark.parallelize and it runs well across the nodes.

I am able to load MatrixTable from S3:

mt = hl.read_matrix_table(database_location)
[Call(alleles=[0, 1], phased=True),
 Call(alleles=[1, 0], phased=True),
 Call(alleles=[0, 0], phased=True),
 Call(alleles=[0, 0], phased=True),
 Call(alleles=[0, 0], phased=True)]

But when I run, for example, mt.summarize(), I get an error:

Java stack trace:
Hail version: 0.2.104-1940d9e8eaab
Error summary: FileNotFoundException: File /tmp/aggregate_intermediates/-pqQ4bnM3U52mspCinC9ilT4c68fc27-1bde-4fad-926b-31c2fd467d50 does not exist

This file exists in the worker's /tmp directory, but not on the driver machine.

Is there a setting I am missing?


/tmp is probably not network-visible – you could try setting a temp dir in hl.init that points to an S3 bucket (use a retention policy so that you're not paying for temp data indefinitely)
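A minimal sketch of that suggestion; the bucket name and prefix here are hypothetical, and any S3 location readable and writable by both the driver and the workers would do:

```python
import hail as hl

# tmp_dir must be a path visible to the driver AND all workers,
# which a local /tmp is not. 's3://my-hail-tmp/hail-tmp/' is a
# hypothetical bucket/prefix -- substitute your own.
hl.init(tmp_dir='s3://my-hail-tmp/hail-tmp/')
```

Pairing this with an S3 lifecycle rule that expires objects under the prefix after a few days keeps the intermediate files from accumulating cost.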

Thanks @tpoterba for the fast reply!

I created a /tmp folder in HDFS storage, pointed the tmp_dir parameter at it, and now it works.
hl.init(sc=spark, tmp_dir='hdfs://')
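For reference, a slightly fuller form of that call; the explicit HDFS path below is hypothetical (the original post leaves it at the bare scheme), and any directory on HDFS reachable by both the driver and the workers should behave the same:

```python
import hail as hl

# Assumes an existing SparkContext/session named `spark`, as in the
# post above. 'hdfs:///tmp/hail' is a hypothetical path on the
# cluster's default HDFS filesystem.
hl.init(sc=spark, tmp_dir='hdfs:///tmp/hail')
```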