Hi,
I have a driver node and a few separate worker nodes. At first glance it looks like everything is working fine: I can see the nodes in the web UI, I can load and collect a CSV from S3, and `spark.parallelize` runs well across the nodes.
I am also able to load a MatrixTable from S3:
mt = hl.read_matrix_table(database_location)
#mt.describe()
mt.GT.take(5)
[Call(alleles=[0, 1], phased=True),
Call(alleles=[1, 0], phased=True),
Call(alleles=[0, 0], phased=True),
Call(alleles=[0, 0], phased=True),
Call(alleles=[0, 0], phased=True)]
But when I run, for example, mt.summarize(), I get an error:
mt.summarize()
Java stack trace (truncated):
Hail version: 0.2.104-1940d9e8eaab
Error summary: FileNotFoundException: File /tmp/aggregate_intermediates/-pqQ4bnM3U52mspCinC9ilT4c68fc27-1bde-4fad-926b-31c2fd467d50 does not exist
This file exists in a worker's /tmp directory, but not on the driver machine.
Is there a setting I am missing?
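In case it clarifies what I mean by "setting": I wonder if this is related to the `tmp_dir` argument of `hl.init`, which the Hail docs say should point to network-visible storage when running on a cluster (otherwise intermediates land in a node-local path like /tmp that other machines cannot read). A sketch of what I am guessing at, with placeholder bucket and host names of my own:

```python
import hail as hl

# Sketch only (my assumption, not a confirmed fix):
# on a multi-node cluster, tmp_dir should be storage visible to
# both the driver and all workers, e.g. an S3 path, while
# local_tmpdir stays node-local scratch space.
hl.init(
    master='spark://<driver-host>:7077',   # hypothetical Spark master URL
    tmp_dir='s3://<my-bucket>/hail-tmp/',  # network-visible temp dir (placeholder)
    local_tmpdir='/tmp',                   # per-node scratch directory
)
```

Is that the right knob, or is something else expected for standalone Spark clusters?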
Thanks