I have a driver node and a few separate worker nodes. At first glance everything looks fine: I can see the nodes in the web UI, I can load and collect a CSV from S3, and `spark.parallelize` runs correctly across the nodes.
I am also able to load a MatrixTable from S3:
```python
mt = hl.read_matrix_table(database_location)
# mt.describe()
mt.GT.take(5)
# [Call(alleles=[0, 1], phased=True), Call(alleles=[1, 0], phased=True),
#  Call(alleles=[0, 0], phased=True), Call(alleles=[0, 0], phased=True),
#  Call(alleles=[0, 0], phased=True)]
```
But when I run, for example, `mt.summarize()`, I get an error:
```
mt.summarize()
Java stack trace: ...
Hail version: 0.2.104-1940d9e8eaab
Error summary: FileNotFoundException: File /tmp/aggregate_intermediates/-pqQ4bnM3U52mspCinC9ilT4c68fc27-1bde-4fad-926b-31c2fd467d50 does not exist
```
This file exists in the worker's `/tmp` directory, but not on the driver machine.
Is there a setting I am missing?
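My guess is that the aggregate intermediates are being written to a `tmp_dir` that is local to each machine rather than shared across the cluster. Would pointing Hail's `tmp_dir` at network-visible storage be the right fix? A sketch of what I mean (the master URL and bucket name are placeholders, not my real config):

```python
import hail as hl

# Sketch, assuming the fix is to give Hail a scratch directory that
# both the driver and the workers can read, e.g. an S3 path instead
# of the machine-local default /tmp:
hl.init(
    master='spark://<driver-host>:7077',  # placeholder cluster URL
    tmp_dir='s3a://my-hail-bucket/tmp',   # placeholder: shared, network-visible scratch space
)
```

If that is the right direction, I am not sure whether `tmp_dir` alone is enough or whether there is a separate setting for machine-local scratch as well.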