MatrixTable from stdout in Cromwell

Hey there. I am trying to use Hail in Cromwell on Google Cloud.

I am trying to pass a MatrixTable from task 1 to task 2.

At the end of task 1, I write the MatrixTable to a file named ‘qc.mt’.

task 1 prepGWAS output

2019-06-05 17:48:54 Hail: INFO: wrote matrix table with 10961 rows and 284 columns in 1 partition to qc.mt

But I run into an error when task 2 tries to read it via the stdout of task 1.

task 2 runGWAS error
Is this path referring to the location where Cromwell stores the stdout of task 1?

Error summary: HailException: MatrixTable and Table files are directories; path '/cromwell_root/crom-buck/cromwell-execution/hailGWAS/51954f54-4ba8-4b16-956a-e2225fa717a8/call-prepGWAS/stdout' is not a directory

script from task 2
This reads the MatrixTable from the path in the environment variable QC, which is tied to the stdout of task 1.

import os
import hail as hl

QC = os.getenv('QC')

# need to grab the .mt file
mt = hl.read_matrix_table(QC)

...

wdl file
Here QC is passed to the Python script, where the MatrixTable is read.

workflow hailGWAS {
    call prepGWAS
    call runGWAS {input: qc_mt = prepGWAS.qc_mt}
}

task prepGWAS {
    command {
        cd ..; python prep-file-gwas2.py;
    }
    output {
        File qc_mt = stdout('qc.mt')
        // File qc_mt = stdout()
    }
    runtime {
        docker: 'hashrocketsyntax/java-py-hail:mt-named'
    }
}

task runGWAS {
    File qc_mt

    command {
        cd ..; export QC=${qc_mt}; python run-gwas3.py;
    }
    output {
        String out = read_string(stdout())
    }
    runtime {
        docker: 'hashrocketsyntax/java-py-hail:mt-named'
    }
}

MatrixTable / Table files are directories, not single files, so the single stdout file from task 1 can’t stand in for one. You’ll have to zip the directory up into an archive in task 1 and extract it in task 2, I think.
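
A rough sketch of that archive round trip, using only Python's standard `tarfile` module. The helper names `pack_mt` and `unpack_mt` and the archive name `qc.mt.tar.gz` are made up for illustration; nothing here is part of Hail or Cromwell.

```python
import tarfile
from pathlib import Path


def pack_mt(mt_dir: str, archive_path: str) -> str:
    """Bundle a MatrixTable directory into a single .tar.gz file (hypothetical helper)."""
    mt = Path(mt_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores only the directory's own name (e.g. "qc.mt") in the archive,
        # so extraction does not recreate the full original path.
        tar.add(mt, arcname=mt.name)
    return archive_path


def unpack_mt(archive_path: str, dest_dir: str) -> str:
    """Extract the archive and return the path of the restored directory (hypothetical helper)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # The first member is the top-level directory added by pack_mt.
        top_level = tar.getnames()[0].split("/")[0]
        tar.extractall(dest_dir)
    return str(Path(dest_dir) / top_level)
```

In task 1 this would run after the write, e.g. `pack_mt('qc.mt', 'qc.mt.tar.gz')`, with the WDL output pointing at wherever the archive actually ends up instead of at `stdout()`. In task 2 the script would do something like `mt = hl.read_matrix_table(unpack_mt(os.getenv('QC'), '.'))`.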

I don’t recommend this approach, though – I don’t think it’ll really scale.
