Hail is lazy, which means the entire pipeline (reads / filters / annotations / methods / export) is executed only when you call table.export(). It's probably something else that's slow, not the export itself. What's the full pipeline?
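For example (a minimal sketch with made-up paths and fields, not your pipeline): every line below just records work to be done; nothing is actually read or computed until the export on the last line.

import hail as hl

mt = hl.read_matrix_table('/path/to/input.mt')   # lazy: only metadata is read here
mt = mt.filter_rows(hl.len(mt.filters) == 0)     # lazy: filter is recorded, not run
rows = mt.rows()                                 # lazy: row-table view
rows.export('/path/to/output.tsv.bgz')           # the whole pipeline runs here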
Is it resource heavy? I was using more nodes prior to this and never had any issues. Currently I am using only 2 CPU nodes on a cluster with at least 30 GB of RAM.
The pipeline is a simple seg_by_carrier and filtering pipeline I've used before, with only slight changes.
The most notable change was annotating from the gnomAD sites table instead of a custom gnomAD entry:
g = hl.read_table('/path/to/gnomad.genomes.r3.0.sites.ht')
mt = mt.annotate_rows(gnomad=hl.struct(nfe=g[mt.row_key].freq[2], popmax=g[mt.row_key].popmax,
                                       split=g[mt.row_key].was_split, filters=g[mt.row_key].filters))
I think the slow thing here is reading and joining the gnomAD table. That's a serious chunk of data - the genomes release is hundreds of GB, if I remember correctly. How big is your MT input? It must be much smaller.
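One way to check (a rough sketch; the paths and checkpoint location are placeholders, and this assumes your MT has no other pending work before the join): look at the size of your input, then checkpoint right after the gnomAD annotation so the join runs exactly once and you can time it in isolation.

print(mt.count())          # (n_rows, n_cols) of your input MT
print(mt.n_partitions())   # how the work is split across your 2 nodes

g = hl.read_table('/path/to/gnomad.genomes.r3.0.sites.ht')
gn = g[mt.row_key]
mt = mt.annotate_rows(gnomad=hl.struct(nfe=gn.freq[2], popmax=gn.popmax,
                                       split=gn.was_split, filters=gn.filters))

# Forces the join to run and writes the result; downstream filters/export
# then read the checkpointed data instead of redoing the join.
mt = mt.checkpoint('/path/to/annotated_checkpoint.mt', overwrite=True)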