Table.export issue

Hi there,

I have been trying to export a table with the following:

table.export('filename.tsv')

and it is taking quite a while (over an hour so far). Is there a reason for the hang? It usually exits eventually without exporting the file.

The same lag also occurs with table.show().

Any tips would be appreciated.


Hail is lazy, which means that the entire pipeline (reads, filters, annotations, methods, and the export itself) is executed when you call table.export(). The slow part is probably somewhere else in the pipeline, not the export. What's the full pipeline?
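To make the laziness point concrete, here is a toy pure-Python model. The `LazyTable` class and its methods are hypothetical and are neither Hail's API nor its implementation. Transformations only record a deferred step; each action (`show()` or `export()` here) re-runs the whole chain from the read onward, which would explain why both calls show the same lag.

```python
import csv


class LazyTable:
    """Toy lazy table (hypothetical; not Hail's API or implementation).

    Transformations only record a deferred step. The whole chain, starting
    from the read, runs each time an action such as show() or export() is called.
    """

    def __init__(self, read, log, steps=()):
        self._read = read           # zero-argument function that "reads" the rows
        self._log = log             # shared list recording which steps actually ran
        self._steps = tuple(steps)  # deferred (kind, argument) pairs

    def _then(self, kind, arg):
        # Return a new table with one more deferred step; no rows are touched.
        return LazyTable(self._read, self._log, self._steps + ((kind, arg),))

    def filter(self, predicate):
        return self._then("filter", predicate)

    def annotate(self, **fields):
        return self._then("annotate", fields)

    def _run(self):
        # Execute the full pipeline from the read onward.
        self._log.append("read")
        rows = self._read()
        for kind, arg in self._steps:
            self._log.append(kind)
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            else:  # "annotate": add computed fields to every row
                rows = [{**r, **{name: f(r) for name, f in arg.items()}} for r in rows]
        return rows

    def show(self):
        return self._run()

    def export(self, path):
        rows = self._run()
        fieldnames = list(rows[0]) if rows else []
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)
        self._log.append("write")


log = []
table = LazyTable(lambda: [{"x": 1}, {"x": 2}, {"x": 3}], log)
table = table.filter(lambda r: r["x"] > 1).annotate(y=lambda r: r["x"] * 10)
print(log)           # [] (nothing has run yet)
print(table.show())  # [{'x': 2, 'y': 20}, {'x': 3, 'y': 30}]
print(log)           # ['read', 'filter', 'annotate']
```

Calling `export()` after `show()` appends a second `read`, `filter`, `annotate` sequence to the log, because nothing is cached between actions in this model.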

Is it resource-heavy? I was using more nodes before this and never had any issues. Currently I am using only 2 CPU nodes on a cluster with at least 30 GB of RAM.

The pipeline is a simple seg_by_carrier and filtering pipeline that I've used before, slightly modified.

The most notable change is annotating with the gnomAD sites table instead of a custom gnomAD entry:

import hail as hl

g = hl.read_table('/path/to/gnomad.genomes.r3.0.sites.ht')
g_row = g[mt.row_key]  # look up each variant in the gnomAD table once and reuse it
mt = mt.annotate_rows(gnomad=hl.struct(nfe=g_row.freq[2], popmax=g_row.popmax, split=g_row.was_split, filters=g_row.filters))

mt = mt.annotate_rows(gnomad=mt.gnomad.annotate(popmax=mt.gnomad.popmax.annotate(AF=hl.or_else(mt.gnomad.popmax.AF, 0.0))))

mt = mt.annotate_rows(gnomad=mt.gnomad.annotate(nfe=mt.gnomad.nfe.annotate(AF=hl.or_else(mt.gnomad.nfe.AF, 0.0))))
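The gnomAD annotation and the AF fill-in lines above can be modelled in plain Python as a keyed lookup followed by an "AF defaults to 0" rule. This is a sketch with hypothetical function names (`lookup`, `fill_af`, `annotate_gnomad`) and made-up toy data, where `None` stands in for a missing value. It assumes that annotating a missing struct yields a missing struct, so in this model a variant absent from the reference keeps `None` for `nfe` and `popmax` rather than getting AF 0; that case may be worth checking on a few rows of the real data.

```python
def lookup(reference, key):
    """Keyed lookup that returns None for absent keys (models g[mt.row_key])."""
    return reference.get(key)


def fill_af(freq):
    """Copy a frequency struct, replacing a missing AF with 0.0.

    A missing struct (None) stays missing: this encodes the assumption that
    annotating a missing struct yields a missing struct.
    """
    if freq is None:
        return None
    af = freq.get("AF")
    return {**freq, "AF": 0.0 if af is None else af}


def annotate_gnomad(variant_key, reference):
    """Build the toy `gnomad` annotation for one variant."""
    hit = lookup(reference, variant_key)
    nfe = hit["nfe"] if hit is not None else None
    popmax = hit["popmax"] if hit is not None else None
    return {"nfe": fill_af(nfe), "popmax": fill_af(popmax)}


# Hypothetical toy reference keyed by (locus, alleles); the values are made up.
reference = {
    ("1:1000", ("C", "T")): {"nfe": {"AF": 0.02}, "popmax": {"AF": 0.05}},
    ("1:2000", ("A", "G")): {"nfe": {"AF": None}, "popmax": {"AF": None}},
}

print(annotate_gnomad(("1:2000", ("A", "G")), reference))
# {'nfe': {'AF': 0.0}, 'popmax': {'AF': 0.0}}
print(annotate_gnomad(("1:3000", ("G", "A")), reference))
# {'nfe': None, 'popmax': None}
```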

And then the usual steps:

mt = process_consequences(mt)
mt = mt.explode_rows(mt.vep.worst_csq_by_gene_canonical)
csq = mt.vep.worst_csq_by_gene_canonical
mt = mt.annotate_rows(gene=csq.gene_symbol, hgvsc=csq.hgvsc, impact=csq.impact, consequence=csq.most_severe_consequence)
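The explode and annotate steps can be pictured with a small pure-Python model: each variant row is copied once per element of its consequence array, and the chosen consequence fields are then copied to the top level. The functions, the flattened field layout, and the gene names are hypothetical. The rule that an empty or missing array produces no rows is our reading of Hail's explode documentation and is an assumption here.

```python
def explode_rows(rows, field):
    """Toy explode: one copy of each row per element of row[field].

    Rows whose collection is empty or None produce no output rows.
    """
    for row in rows:
        for element in row.get(field) or []:
            yield {**row, field: element}


def pull_csq_fields(row):
    """Copy selected fields of the exploded consequence struct to the top level."""
    csq = row["worst_csq_by_gene_canonical"]
    return {
        **row,
        "gene": csq["gene_symbol"],
        "hgvsc": csq["hgvsc"],
        "impact": csq["impact"],
        "consequence": csq["most_severe_consequence"],
    }


# Hypothetical toy rows: one variant hitting two genes, one with no consequences.
rows = [
    {
        "variant": "1:1000:C:T",
        "worst_csq_by_gene_canonical": [
            {"gene_symbol": "GENE1", "hgvsc": "c.100C>T", "impact": "HIGH",
             "most_severe_consequence": "stop_gained"},
            {"gene_symbol": "GENE2", "hgvsc": "c.250G>A", "impact": "LOW",
             "most_severe_consequence": "synonymous_variant"},
        ],
    },
    {"variant": "1:2000:A:G", "worst_csq_by_gene_canonical": []},
]

out = [pull_csq_fields(r) for r in explode_rows(rows, "worst_csq_by_gene_canonical")]
print([(r["variant"], r["gene"], r["impact"]) for r in out])
# [('1:1000:C:T', 'GENE1', 'HIGH'), ('1:1000:C:T', 'GENE2', 'LOW')]
```

The row count after exploding depends on how many genes each variant touches, so it can be larger than the variant count.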

rare_mt = mt.filter_rows(mt.gnomad.nfe.AF < 0.001, keep=True)
rare_mt = rare_mt.filter_rows(rare_mt.impact != "LOW")
rare_mt = rare_mt.filter_rows(rare_mt.impact != "MODIFIER")
table = rare_mt.rows().select("gnomad", "gene", "hgvsc", "impact", "consequence", "segregates_with_carrier")
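The three filters can be modelled the same way. This sketch uses hypothetical helpers (`keep_rows`, `lt`, `ne`) and toy rows, and it assumes, as we understand Hail's filter documentation, that a row whose filter condition is missing is removed regardless of `keep`. Under that assumption a row with a missing NFE AF does not survive the rarity filter.

```python
def keep_rows(rows, predicate, keep=True):
    """Toy row filter: a missing (None) condition removes the row whatever `keep` is."""
    kept = []
    for row in rows:
        condition = predicate(row)
        if condition is None:
            continue  # missing condition: row is dropped
        if condition == keep:
            kept.append(row)
    return kept


def lt(value, threshold):
    """Missing-aware '<': returns None when the value is missing."""
    return None if value is None else value < threshold


def ne(value, other):
    """Missing-aware '!=': returns None when the value is missing."""
    return None if value is None else value != other


# Hypothetical toy rows after the annotation steps.
rows = [
    {"gene": "GENE1", "nfe_af": 0.0, "impact": "HIGH"},
    {"gene": "GENE2", "nfe_af": 0.0, "impact": "LOW"},
    {"gene": "GENE3", "nfe_af": 0.02, "impact": "HIGH"},
    {"gene": "GENE4", "nfe_af": None, "impact": "MODERATE"},
    {"gene": "GENE5", "nfe_af": 0.0005, "impact": "MODIFIER"},
]

rare = keep_rows(rows, lambda r: lt(r["nfe_af"], 0.001))
rare = keep_rows(rare, lambda r: ne(r["impact"], "LOW"))
rare = keep_rows(rare, lambda r: ne(r["impact"], "MODIFIER"))
print([r["gene"] for r in rare])  # ['GENE1']
```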

Would this cause a lag when showing or exporting the table?

Thanks in advance

I think the slow part here is reading and joining the gnomAD table. It is a serious chunk of data: the genomes table is hundreds of GB, if I remember correctly. How big is your input mt? It must be much smaller.
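For a rough intuition of why the join, rather than the export, can dominate: in a single merge pass over two key-sorted inputs, the number of reference rows read depends on how far the input's keys reach into the reference, not on how many input rows there are. The sketch below is a toy merge left join with hypothetical names and integer keys. It is not how Hail plans or executes its joins, and it ignores partitioning, which could let a real engine skip data.

```python
def merge_left_join(small, large):
    """Toy merge left join of two key-sorted lists of (key, value) pairs.

    Returns (joined_rows, large_rows_read). The scan of `large` stops once
    `small` is exhausted, so it reads only up to the last small-side key.
    """
    joined = []
    j = 0          # index of the current row in `large`
    rows_read = 0  # distinct rows of `large` examined so far
    for key, value in small:
        # Skip past large-side rows whose key is smaller than the current key.
        while j < len(large) and large[j][0] < key:
            j += 1
        rows_read = min(j + 1, len(large))
        match = large[j][1] if j < len(large) and large[j][0] == key else None
        joined.append((key, value, match))
    return joined, rows_read


# Hypothetical "reference": 100,000 sorted keys with made-up values.
ref_rows = [(k, k * 2) for k in range(100_000)]

spread = [(10, "a"), (50_000, "b"), (99_990, "c")]  # keys across the whole range
clustered = [(1, "a"), (2, "b"), (3, "c")]          # keys at the very start

_, read_spread = merge_left_join(spread, ref_rows)
_, read_clustered = merge_left_join(clustered, ref_rows)
print(read_spread)     # 99991
print(read_clustered)  # 4
```

Both inputs have three rows, yet the spread one forces a read of almost the entire reference, which is the situation a genome-wide variant set is likely to be in.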