Table.export issue

bonsai · September 7, 2020, 1:49pm

Hi there,

I have been trying to export a table out with the following:

table.export('filename.tsv')

and it has been taking quite a while (over an hour so far). Is there a reason for the hang? It usually then exits without exporting the file.

The lag also occurs with: table.show()

Any tips would be appreciated.

tpoterba · September 7, 2020, 3:07pm

Hail is lazy, which means that an entire pipeline (reads / filters / annotates / methods / export) will be executed when you call table.export(). It’s probably something else that’s slow, not the export. What’s the full pipeline?
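To make the point about laziness concrete, here is a toy sketch in plain Python. It is not Hail's implementation; `LazyTable`, `slow_read`, and `log` are hypothetical names made up for illustration. Each transformation is only recorded, and the expensive read happens when `export()` is finally called, which is why the export step appears to be the slow one.

```python
# Toy illustration of lazy evaluation (an analogy, not Hail's internals).
class LazyTable:
    def __init__(self, source, steps=()):
        self._source = source        # zero-argument callable that produces rows
        self._steps = tuple(steps)   # recorded transformations, not yet applied

    def filter(self, predicate):
        # Record the filter; nothing is computed here.
        return LazyTable(self._source, self._steps + (("filter", predicate),))

    def annotate(self, fn):
        # Record the annotation; nothing is computed here.
        return LazyTable(self._source, self._steps + (("annotate", fn),))

    def export(self):
        # Only now is the source read and every recorded step executed.
        rows = self._source()
        for kind, fn in self._steps:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [{**r, **fn(r)} for r in rows]
        return rows


log = []


def slow_read():
    # Stand-in for an expensive read or join.
    log.append("read")
    return [{"af": 0.0005}, {"af": 0.2}]


table = (
    LazyTable(slow_read)
    .filter(lambda r: r["af"] < 0.001)
    .annotate(lambda r: {"rare": True})
)
# At this point slow_read has not been called.
result = table.export()  # the read, filter, and annotate all run here
```

In this sketch the cost of `slow_read` is paid inside `export()`, even though the export logic itself is trivial.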

bonsai · September 8, 2020, 12:31am

Is it resource heavy? I was using more nodes prior to this and never had any issues. Currently I am using only 2 CPU nodes on a cluster with at least 30 GB of RAM.

The pipeline is a simple seg_by_carrier and filtering one which I’ve used before, but changed slightly.

The most notable change was annotating using the gnomad mt instead of a custom gnomad entry:

g = hl.read_table('/path/to/gnomad.genomes.r3.0.sites.ht')
mt = mt.annotate_rows(gnomad=hl.struct(
    nfe=g[mt.row_key].freq[2],
    popmax=g[mt.row_key].popmax,
    split=g[mt.row_key].was_split,
    filters=g[mt.row_key].filters))

mt = mt.annotate_rows(gnomad=hl.cond(
    hl.is_missing(mt.gnomad.popmax.AF),
    mt.gnomad.annotate(popmax=mt.gnomad.popmax.annotate(AF=0)),
    mt.gnomad.annotate(popmax=mt.gnomad.popmax.annotate(AF=mt.gnomad.popmax.AF))))

mt = mt.annotate_rows(gnomad=hl.cond(
    hl.is_missing(mt.gnomad.nfe.AF),
    mt.gnomad.annotate(nfe=mt.gnomad.nfe.annotate(AF=0)),
    mt.gnomad.annotate(nfe=mt.gnomad.nfe.annotate(AF=mt.gnomad.nfe.AF))))

and then the usual:

mt=process_consequences(mt)
mt=mt.explode_rows(mt.vep.worst_csq_by_gene_canonical)
mt=mt.annotate_rows(gene=mt.vep.worst_csq_by_gene_canonical.gene_symbol)
mt=mt.annotate_rows(hgvsc=mt.vep.worst_csq_by_gene_canonical.hgvsc)
mt=mt.annotate_rows(impact=mt.vep.worst_csq_by_gene_canonical.impact)
mt=mt.annotate_rows(consequence=mt.vep.worst_csq_by_gene_canonical.most_severe_consequence)

rare_mt = mt.filter_rows(mt.gnomad.nfe.AF < 0.001, keep=True)
rare_mt = rare_mt.filter_rows(rare_mt.impact != "LOW")
rare_mt = rare_mt.filter_rows(rare_mt.impact != "MODIFIER")
table = rare_mt.rows().select("gnomad", "gene", "hgvsc", "impact", "consequence", "segregates_with_carrier")

Would this cause a lag when showing or exporting a table?

Thanks in advance

tpoterba · September 8, 2020, 11:39am

I think the slow thing here is reading/joining the gnomAD table. This is a serious chunk of data - the genomes are hundreds of GBs if I remember correctly. How big is your mt input? It must be much smaller.
