Performance genotypes

I want to export the genotypes of an aggregated file using this code:
data=data.key_rows_by(variant=hl.variant_str(data_filtered.locus,data_filtered.alleles))
data.GT.export("fileName.tsv")

It is really slow: about 15 minutes on a very large file with thousands of samples. What can I do to improve the performance? I need the locus, the alleles, and the genotypes in a file.

We’ll need the hail log and the full script you ran to fully diagnose.

Exporting to an uncompressed TSV is generally slow. You might try exporting as filename.tsv.bgz. Also, exporting a single file requires a slow concatenation step; you might try parallel=True instead if you can deal with many separate files of genotypes.
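
For illustration, here is a rough sketch of both suggestions together. It assumes the object being exported is a MatrixTable with a GT entry field, and it goes through a Table, whose export method takes a parallel argument as a string such as "header_per_shard" rather than a boolean. The helper names with_bgz_suffix and export_genotypes_sharded are made up for this example, and the "./." placeholder for missing calls is my choice, not something Hail requires.

```python
def with_bgz_suffix(path):
    """Append .bgz if missing; Hail chooses block-gzip output from the extension."""
    return path if path.endswith(".bgz") else path + ".bgz"


def export_genotypes_sharded(mt, path):
    """Write one line per variant (row key, then tab-separated genotypes) as shards.

    Hypothetical helper: `mt` is assumed to be a MatrixTable with a GT entry field.
    """
    import hail as hl  # imported here so the path helper above works without Hail

    # One Table row per variant; `entries` holds that variant's entries in column order.
    ht = mt.localize_entries("entries", "cols")

    # Keep the row key and join the genotype strings with tabs.
    # Missing calls are written as "./." (an assumption of this sketch).
    ht = ht.select(
        genotypes=hl.delimit(
            ht.entries.map(lambda e: hl.or_else(hl.str(e.GT), "./.")),
            "\t",
        )
    )

    # A non-None `parallel` writes a directory of shards and skips the concatenation step.
    ht.export(with_bgz_suffix(path), parallel="header_per_shard")
```

Calling export_genotypes_sharded(data, "fileName.tsv") should produce a directory named fileName.tsv.bgz containing compressed part files. The sample IDs are not in the header in this sketch; data.s.collect() gives them in column order, assuming the column key is s.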

Thank you,

I can’t find a parallel option in the export method of a field.

Heh. You’re right. I’ll ask someone to fix this. Are you able to use VCF files (export_vcf) instead?
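
If VCF output works for you, something along these lines might do it. This is only a sketch: it assumes the MatrixTable still has locus and alleles as row fields (export_vcf needs the rows keyed by them, so the variant string key from the original snippet has to be replaced), and it drops everything except GT. The function name export_genotypes_vcf and the ALLOWED_PARALLEL check are made up for this example; the check only covers the parallel values I am sure export_vcf accepts.

```python
# Values of `parallel` this sketch accepts; note that True is not one of them.
ALLOWED_PARALLEL = (None, "separate_header", "header_per_shard")


def export_genotypes_vcf(mt, path, parallel="header_per_shard"):
    """Export only the genotypes of `mt` as a (possibly sharded) VCF.

    Hypothetical helper: `mt` is assumed to be a MatrixTable with `locus` and
    `alleles` row fields and a GT entry field.
    """
    if parallel not in ALLOWED_PARALLEL:
        raise ValueError(f"parallel must be one of {ALLOWED_PARALLEL}, got {parallel!r}")

    import hail as hl  # imported here so the argument check above works without Hail

    # export_vcf expects the rows to be keyed by locus and alleles.
    mt = mt.key_rows_by("locus", "alleles")

    # Keep only the keys and the GT entry field so nothing else is written.
    mt = mt.select_rows().select_cols().select_entries("GT")

    # With a non-None `parallel`, the output is a directory of VCF shards.
    hl.export_vcf(mt, path, parallel=parallel)
```

For example, export_genotypes_vcf(data, "genotypes.vcf.bgz") would write compressed shards, each with its own header. Re-keying the rows may cost some extra time if the data has to be reordered.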