I want to export the genotypes of an aggregated file using this code:
It is really slow… about 15 minutes on a very big file with thousands of samples. What can I do to improve the performance? I need the locus, the alleles and the genotypes in a file.
We’ll need the Hail log and the full script you ran to fully diagnose this.
Exporting to an uncompressed TSV is generally slow. You might try exporting as
filename.tsv.bgz. Also, exporting a single file requires a slow concatenation step; you might try
parallel=True instead if you can deal with many separate files of genotypes.
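For what it's worth, here is a rough sketch of both suggestions combined. It assumes Hail 0.2, a dataset stored as a MatrixTable with a `GT` entry field and a single string column key, and made-up paths; the helper names `with_bgz_suffix` and `export_genotypes` are hypothetical. As far as I know, `Table.export` in 0.2 takes a string for `parallel` (for example `'header_per_shard'`) rather than `True`, so check the exact value against your version.

```python
def with_bgz_suffix(path):
    """Return `path` with a .bgz extension so the export is block-gzipped."""
    return path if path.endswith(".bgz") else path + ".bgz"


def export_genotypes(mt_path, out_path):
    """Export locus, alleles and one GT column per sample as compressed shards."""
    # Imported here so the path helper above works without Hail installed.
    import hail as hl

    mt = hl.read_matrix_table(mt_path)
    # Keep only the row key (locus, alleles), the column key and the GT entry field.
    # Assumption: the entry field is called GT.
    mt = mt.select_rows().select_cols().select_entries("GT")
    # One row per variant, one "<sample>.GT" field per sample.
    ht = mt.make_table()
    # With `parallel` set, the output path becomes a directory of part files,
    # which skips the final concatenation into a single file.
    ht.export(with_bgz_suffix(out_path), parallel="header_per_shard")


# Example call with hypothetical paths:
# export_genotypes("dataset.mt", "genotypes.tsv")
```

The trade-off is that you get a directory of part files, each with its own header, instead of one TSV, so whatever reads the output has to loop over the parts.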
I can’t find a parallel option in the export of a field.
Heh. You’re right. I’ll ask someone to fix this. Are you able to use VCF files (