I have large genomic files (terabytes of VCFs) that I wish to write to a database in Hail format. When I write with hl.write,
the Hail objects take up around 1.4x as much disk space as the original VCF files. Is there a more efficient way to write these tables?
Hi @cliveno1,
That is definitely surprising. Can you share any more details? If you import one of the VCFs as mt, what is the output of mt.describe()?
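In case it helps, here is a rough sketch of what I mean, not a definitive recipe. It assumes the files are on a local filesystem (cloud storage would need a different size check) and that Hail is installed. The helper names `disk_usage_bytes` and `describe_vcf` are made up for illustration. The first one sums file sizes under a path, since a written MatrixTable is a directory rather than a single file. The second one imports a VCF and prints its schema.

```python
import os


def disk_usage_bytes(path):
    """Return the size of a file, or the total size of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def describe_vcf(vcf_path):
    """Import one VCF as a MatrixTable and print its schema (hypothetical helper)."""
    import hail as hl  # imported here so the size helper works without Hail installed

    mt = hl.import_vcf(vcf_path)  # default import options; adjust for your data
    mt.describe()                 # prints the global, column, row and entry fields
    return mt
```

Comparing `disk_usage_bytes` on the original VCF with the same call on the written MatrixTable directory gives the ratio, and the `describe()` output shows which fields are being stored.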