Export VCF by chromosome

Hi Hail team!

I have a question regarding VCF export. I'm trying to export a 300K-sample MatrixTable to VCF and was recently informed that the VCFs should be organized per chromosome. I was previously planning to read in the MT and export it to VCF using parallel='header_per_shard'.

Do you have a sense of how much slower or more expensive it would be to read in the MT, filter to a single chromosome, and export to VCF (still setting parallel to 'header_per_shard'), repeating this for all chromosomes?

Thanks!

Filtering to a chromosome should be totally fine. As long as the pipeline is just read / filter_rows / export, you should end up reading every row of the input MatrixTable only once (during the export for its chromosome).
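
For reference, here is a rough sketch of what that read / filter_rows / export loop could look like. The function names, paths, and the contig list (chr1 to chr22, chrX, chrY, i.e. GRCh38-style naming) are assumptions for illustration, so adjust them to your dataset:

```python
def default_contigs(prefix="chr"):
    """Contig names for chromosomes 1-22, X, and Y.

    Assumes GRCh38-style names; pass prefix="" for GRCh37-style names.
    """
    return [f"{prefix}{c}" for c in [*range(1, 23), "X", "Y"]]


def export_vcf_by_chromosome(mt_path, out_dir, contigs=None):
    """Export one sharded VCF per chromosome (hypothetical helper)."""
    # Imported here so default_contigs() can be used without Hail installed.
    import hail as hl

    mt = hl.read_matrix_table(mt_path)
    for contig in contigs or default_contigs():
        # read / filter_rows / export, repeated once per chromosome
        chrom_mt = mt.filter_rows(mt.locus.contig == contig)
        hl.export_vcf(
            chrom_mt,
            f"{out_dir}/{contig}.vcf.bgz",
            parallel="header_per_shard",
        )
```

With parallel='header_per_shard', each output path ends up as a directory of per-partition VCF shards, each with its own header, so you would get one such directory per chromosome.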

awesome, thank you!