Export VCF by chromosome

ch-kr · September 28, 2021, 6:35pm

Hi hail team!

I have a question regarding VCF export. I’m trying to export a 300K-sample MatrixTable to VCF and was recently informed that the VCFs should be organized per chromosome. I was previously planning to read in the MT and export to VCF using parallel='header_per_shard'.

Do you have a sense of how much slower or more expensive it would be to read in the MT, filter to a single chromosome, and export to VCF (still setting parallel to header_per_shard), repeating this for each chromosome?

Thanks!

tpoterba · September 28, 2021, 6:41pm

Filtering to a chromosome should be totally fine. As long as the pipeline is just read / filter_rows / export, you should end up reading each row of the input MatrixTable only once (in the pass for its chromosome).
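
For reference, here is a rough sketch of what that read / filter_rows / export loop might look like. The function names, the paths, and the choice of contigs (autosomes plus X and Y, with a "chr" prefix for GRCh38) are illustrative assumptions, not something taken from the dataset in question, so adjust them to match your MatrixTable.

```python
def chromosome_names(reference_genome="GRCh38"):
    """Hypothetical helper: contig names for chromosomes 1-22, X and Y.

    Assumes GRCh38-style names carry a "chr" prefix and GRCh37-style names do not.
    """
    prefix = "chr" if reference_genome == "GRCh38" else ""
    return [f"{prefix}{c}" for c in list(range(1, 23)) + ["X", "Y"]]


def export_vcf_per_chromosome(mt_path, out_dir, reference_genome="GRCh38"):
    """Hypothetical helper: write one sharded VCF output per chromosome."""
    # Imported here so chromosome_names() can be used without Hail installed.
    import hail as hl

    mt = hl.read_matrix_table(mt_path)
    for contig in chromosome_names(reference_genome):
        # Keep only the rows whose locus falls on this contig.
        chrom_mt = mt.filter_rows(mt.locus.contig == contig)
        # With parallel set, the output path is a directory of shards,
        # each shard carrying its own VCF header.
        hl.export_vcf(
            chrom_mt,
            f"{out_dir}/{contig}.vcf.bgz",
            parallel="header_per_shard",
        )
```

You would call it as something like export_vcf_per_chromosome("gs://my-bucket/data.mt", "gs://my-bucket/vcf"), where both paths are placeholders.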

ch-kr · September 28, 2021, 6:41pm

awesome, thank you!
