How do I use hl.import_vcf to import a VCF that has been partitioned into multiple files?

Hi @danking,

One additional question:
As of now, we are considering using the pVCF that was partitioned into chromosome blocks for our Sample QC step. We are wondering whether the hl.import_vcf function can import multiple chromosome-block pVCFs, or whether we should merge the blocks into one pVCF per chromosome before importing. Thank you again for your advice.

Hey @mgarcia, I split this question into a separate topic so that other users can find it more easily.

Yes, hl.import_vcf’s path parameter accepts a file path, a blob storage URL, or a list of either. You may specify the files individually:

mt = hl.import_vcf(['.../chr1.vcf.bgz', '.../chr2.vcf.bgz', ...])

Or you may specify the files using a glob expression:

mt = hl.import_vcf('.../chr*.vcf.bgz')

Please also take careful note of the force_bgz option: if your data is block-gzipped but has the normal .gz extension, pass force_bgz=True so that Hail reads the files as block-gzipped.
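As a rough sketch of the force_bgz case: the helper names below (looks_like_bgzf, import_gz_vcfs) are made up for illustration and are not part of Hail. The check reads only the first block header of a local file and assumes the BGZF "BC" extra subfield comes first, which is how bgzip writes it; files in blob storage would need a different way of reading the first bytes.

```python
import struct


def looks_like_bgzf(path):
    """Return True if a local file starts with a BGZF (block gzip) header.

    Hypothetical helper: it inspects only the first 18 bytes and assumes the
    'BC' extra subfield is the first one in the gzip extra field.
    """
    with open(path, "rb") as f:
        header = f.read(18)
    if len(header) < 18:
        return False
    # gzip magic bytes, deflate method, FEXTRA flag set
    if header[:4] != b"\x1f\x8b\x08\x04":
        return False
    (xlen,) = struct.unpack("<H", header[10:12])
    # BGZF subfield: SI1='B', SI2='C', SLEN=2
    return xlen >= 6 and header[12:14] == b"BC" and header[14:16] == b"\x02\x00"


def import_gz_vcfs(paths):
    """Import .vcf.gz files that are known to be block-gzipped."""
    import hail as hl  # imported lazily so the check above works without Hail

    # force_bgz=True tells Hail to treat the .gz files as block-gzipped
    return hl.import_vcf(paths, force_bgz=True)
```

If the check returns False for a file, it is probably ordinary gzip, and force_bgz=True would not be appropriate for it.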

Hi @danking,

Thank you for your response. Just to clarify, the hl.import_vcf function can import split chromosome blocks (e.g. a list of 50 pVCF blocks that together represent chromosome 1)? The UK Biobank pVCFs have each chromosome split into several blocks. It would save us a significant amount of time if we could import all of the blocks for each chromosome into a single MT. Thank you!

Yes, this is exactly what import_vcf is designed for.
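For the one-MT-per-chromosome case, here is a minimal sketch of grouping block files before handing each group to import_vcf. The file naming pattern (_c<chrom>_b<block>_) and both helper names are assumptions for illustration, so adjust the regular expression to match your actual file names. force_bgz=True is likewise an assumption that the blocks are block-gzipped with a .gz extension.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: <prefix>_c<chrom>_b<block>_<suffix>.vcf.gz
BLOCK_RE = re.compile(r"_c(?P<chrom>[0-9]+|X|Y)_b(?P<block>[0-9]+)_")


def group_blocks_by_chromosome(paths):
    """Map chromosome name -> list of block paths ordered by block number."""
    groups = defaultdict(list)
    for path in paths:
        match = BLOCK_RE.search(path)
        if match is None:
            raise ValueError(f"unrecognized block file name: {path}")
        groups[match.group("chrom")].append((int(match.group("block")), path))
    # Sort numerically by block index so that b2 comes before b10
    return {chrom: [p for _, p in sorted(blocks)] for chrom, blocks in groups.items()}


def import_chromosome(block_paths):
    """Import all blocks of one chromosome into a single MatrixTable."""
    import hail as hl  # imported lazily so the grouping works without Hail

    return hl.import_vcf(block_paths, force_bgz=True)
```

Each list returned by group_blocks_by_chromosome can then be passed to import_chromosome to get one MT for that chromosome.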