How do I use hl.import_vcf to import a VCF that has been partitioned into multiple files?

mgarcia · July 11, 2023, 9:47pm

One additional question:
We are considering using the pVCFs that were partitioned into chromosome blocks for our Sample QC step. Can the hl.import_vcf function import multiple chromosome-block pVCFs, or should we first merge the blocks into one pVCF per chromosome before importing? Thank you again for your advice.

danking · July 12, 2023, 12:03pm

Hey @mgarcia, I split this question into a separate topic so that other users can find it more easily.

Yes, hl.import_vcf’s path parameter accepts a file path, a blob storage URL, or a list of either. You may specify the files individually:

mt = hl.import_vcf(['.../chr1.vcf.bgz', '.../chr2.vcf.bgz', ...])

Or you may specify the files using a glob expression:

mt = hl.import_vcf('.../chr*.vcf.bgz')

Please also take careful note of the force_bgz option: if your data is block-gzipped but carries the normal .gz extension, pass force_bgz=True so that Hail reads the files as block-gzipped.
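If you are unsure whether a .gz file is really block-gzipped, a rough check on a local copy is to look at its first bytes. The sketch below is only a heuristic and is not part of Hail: the helper names are made up for illustration, and it assumes the BGZF "BC" extra subfield comes first in the header, as it does in files written by bgzip.

```python
# gzip magic bytes, deflate method, and the FEXTRA flag that BGZF always sets
BGZF_MAGIC = b"\x1f\x8b\x08\x04"


def looks_like_bgzf(header: bytes) -> bool:
    """Heuristic: True if these leading bytes look like a BGZF block header.

    Assumes the 'BC' extra subfield is the first subfield (bytes 12-13).
    """
    return (
        len(header) >= 14
        and header[:4] == BGZF_MAGIC
        and header[12:14] == b"BC"
    )


def local_file_is_bgzf(path: str) -> bool:
    """Hypothetical helper: apply the header check to a file on local disk."""
    with open(path, "rb") as f:
        return looks_like_bgzf(f.read(16))
```

If the check passes for files named like '.../chr*.vcf.gz', then hl.import_vcf('.../chr*.vcf.gz', force_bgz=True) is the call to try.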

mgarcia · July 12, 2023, 12:27pm

Hi @danking,

Thank you for your response. Just to clarify, can the hl.import_vcf function import partitioned chromosome blocks (e.g. a list of 50 pVCF blocks that together represent chromosome 1)? The UK Biobank pVCFs have each chromosome split into several blocks. It would save us a significant amount of time if we could import all of the blocks for each chromosome into a single MT. Thank you!

danking · July 12, 2023, 12:28pm

Yes, this is exactly what import_vcf is designed for.
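For the one-MT-per-chromosome case, a minimal sketch might look like the following. The file naming pattern, the block count, the GRCh38 reference genome, and both helper names are assumptions made for illustration, so substitute whatever your data actually uses. It also assumes every block contains the same samples in the same order.

```python
def block_paths(prefix: str, chrom: str, n_blocks: int) -> list:
    """Build paths for blocks 0..n_blocks-1 of one chromosome.

    The '<prefix>_c<chrom>_b<block>_v1.vcf.gz' pattern is a hypothetical
    naming scheme; adjust it to match your files.
    """
    return [f"{prefix}_c{chrom}_b{b}_v1.vcf.gz" for b in range(n_blocks)]


def import_chromosome(prefix, chrom, n_blocks, out_path,
                      reference_genome="GRCh38"):
    """Import all blocks of one chromosome and write them as a single MT."""
    import hail as hl  # imported here so block_paths is usable without Hail

    mt = hl.import_vcf(
        block_paths(prefix, chrom, n_blocks),
        force_bgz=True,  # assumes the .gz files are actually block-gzipped
        reference_genome=reference_genome,  # assumption: GRCh38 data
    )
    mt.write(out_path, overwrite=True)
    # Read the written table back so later steps use the native format.
    return hl.read_matrix_table(out_path)
```

Writing the MatrixTable once and reading it back means Sample QC does not have to re-parse the VCF text on every pass.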
