Hi @danking,
One additional question:
We are currently considering using the pVCFs that were partitioned into chromosome blocks for our sample QC step. Can the hl.import_vcf function import multiple chromosome-block pVCFs, or should we first merge the individual blocks into one pVCF per chromosome before importing? Thank you again for your advice.
Hey @mgarcia, I split this question into a separate topic so that other users can find it more easily.
Yes, the path parameter of hl.import_vcf accepts a file path, a blob storage URL, or a list of either. You may specify the files individually:
mt = hl.import_vcf(['.../chr1.vcf.bgz', '.../chr2.vcf.bgz', ...])
Or you may specify the files using a glob expression:
mt = hl.import_vcf('.../chr*.vcf.bgz')
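When the per-chromosome files follow a predictable naming scheme, the list form can also be built in code rather than typed out. A minimal sketch, assuming files named chr1.vcf.bgz through chr22.vcf.bgz in a single directory (the chromosome_paths helper, the AUTOSOMES list and the vcf_dir argument are hypothetical, not part of Hail):

```python
from typing import List, Sequence

# Hypothetical chromosome list; append "X", "Y", etc. if those files exist.
AUTOSOMES = [str(i) for i in range(1, 23)]


def chromosome_paths(vcf_dir: str, chroms: Sequence[str] = AUTOSOMES) -> List[str]:
    """Build one path per chromosome, assuming a chr{N}.vcf.bgz naming scheme.

    This helper is hypothetical; it only builds strings and does not touch storage.
    """
    base = vcf_dir.rstrip("/")  # avoid a double slash if vcf_dir ends in "/"
    return [f"{base}/chr{c}.vcf.bgz" for c in chroms]
```

The resulting list can then be passed as the path argument, e.g. hl.import_vcf(chromosome_paths(vcf_dir)). An explicit list makes it easy to include or exclude particular chromosomes, whereas a glob picks up every file that matches it.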
Please also take careful note of the force_bgz option if you have block-gzipped data with the normal .gz extension.
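If it is unclear whether a set of .gz files is really block-gzipped, the header of the first block can be inspected. BGZF is ordinary gzip whose header sets the FEXTRA flag and carries a "BC" extra subfield holding the block size; files written by plain gzip normally lack that subfield. A minimal sketch for local files (is_bgzf_header and is_bgzf are hypothetical helpers, not part of Hail):

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"
FEXTRA = 0x04  # gzip FLG bit: an "extra" field follows the fixed 10-byte header


def is_bgzf_header(data: bytes) -> bool:
    """Return True if data starts with a gzip header carrying a BGZF 'BC' subfield."""
    # Fixed header: magic (2), CM (1, must be 8 = deflate), FLG (1), MTIME (4),
    # XFL (1), OS (1), then XLEN (2, little-endian) when FEXTRA is set.
    if len(data) < 12 or data[:2] != GZIP_MAGIC or data[2] != 8:
        return False
    if not data[3] & FEXTRA:
        return False
    (xlen,) = struct.unpack("<H", data[10:12])
    extra = data[12:12 + xlen]
    if len(extra) < xlen:
        return False  # truncated header
    # Walk the extra-field subfields: SI1, SI2, SLEN (little-endian), payload.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen
    return False


def is_bgzf(path: str) -> bool:
    """Read just enough of a local file to check its first block header."""
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12:
            return False
        (xlen,) = struct.unpack("<H", head[10:12])
        return is_bgzf_header(head + f.read(xlen))
```

If is_bgzf returns True for files ending in .gz, that is the situation the force_bgz option is meant for; files written by plain gzip should not be treated as block-gzipped.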
Hi @danking,
Thank you for your response. Just to clarify: can the hl.import_vcf function import partial chromosome blocks (e.g. a list of 50 pVCF blocks that together make up chromosome 1)? The UK Biobank pVCFs have each chromosome split into several blocks, so it would save us a significant amount of time if we could import all of the blocks for each chromosome into a single MT. Thank you!
Yes, this is exactly what import_vcf is designed for.
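To import every block of a chromosome into one MatrixTable, the block files for that chromosome can be passed together as a single list (or matched by a single glob). A minimal sketch that groups block files by chromosome and orders the blocks numerically, assuming a hypothetical naming scheme such as ukb_c1_b0.vcf.gz; BLOCK_RE, group_blocks and import_chromosome are illustrative names, and the pattern should be adjusted to the real file names:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical block naming scheme: <prefix>_c<chrom>_b<block>.vcf[.gz|.bgz]
BLOCK_RE = re.compile(r"_c(?P<chrom>[0-9]+|X|Y)_b(?P<block>[0-9]+)\.vcf(?:\.b?gz)?$")


def group_blocks(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group block files by chromosome, ordering blocks by numeric block index."""
    groups = defaultdict(list)
    for path in paths:
        m = BLOCK_RE.search(path)
        if m is None:
            raise ValueError(f"unrecognized block file name: {path}")
        groups[m.group("chrom")].append((int(m.group("block")), path))
    # Sort on the integer index so b10 comes after b9, not after b1.
    return {chrom: [p for _, p in sorted(blocks)] for chrom, blocks in groups.items()}


def import_chromosome(block_paths: List[str], force_bgz: bool = False):
    """Import every block of one chromosome into a single MatrixTable."""
    import hail as hl  # imported lazily so the grouping code runs without Hail

    return hl.import_vcf(block_paths, force_bgz=force_bgz)
```

For example, import_chromosome(group_blocks(paths)["1"], force_bgz=True) would import all chromosome 1 blocks into one MT, with force_bgz=True appropriate only if the blocks are block-gzipped but end in .gz. The numeric sort keeps the file list readable and deterministic.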