I’m trying to import the 1000 Genomes Project data using the files found here, but when I include the Y chromosome in my list of paths I get the following error.
Hail version: 0.2.81-edeb70bc789c
Error summary: HailException: invalid sample IDs: expected same number of samples for all inputs.
file:/Users/tony/Projects/nat-variation/data/interim/block_zipped_IGSR/ALL.chr7.phase3_shapeit2_mvncall_integrated_v5_extra_anno.20130502.genotypes.vcf.bgz has 2504 ids and
file:/Users/tony/Projects/nat-variation/data/interim/block_zipped_IGSR/ALL.chrY.phase3_integrated_v2b.20130502.genotypes.vcf.bgz has 1233 ids.
What is the best way to handle this discrepancy?
Is there a way to include the Y chromosome using Hail as it is, or would it be better to fill in empty (missing) entries in the .vcf.bgz file for the missing IDs?
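
To make the mismatch concrete, here is a minimal sketch in plain Python (no Hail calls) that reads the `#CHROM` header line of each file and groups the paths by their sample list. The function names `sample_ids` and `group_paths_by_samples` are hypothetical, and the idea of importing each group separately is only an assumption about one possible workaround, not a recommended Hail approach.

```python
import gzip
from collections import defaultdict


def sample_ids(path):
    """Return the sample IDs from the #CHROM header line of a (b)gzipped VCF."""
    # Block-gzipped files are valid multi-member gzip, so gzip.open can read them.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed VCF fields; sample IDs follow.
                return tuple(line.rstrip("\n").split("\t")[9:])
            if not line.startswith("#"):
                break  # reached the records without seeing a header line
    raise ValueError(f"no #CHROM header line found in {path}")


def group_paths_by_samples(paths):
    """Group VCF paths so that every file in a group has the same sample list."""
    groups = defaultdict(list)
    for path in paths:
        groups[sample_ids(path)].append(path)
    return list(groups.values())
```

With the files above, I would expect the autosomal files to land in one group (2504 IDs) and the chrY file in a group of its own (1233 IDs), so each group could be passed to its own import call.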