Import multiple .vcf.bgz by chromosome fails on chr Y

Teculos · April 18, 2022, 9:53pm

I’m trying to import the 1000 Genomes Project data using the files found here, but when I include the Y chromosome in my paths list I get the following error.

Hail version: 0.2.81-edeb70bc789c
Error summary: HailException: invalid sample IDs: expected same number of samples for all inputs.
 file:/Users/tony/Projects/nat-variation/data/interim/block_zipped_IGSR/ALL.chr7.phase3_shapeit2_mvncall_integrated_v5_extra_anno.20130502.genotypes.vcf.bgz has 2504 ids and
 file:/Users/tony/Projects/nat-variation/data/interim/block_zipped_IGSR/ALL.chrY.phase3_integrated_v2b.20130502.genotypes.vcf.bgz has 1233 ids.

What is the best way to handle this discrepancy?

Is there a way to include the Y chromosome using Hail as it is, or would it be best to add empty (missing) entries to the .vcf.bgz file for the missing IDs?

tpoterba · April 19, 2022, 1:10pm

This is an uncommon representation. Typically, the VCFs we’ve seen include individuals of all sex karyotypes at loci on the Y chromosome, and those without a Y chromosome have uncalled genotypes on that chromosome.

This appears to be an imputed genotype data release from 1KG, which we’re not familiar with (we’re familiar with the whole-genome sequenced VCFs).
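
Since the error comes from the inputs having different sample lists, one possible workaround is to check the sample IDs in each file's header and import files with different sample lists separately. Below is a minimal sketch of that check in plain Python, not an official Hail recipe. The helper names `vcf_sample_ids` and `group_paths_by_samples` are made up for illustration, and it assumes each file is a standard VCF whose `#CHROM` header line lists the samples after the nine fixed columns.

```python
import gzip
from collections import defaultdict


def vcf_sample_ids(path):
    """Return the sample IDs from the #CHROM header line of a (b)gzipped VCF.

    Hypothetical helper for illustration; not part of Hail.
    """
    # bgzip output is a series of gzip members, so the gzip module can read it.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed VCF fields; the rest are samples.
                return tuple(line.rstrip("\n").split("\t")[9:])
            if not line.startswith("#"):
                # Reached the data lines without seeing a header line.
                break
    raise ValueError(f"no #CHROM header line found in {path}")


def group_paths_by_samples(paths):
    """Group VCF paths so that every path in a group has identical sample IDs."""
    groups = defaultdict(list)
    for path in paths:
        groups[vcf_sample_ids(path)].append(path)
    # Groups come back in order of first appearance in `paths`.
    return list(groups.values())
```

With the files above, this should put the autosomal files in one group and the chrY file in another. Each group could then be passed to its own `hl.import_vcf` call, which gives one MatrixTable per sample set rather than a single combined one.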

Teculos · April 19, 2022, 6:22pm

Aaaah, I see. Is there a way to impute uncalled genotypes on the Y chromosome from Hail?
