Hi Hail Community,
I am new to the Hail Python library. I am trying to use the filter_rows method to generate a filtered .gvcf file using a list of CHROM/POS positions from a BED file. I am getting errors related to the “chr” prefix that normally precedes the contig name in the #CHROM column of the gvcf file.
My desired output is a .gvcf file without the rows that match the BED file (a list of CHROM and POS entries).
I am using the following code:
import hail as hl
hl.init()
vds = hl.import_vcf('WGS-800-20-hard-filtered.vcf.bgz', reference_genome='GRCh38')
bed = hl.import_bed('with-chr-prefix.bed', reference_genome='GRCh38')
filtered_variants = vds.filter_rows(hl.is_defined(bed[vds.locus]))
hl.export_vcf(filtered_variants, 'output/example.vcf.bgz')
I have tried amending my BED file both with and without the “chr” prefix.
- Sample rows from the BED file with the “chr”:
chr1 201060815 201060816
chr1 201091993 201091994
In this case, I get the following error:
HailException: Invalid interval ‘[chr22:94761901-chr22:94761902)’ found. Start ‘chr22:94761901’ is not within the range [1-50818468] for reference genome ‘GRCh38’
- Sample rows from the BED file without the “chr”:
1 201060815 201060816
1 201091993 201091994
In this case, I get the following error:
Invalid interval ‘[1:201060816-1:201060817)’ found. Contig ‘1’ is not in the reference genome ‘GRCh38’.
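Both messages point at BED rows that do not fit the GRCh38 contig table: the first reports a chr22 start beyond the range [1-50818468], and the second a contig name without the prefix. A minimal, Hail-free sketch of a pre-check for such rows follows. The function name `find_bad_bed_rows` and the two-entry length table are illustrative assumptions, not part of Hail. The chr22 length is taken from the error message above, and the chr22 row is reconstructed from that message on the assumption that the BED start is shifted by one on import (as the second error suggests).

```python
# Illustrative subset of GRCh38 contig lengths (assumption: extend to every contig in use).
GRCH38_LENGTHS = {"chr1": 248956422, "chr22": 50818468}


def find_bad_bed_rows(lines, contig_lengths):
    """Return (line_number, contig, reason) for BED rows that do not fit the reference."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank or short lines, comments, and BED header lines.
        if len(fields) < 3 or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if contig not in contig_lengths:
            problems.append((number, contig, "unknown contig"))
        elif not 0 <= start < end <= contig_lengths[contig]:
            # BED is 0-based, half-open: valid rows satisfy 0 <= start < end <= length.
            problems.append((number, contig, "out of range"))
    return problems


rows = [
    "chr1 201060815 201060816",  # fits within chr1
    "chr22 94761900 94761901",   # reconstructed from the first error message
    "1 201060815 201060816",     # no "chr" prefix
]
problems = find_bad_bed_rows(rows, GRCH38_LENGTHS)
for problem in problems:
    print(problem)
```

Running a check like this over the whole BED file before importing it would show whether the failures come from contig naming, from positions that belong to a different genome build, or both.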
What am I doing wrong here?
Here is also a row of data from the .gvcf file produced by an Illumina instrument:
chr1 896798 rs13302934 A G 139.97 PASS AC=2;AF=1.000;AN=2;DP=32;FS=0.000;MQ=101.33;QD=4.37;SOR=0.693;FractionInformativeReads=1.000;DB GT:AD:AF:DP:F1R2:F2R1:GQ:PL:GP:PRI:SB:MB 1/1:0,32:1.000:32:0,17:0,15:93:178,96,0:1.3997e+02,9.3315e+01,2.0242e-09:0.00,34.77,37.77:0,0,16,16:0,0,13,19
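Since the row above uses `chr1` in its first column, the BED file needs the same contig naming. A small sketch for comparing the contig names used in the two files is shown here. The helper name `contigs` is hypothetical, and the sample lists stand in for lines read from the real files.

```python
def contigs(lines):
    """Collect the first whitespace-separated field of every data line."""
    names = set()
    for line in lines:
        # Ignore blank lines, VCF/BED comments, and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


# First columns of the gVCF row shown above (remaining fields omitted here).
vcf_rows = ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t896798\trs13302934\tA\tG"]
bed_rows = ["1 201060815 201060816", "chr1 201091993 201091994"]

# Contig names that appear in the BED rows but not in the VCF rows.
missing = sorted(contigs(bed_rows) - contigs(vcf_rows))
print(missing)
```

Any name listed in `missing` would never match a VCF locus, regardless of the positions in that row.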