Does Hail support gVCFs from GATK?


#1

Hi,
We would like to import a gVCF file generated by GATK into a matrix table, but it does not seem to be compatible. The matrix table has no “GT” genotype information when we import the gVCF file.
Does anyone know why?


#2

Hail doesn’t support GVCFs right now, no. You’ll have to use GATK to do joint genotyping (many gvcfs => one multi-sample VCF) first in order to use Hail.

We have a new feature coming in 3-6 months that will allow Hail to do this step of importing gVCFs.


#3

Thank you.
But we are still planning to import some loci that show the “ref” genotype in all samples from the VCF file. My alternative method is to change the “ALT” column in the VCF file to list all the possible alleles (e.g. REF C with ALT A,T,G), only for SNP-type variants.
Could you give me some suggestions?


#4

Sorry, I don’t totally understand what you’re trying to do. Can you give an example of a line of input and what you want to get out of it?


#5

Sorry. My gVCF looks like this:

##contig=<ID=chrY,length=59373566>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAM1
chrY 2655001 . G <NON_REF> . . END=2655001 GT:DP:AD_REF:AD_ALT:TP 0/0:121:83:0:negtive
chrY 2655002 . T <NON_REF> . . END=2655002 GT:DP:AD_REF:AD_ALT:TP 0/0:123:85:0:negtive
chrY 2655003 . C <NON_REF> . . END=2655003 GT:DP:AD_REF:AD_ALT:TP 0/0:124:85:1:negtive
chrY 2655004 . T <NON_REF> . . END=2655004 GT:DP:AD_REF:AD_ALT:TP 0/0:125:86:0:negtive
chrY 2655005 . T <NON_REF> . . END=2655005 GT:DP:AD_REF:AD_ALT:TP 0/0:125:86:0:negtive
chrY 2655006 . T <NON_REF> . . END=2655006 GT:DP:AD_REF:AD_ALT:TP 0/0:125:86:0:negtive
chrY 2655007 . G <NON_REF> . . END=2655007 GT:DP:AD_REF:AD_ALT:TP 0/0:125:86:0:negtive
chrY 2655008 . T <NON_REF> . . END=2655008 GT:DP:AD_REF:AD_ALT:TP 0/0:125:86:0:negtive
chrY 2655009 . A <NON_REF> . . END=2655009 GT:DP:AD_REF:AD_ALT:TP 0/0:125:85:0:negtive
chrY 2655010 . G <NON_REF> . . END=2655010 GT:DP:AD_REF:AD_ALT:TP 0/0:127:87:0:negtive
chrY 2655011 . C <NON_REF> . . END=2655011 GT:DP:AD_REF:AD_ALT:TP 0/0:128:88:0:negtive

What I still want to do is import the VCF into Hail. My alternative method is to change the “ALT” column in the VCF file, like this:

##contig=<ID=chrY,length=59373566>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 111-1362-1716
chrY 2655001 . G A,T,C . . END=2655001 GT:DP:AD_REF:AD_ALT:TP 0/0:121:83:0:negtive
chrY 2655002 . T G,C,A . . END=2655002 GT:DP:AD_REF:AD_ALT:TP 0/0:123:85:0:negtive

I think that kind of VCF could be suitable for a Hail matrix table. I am not sure whether there is a more appropriate way.
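
For illustration, the ALT rewrite described above could be sketched in plain Python roughly as follows. This is a minimal sketch under stated assumptions, not a tested workflow: the function names `expand_non_ref` and `rewrite_gvcf_lines` are hypothetical, the records are assumed to be tab-delimited, and only records with a single-base REF and an ALT of exactly `<NON_REF>` are changed. The order of the alternate alleles is arbitrary (alphabetical here). The sketch only edits text. It does not make the file equivalent to a joint-called VCF, and whether Hail handles the result usefully is not confirmed here.

```python
BASES = ("A", "C", "G", "T")


def expand_non_ref(line: str) -> str:
    """Replace a lone <NON_REF> ALT with the three non-reference bases.

    Header lines, records with a multi-base REF, and records with any
    other ALT value are returned unchanged.
    """
    if line.startswith("#"):
        return line  # meta-information and column header lines
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return line  # not a full VCF record; leave it alone
    ref, alt = fields[3].upper(), fields[4]
    if alt != "<NON_REF>" or ref not in BASES:
        return line  # only single-base reference blocks are rewritten
    fields[4] = ",".join(base for base in BASES if base != ref)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline


def rewrite_gvcf_lines(lines):
    """Apply expand_non_ref to every line of an iterable of VCF lines."""
    return [expand_non_ref(line) for line in lines]


if __name__ == "__main__":
    record = (
        "chrY\t2655001\t.\tG\t<NON_REF>\t.\t.\tEND=2655001\t"
        "GT:DP:AD_REF:AD_ALT:TP\t0/0:121:83:0:negtive\n"
    )
    # The ALT column of the printed record becomes A,C,T.
    print(expand_non_ref(record), end="")
```

Records spanning more than one base (END greater than POS) are left untouched, since a single REF base cannot describe the whole block.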


#6

Hail doesn’t work especially well to process a bunch of single-sample gVCFs. Is this what you’re intending?

I’d really recommend doing joint-calling with GATK or similar tools first.


#7

Reinforcing Tim’s question, how do you intend to use Hail on this dataset? Hail is designed to work with large datasets containing many joint-called genotypes. There might be other tools better suited to your needs of processing gVCFs.