Hi all.
I’m using DRAGEN to analyze WGS data of about 2,300 probands and their families.
For scalability, we ran the joint analysis with the Iterative gVCF Genotyper (IGG) pipeline, which produced an msVCF as the joint output.
I tried to reuse Hail scripts written for other pVCFs (for example, GATK output), but the msVCF format differs from what Hail can currently handle, including VDS.
msVCF looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3 sample4
chr22 10514964 . A T 21.67 PASS AC=7;AN=4148;NS=2317;NS_GT=2074;NS_NOGT=63;NS_NODATA=180;IC=0.28;HWE=0.0051;ExcHet=1;HWEc2=0 GT:GQ:AD:FT:LPL:LAA 0/0:3:1:LowDepth:0:. 0/0:24:24:PASS:0:. 0/0:23:16:PASS:0:. 0/0:23:23:PASS:0:.
chr22 10514994 . G A 35.86 PASS AC=1065;AN=3820;NS=2317;NS_GT=1910;NS_NOGT=236;NS_NODATA=171;IC=0.33;HWE=5.6e-45;ExcHet=1;HWEc2=0 GT:GQ:AD:FT:LPL:LAA ./.:0:0:LowDepth;LowGQ:0:. 0/0:0:19:LowGQ:0:. 0/1:3:8,3:PASS:35,0,15:1 0/0:23:23:PASS:0:.
chr22 10515037 . AAAT A 6.98 PASS AC=8;AN=3200;NS=2317;NS_GT=1600;NS_NOGT=569;NS_NODATA=148;IC=0.25;HWE=0.0087;ExcHet=1;HWEc2=0 GT:GQ:AD:FT:LPL:LAA 0/0:3:1:LowDepth:0:. 0/0:0:18:LowGQ:0:. 0/0:7:16:PASS:0:. ./.:0:4:LowGQ:0:.
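To make the difference concrete, here is a minimal plain-Python sketch (no Hail needed) that pairs the FORMAT keys with one sample column from the records above. The helper name `parse_sample` is hypothetical, and reading a lone AD value as "reference depth only" is my assumption from the example lines, not something taken from the DRAGEN documentation.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's colon-separated values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


fmt = "GT:GQ:AD:FT:LPL:LAA"
hom_ref = parse_sample(fmt, "0/0:24:24:PASS:0:.")
het = parse_sample(fmt, "0/1:3:8,3:PASS:35,0,15:1")

# Hom-ref calls carry a single AD value, het calls one value per allele.
print(hom_ref["AD"].split(","))  # ['24']
print(het["AD"].split(","))      # ['8', '3']
```

A pVCF from GATK would normally carry one AD value per allele on every call, which is why scripts written for those files trip over the single-value AD here.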
For now, I am using the custom code below to work with it in Hail:
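import hail as hl
mt = hl.read_matrix_table(f'{i_dir}/ica_joint_msvcf_merged.mt')
mt = mt.annotate_entries(
    DP=hl.sum(mt.AD),  # total depth from the AD values as written in the msVCF
    # Pad a single-value AD to [ref, 0]; missing_false=True lets a missing GT reach the later branches.
    AD=hl.case(missing_false=True)
    .when(mt.GT.is_hom_ref(), hl.array([mt.AD[0], 0]))
    .when(hl.is_missing(mt.GT) & hl.is_missing(mt.AD), hl.missing(hl.tarray(hl.tint32)))
    .when(hl.is_missing(mt.GT) & ~hl.is_missing(mt.AD), hl.array([mt.AD[0], 0]))
    .default(mt.AD))  # all other calls keep their AD unchanged
mt.write(i_dir + 'Inputs/' + project + '_beforeQC_' + date + '.mt', overwrite=True)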
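For anyone who wants to check the branch logic without a Hail session, the same AD padding can be sketched in plain Python. `normalize_ad` is a hypothetical helper, `None` stands in for a Hail missing value, and padding with 0 assumes biallelic rows where a lone AD value is the reference depth.

```python
def normalize_ad(gt, ad):
    """Return (AD, DP), with AD padded to [ref, alt] where possible.

    gt: genotype as a tuple of allele indices, or None for a no-call ("./.").
    ad: list of allele depths, or None if missing.
    """
    if ad is None:
        return None, None
    dp = sum(ad)
    if gt is None or all(allele == 0 for allele in gt):
        # No-call or hom-ref: keep the reference depth, pad the alt depth with 0.
        return [ad[0], 0], dp
    # Any other call keeps its AD as written.
    return ad, dp


print(normalize_ad((0, 0), [24]))    # ([24, 0], 24)
print(normalize_ad((0, 1), [8, 3]))  # ([8, 3], 11)
print(normalize_ad(None, [4]))       # ([4, 0], 4)
```

This is only a stand-in for the per-entry logic; it does not reproduce how Hail propagates missing values inside arrays.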
So I hope Hail can support DRAGEN msVCF directly.
Could you please consider this request?
I’m attaching a related link for reference.
I would be grateful for a positive response.
Thank you.
Lee.