The HAIL “join” function provides an inner join for two variant datasets. I would like to request a new feature that supports an outer join, so that all variant sites in both datasets are reported in the output.
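To make the request concrete, here is a minimal pure-Python sketch of the difference between an inner and an outer join. It is not Hail's API. It assumes sites are keyed by contig, position and the full allele list, which mirrors how `hl.import_vcf` keys rows (`locus`, `alleles`). The function `join_sites` and the dictionary layout are hypothetical. The sites come from the 21:14392485 and 21:14396727 examples below.

```python
from typing import Dict, Optional, Tuple

# Hypothetical site key: (contig, position, alleles), where alleles = (REF, ALT1, ALT2, ...).
Key = Tuple[str, int, Tuple[str, ...]]
Joined = Dict[Key, Tuple[Optional[str], Optional[str]]]


def join_sites(left: Dict[Key, str], right: Dict[Key, str], how: str = "inner") -> Joined:
    """Join two site tables on the full key.

    'inner' keeps only keys present in both inputs; 'outer' keeps every key
    and fills the side where the key is missing with None.
    """
    if how == "inner":
        keys = left.keys() & right.keys()
    elif how == "outer":
        keys = left.keys() | right.keys()
    else:
        raise ValueError(f"unsupported join type: {how!r}")
    return {k: (left.get(k), right.get(k)) for k in sorted(keys)}


# Sites from the 21:14392485 and 21:14396727 examples; values are the ID column.
file_1 = {("21", 14392485, ("A", "AC", "C")): ".",
          ("21", 14396727, ("T", "C")): "."}
file_2 = {("21", 14392485, ("A", "AC", "C")): ".",
          ("21", 14396727, ("TG", "T")): "rs373212424"}

inner = join_sites(file_1, file_2, how="inner")
outer = join_sites(file_1, file_2, how="outer")
print(len(inner), len(outer))  # 1 3
```

With this keying only 21:14392485, whose alleles are identical in both files, survives the inner join. That matches the HAIL output for region 21:14392474-14392494 below. The outer join also keeps both 21:14396727 records, with `None` marking the file that lacks each one.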
In addition, there are situations where merging variant records is difficult, especially at multiallelic sites. Based on my observations, HAIL ignores these situations and does not include such variant sites in the output. I would like to request a feature that correctly joins variants in these situations. Below are several examples showing different regions of FILE_1 and FILE_2, as well as the HAIL join output file. More examples are available at the link below.
https://drive.google.com/file/d/1jqZAnC4yWWzvOPwyjbr0Z7usfRotlwX3/view?usp=sharing
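One possible approach for multiallelic sites, offered as an assumption and not as a description of what HAIL does, is to split each record into biallelic REF/ALT pairs before joining. Bases shared by REF and ALT are also trimmed, so that alleles present in both files match even when the full allele lists differ. `trim`, `split_sites` and `outer_join_biallelic` are hypothetical helpers. Left-alignment is omitted because it requires the reference sequence. Hail itself provides `hl.split_multi_hts` and `hl.min_rep`, which cover related steps. The records come from the 21:14385606 and 21:14387043 examples below.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (contig, pos, ref, [alt, ...]), as in the VCF lines.
Record = Tuple[str, int, str, List[str]]
# Biallelic key after splitting: (contig, pos, ref, alt).
Key = Tuple[str, int, str, str]


def trim(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Remove bases shared by REF and ALT (suffix first, then prefix),
    always keeping at least one base in each. No left-alignment is done."""
    if alt == "*":  # spanning-deletion placeholder: leave as-is
        return pos, ref, alt
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1  # dropping a leading base shifts the start position
    return pos, ref, alt


def split_sites(records: Iterable[Record]) -> Set[Key]:
    """Split every multiallelic record into trimmed biallelic keys."""
    keys: Set[Key] = set()
    for contig, pos, ref, alts in records:
        for alt in alts:
            p, r, a = trim(pos, ref, alt)
            keys.add((contig, p, r, a))
    return keys


def outer_join_biallelic(left: Iterable[Record],
                         right: Iterable[Record]) -> Dict[Key, Tuple[bool, bool]]:
    """Outer join on biallelic keys; value = (present in left, present in right)."""
    lk, rk = split_sites(left), split_sites(right)
    return {k: (k in lk, k in rk) for k in sorted(lk | rk)}


# Records from the 21:14385606 and 21:14387043 examples.
file_1 = [("21", 14385606, "C", ["CTT", "CTTT", "A"]),
          ("21", 14387043, "GA", ["G", "GAA"])]
file_2 = [("21", 14385606, "C", ["CTT", "CTTT"]),
          ("21", 14387043, "GA", ["AA", "G"])]

joined = outer_join_biallelic(file_1, file_2)
for key, (in_1, in_2) in joined.items():
    print(key, "FILE_1" if in_1 else "-", "FILE_2" if in_2 else "-")
```

Here C→CTT, C→CTTT and GA→G are reported as shared by both files. C→A and G→GA (the trimmed form of GA→GAA) are kept as FILE_1-only sites, and G→A (the trimmed form of FILE_2's GA→AA) is kept as a FILE_2-only site, rather than the whole position being dropped.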
====================================================
Look for position 21:9418008
FILE_1 region=21:9417998-9418018
21 9418008 . A T . . . GT
21 9418012 . T G . . . GT
21 9418016 rs4087029 T G . . . GT
FILE_2 region=21:9417998-9418018
21 9418008 . A AT . . . GT
HAIL join output region=21:9417998-9418018
====================================================
Look for position 21:14385606
FILE_1 region=21:14385596-14385616
21 14385606 . C CTT,CTTT,A . . . GT
21 14385609 . TA T . . . GT
21 14385610 rs139914949 AT A,TT,*,ATT . . . GT
FILE_2 region=21:14385596-14385616
21 14385606 . C CTT,CTTT . . . GT
21 14385610 rs139914949 AT A,ATT,TT . . . GT
HAIL join output region=21:14385596-14385616
====================================================
Look for position 21:14387043
FILE_1 region=21:14387033-14387053
21 14387041 . C A . . . GT
21 14387043 . GA G,GAA . . . GT
FILE_2 region=21:14387033-14387053
21 14387043 . GA AA,G . . . GT
HAIL join output region=21:14387033-14387053
====================================================
Look for position 21:14391555
FILE_1 region=21:14391545-14391565
21 14391555 . AG A,GG . . . GT
21 14391556 rs115464252 G A,* . . . GT
FILE_2 region=21:14391545-14391565
21 14391555 . AG A . . . GT
HAIL join output region=21:14391545-14391565
====================================================
Look for position 21:14392484
FILE_1 region=21:14392474-14392494
21 14392480 . A G . . . GT
21 14392484 . A AC,G . . . GT
21 14392485 . A AC,C . . . GT
FILE_2 region=21:14392474-14392494
21 14392484 . A AC . . . GT
21 14392485 . A AC,C . . . GT
HAIL join output region=21:14392474-14392494
21 14392485 . A AC,C -10 . . GT:AD:DP:GQ:PL
====================================================
Look for position 21:14396727
FILE_1 region=21:14396717-14396737
21 14396727 . T C . . . GT
FILE_2 region=21:14396717-14396737
21 14396727 rs373212424 TG T . . . GT
HAIL join output region=21:14396717-14396737
====================================================