Added a method for computing genotype concordance between datasets. Read about it here: https://hail.is/hail/hail.VariantDataset.html#hail.VariantDataset.concordance
Regarding the Hail concordance command: if I provide a sample truth set and check genotype concordance against a sample replicate (knowing they have some discordance, since the files were handmade to test the Hail concordance command), can you explain why the results are consistently 100%?
vdsNA24143TestTruthset = hc.read('/illumina/runs/rbdata/binfo/BioFX_pipeline/data/test/testSplitTruthset.vds')
vdsNA24143Test97 = hc.read('/illumina/runs/rbdata/binfo/BioFX_pipeline/data/test/splitTest97.vds')
vdsNA24143Test99 = hc.read('/illumina/runs/rbdata/binfo/BioFX_pipeline/data/test/splitTest99.vds')
summary, samples, variants = vdsNA24143Test97.concordance(vdsNA24143TestTruthset)
2018-01-24 08:20:09 Hail: INFO: Found 1 overlapping samples
Left: 1 total samples
Right: 1 total samples
2018-01-24 08:20:10 Hail: INFO: Summary of inner join concordance:
Total observations: 100
Total concordant observations: 100
Total concordance: 100.00%
Thank you!
Can you print the summary list? The INFO message is just the inner join concordance, so maybe the discordance is in the outer join.
summary
[[0L, 0L, 0L, 0L, 0L], [0L, 100L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L]]
Example of the VCF before conversion to vdsNA24143Test97 (leading headers omitted):
#CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NA24143 |
---|---|---|---|---|---|---|---|---|---|
1 | 58814 | GSA-rs114420996 | G | A | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 565508 | GSA-rs9283150 | G | A | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 567092 | GSA-rs9326622 | T | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 726912 | GSA-1:726912 | A | G | . | PASS | . | GT:GQ | 0/0:0.271291 |
1 | 727841 | GSA-rs116587930 | G | A | . | PASS | . | GT:GQ | 0/0:0.614155 |
1 | 752721 | rs3131972 | A | G | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 756268 | rs12567639 | G | A | . | PASS | . | GT:GQ | 1/1:0.314226 |
1 | 759036 | GSA-rs114525117 | G | A | . | PASS | . | GT:GQ | 0/0:0.639129 |
1 | 794332 | rs12127425 | G | A | . | PASS | . | GT:GQ | 0/0:0.337806 |
1 | 801536 | GSA-rs79373928 | T | G | . | PASS | . | GT:GQ | 0/0:0.83396 |
1 | 807512 | rs10751454 | A | G | . | PASS | . | GT:GQ | 1/1:0.519144 |
1 | 815421 | GSA-rs72888853 | T | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 830181 | rs28444699 | A | G | . | PASS | . | GT:GQ | 0/0:0.361144 |
1 | 830731 | GSA-1:830731 | T | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 834830 | GSA-rs116452738 | G | A | . | PASS | . | GT:GQ | 0/0:0.865604 |
1 | 835092 | GSA-rs72631887 | T | G | . | PASS | . | GT:GQ | 0/0:0.787375 |
1 | 838555 | rs4970383 | C | A | . | PASS | . | GT:GQ | 0/0:0.7561 |
1 | 838665 | rs28678693 | T | C | . | PASS | . | GT:GQ | 0/0:0.807575 |
1 | 840753 | rs4970382 | T | C | . | PASS | . | GT:GQ | 0/1:0.760915 |
1 | 846808 | GSA-rs4475691 | C | T | . | PASS | . | GT:GQ | 0/0:0.940848 |
1 | 851390 | GSA-rs72631889 | G | T | . | PASS | . | GT:GQ | 0/1:0.890878 |
1 | 854250 | rs7537756 | A | G | . | PASS | . | GT:GQ | 0/0:0.8313 |
1 | 861808 | rs13302982 | A | G | . | PASS | . | GT:GQ | 1/1:0.904124 |
1 | 863130 | GSA-rs376747791 | A | G | . | PASS | . | GT:GQ | 0/0:0.622487 |
1 | 866893 | rs2880024 | T | C | . | PASS | . | GT:GQ | 0/1:0.814281 |
1 | 868404 | rs13302914 | C | T | . | PASS | . | GT:GQ | 1/1:0.839897 |
1 | 872952 | rs76723341 | C | T | . | PASS | . | GT:GQ | 0/0:0.803325 |
1 | 878331 | GSA-rs148327885 | C | T | . | PASS | . | GT:GQ | 0/0:0.422401 |
1 | 879911 | GSA-rs143853699 | G | A | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 881627 | rs2272757 | G | A | . | PASS | . | GT:GQ | 0/1:0.841812 |
1 | 884767 | GSA-rs67274836 | G | A | . | PASS | . | GT:GQ | 0/1:0.763616 |
1 | 888659 | rs3748597 | T | C | . | PASS | . | GT:GQ | 1/1:0.88395 |
1 | 889238 | GSA-rs3828049 | G | A | . | PASS | . | GT:GQ | 0/0:0.73819 |
1 | 891277 | GSA-rs77608078 | C | T | . | PASS | . | GT:GQ | 0/0:0.45426 |
1 | 894573 | exm2264981 | G | A | . | PASS | . | GT:GQ | 0/1:0.884253 |
1 | 897564 | rs13303229 | T | C | . | PASS | . | GT:GQ | 1/1:0.785132 |
1 | 900730 | rs3935066 | G | A | . | PASS | . | GT:GQ | 0/1:0.819186 |
1 | 903321 | rs6669800 | G | A | . | PASS | . | GT:GQ | 1/1:0.724197 |
1 | 904752 | rs35241590 | T | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 910473 | rs28561399 | G | A | . | PASS | . | GT:GQ | 0/0:0.692576 |
1 | 911101 | GSA-rs3748588 | C | T | . | PASS | . | GT:GQ | 0/0:0.775759 |
1 | 914749 | GSA-rs186101910 | C | T | . | PASS | . | GT:GQ | 0/0:0.829802 |
1 | 917640 | rs41285816 | G | A | . | PASS | . | GT:GQ | 0/0:0.797152 |
1 | 918573 | rs2341354 | A | G | . | PASS | . | GT:GQ | 0/1:0.797628 |
1 | 919419 | rs6605059 | T | C | . | PASS | . | GT:GQ | 0/1:0.833271 |
1 | 919501 | rs4970414 | G | T | . | PASS | . | GT:GQ | 0/1:0.863209 |
1 | 919855 | rs116781904 | G | A | . | PASS | . | GT:GQ | 0/0:0.920867 |
1 | 919927 | GSA-rs61770779 | G | A | . | PASS | . | GT:GQ | 0/0:0.843464 |
1 | 949472 | rs202075563 | G | A | . | PASS | . | GT:GQ | 0/0:0.862104 |
1 | 949491 | rs148041041 | G | A | . | PASS | . | GT:GQ | 0/0:0.509424 |
1 | 957898 | rs2799064 | G | T | . | PASS | . | GT:GQ | 0/0:0.856532 |
1 | 959509 | rs28591569 | T | G | . | PASS | . | GT:GQ | 0/0:0.693086 |
1 | 974894 | rs3121578 | C | T | . | PASS | . | GT:GQ | 0/1:0.935316 |
1 | 978642 | rs199563268 | G | A | . | PASS | . | GT:GQ | 0/0:0.474509 |
1 | 978762 | rs138288952 | G | A | . | PASS | . | GT:GQ | 0/0:0.774468 |
1 | 978804 | rs144164397 | C | T | . | PASS | . | GT:GQ | 0/0:0.377943 |
1 | 978974 | rs79016973 | G | A | . | PASS | . | GT:GQ | 0/0:0.84322 |
1 | 979397 | rs143324306 | G | A | . | PASS | . | GT:GQ | 0/0:0.825384 |
1 | 979748 | rs113288277 | A | T | . | PASS | . | GT:GQ | 0/0:0.705457 |
1 | 980824 | seq-rs112039851 | G | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 980868 | rs146243145 | G | A | . | PASS | . | GT:GQ | 0/0:0.868118 |
1 | 981139 | rs200684031 | G | A | . | PASS | . | GT:GQ | 0/0:0.500205 |
1 | 981244 | rs202061838 | G | A | . | PASS | . | GT:GQ | 0/0:0.508203 |
1 | 982968 | rs149268246 | C | T | . | PASS | . | GT:GQ | 0/0:0.502549 |
1 | 983005 | rs149762107 | G | A | . | PASS | . | GT:GQ | 0/0:0.881114 |
1 | 983040 | rs148948883 | G | A | . | PASS | . | GT:GQ | 0/0:0.489731 |
1 | 983243 | rs142620337 | C | T | . | PASS | . | GT:GQ | 0/0:0.833537 |
1 | 984971 | GSA-rs111818381 | G | A | . | PASS | . | GT:GQ | 0/0:0.804288 |
1 | 985460 | rs2275811 | T | C | . | PASS | . | GT:GQ | 0/0:0.789715 |
1 | 985905 | rs143143061 | C | T | . | PASS | . | GT:GQ | 0/0:0.865967 |
1 | 986165 | rs145444272 | G | A | . | PASS | . | GT:GQ | 0/0:0.490224 |
1 | 986918 | rs72900459 | C | T | . | PASS | . | GT:GQ | 0/0:0.391159 |
1 | 986963 | rs145116277 | C | T | . | PASS | . | GT:GQ | 0/0:0.879131 |
1 | 987253 | GSA-rs113261977 | C | T | . | PASS | . | GT:GQ | 0/0:0.816305 |
1 | 988902 | GSA-rs74223856 | C | A | . | PASS | . | GT:GQ | 0/0:0.864488 |
1 | 990417 | rs2465136 | T | C | . | PASS | . | GT:GQ | 0/0:0.908515 |
1 | 998395 | rs7526076 | A | G | . | PASS | . | GT:GQ | 0/1:0.930196 |
1 | 1004331 | GSA-rs113592356 | C | T | . | PASS | . | GT:GQ | 0/0:0.761946 |
1 | 1018704 | rs9442372 | A | G | . | PASS | . | GT:GQ | 0/1:0.828984 |
1 | 1022223 | GSA-rs115723010 | G | A | . | PASS | . | GT:GQ | 0/0:0.555626 |
1 | 1022423 | GSA-rs114326054 | G | A | . | PASS | . | GT:GQ | 0/0:0.753926 |
1 | 1023114 | GSA-rs61766340 | G | A | . | PASS | . | GT:GQ | 0/0:0.7728 |
1 | 1023788 | rs12132100 | C | T | . | PASS | . | GT:GQ | 0/0:0.823209 |
1 | 1026428 | GSA-rs116334314 | G | A | . | PASS | . | GT:GQ | 0/0:0.890004 |
1 | 1026913 | GSA-rs115662838 | C | T | . | PASS | . | GT:GQ | 0/0:0.734707 |
1 | 1027888 | GSA-rs77334480 | C | T | . | PASS | . | GT:GQ | 0/0:0.862744 |
1 | 1030374 | rs12731175 | G | A | . | PASS | . | GT:GQ | 0/0:0.945677 |
1 | 1031540 | rs9651273 | A | G | . | PASS | . | GT:GQ | 0/0:0.778961 |
1 | 1040026 | rs6671356 | T | C | . | PASS | . | GT:GQ | 0/0:0.878835 |
1 | 1045331 | GSA-rs147606383 | G | A | . | PASS | . | GT:GQ | 0/0:0.697128 |
1 | 1045606 | rs12080505 | A | C | . | PASS | . | GT:GQ | 0/0:0.332331 |
1 | 1054091 | GSA-rs61766344 | C | T | . | PASS | . | GT:GQ | 0/0:0.734926 |
1 | 1062638 | rs9442373 | C | A | . | PASS | . | GT:GQ | 0/0:0.817336 |
1 | 1065296 | rs4072537 | T | C | . | PASS | . | GT:GQ | 0/1:0.798382 |
1 | 1065726 | GSA-rs11260598 | T | C | . | PASS | . | GT:GQ | 0/0:0.907904 |
1 | 1068883 | rs61766346 | G | A | . | PASS | . | GT:GQ | 0/0:0.922105 |
1 | 1070467 | rs139475585 | G | A | . | PASS | . | GT:GQ | 0/0:0.918811 |
1 | 1072181 | rs141230226 | C | T | . | PASS | . | GT:GQ | 0/0:0.876326 |
1 | 1079198 | rs11260603 | T | C | . | PASS | . | GT:GQ | 0/0:0.794936 |
1 | 1079261 | GSA-rs116661896 | G | A | . | PASS | . | GT:GQ | 0/1:0.755679 |
Example of the VCF before conversion to test.truthset.vcf (leading headers omitted):
#CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NA24143 |
---|---|---|---|---|---|---|---|---|---|
1 | 58814 | GSA-rs114420996 | G | A | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 565508 | GSA-rs9283150 | G | A | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 567092 | GSA-rs9326622 | T | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 726912 | GSA-1:726912 | A | G | . | PASS | . | GT:GQ | 0/0:0.271291 |
1 | 727841 | GSA-rs116587930 | G | A | . | PASS | . | GT:GQ | 0/0:0.614155 |
1 | 752721 | rs3131972 | A | G | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 756268 | rs12567639 | G | A | . | PASS | . | GT:GQ | 1/1:0.314226 |
1 | 759036 | GSA-rs114525117 | G | A | . | PASS | . | GT:GQ | 0/0:0.639129 |
1 | 794332 | rs12127425 | G | A | . | PASS | . | GT:GQ | 0/0:0.337806 |
1 | 801536 | GSA-rs79373928 | T | G | . | PASS | . | GT:GQ | 0/0:0.83396 |
1 | 807512 | rs10751454 | A | G | . | PASS | . | GT:GQ | 1/1:0.519144 |
1 | 815421 | GSA-rs72888853 | T | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 830181 | rs28444699 | A | G | . | PASS | . | GT:GQ | 0/0:0.361144 |
1 | 830731 | GSA-1:830731 | T | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 834830 | GSA-rs116452738 | G | A | . | PASS | . | GT:GQ | 0/0:0.865604 |
1 | 835092 | GSA-rs72631887 | T | G | . | PASS | . | GT:GQ | 0/0:0.787375 |
1 | 838555 | rs4970383 | C | A | . | PASS | . | GT:GQ | 0/0:0.7561 |
1 | 838665 | rs28678693 | T | C | . | PASS | . | GT:GQ | 0/0:0.807575 |
1 | 840753 | rs4970382 | T | C | . | PASS | . | GT:GQ | 0/1:0.760915 |
1 | 846808 | GSA-rs4475691 | C | T | . | PASS | . | GT:GQ | 0/0:0.940848 |
1 | 851390 | GSA-rs72631889 | G | T | . | PASS | . | GT:GQ | 0/1:0.890878 |
1 | 854250 | rs7537756 | A | G | . | PASS | . | GT:GQ | 0/0:0.8313 |
1 | 861808 | rs13302982 | A | G | . | PASS | . | GT:GQ | 1/1:0.904124 |
1 | 863130 | GSA-rs376747791 | A | G | . | PASS | . | GT:GQ | 0/0:0.622487 |
1 | 866893 | rs2880024 | T | C | . | PASS | . | GT:GQ | 0/1:0.814281 |
1 | 868404 | rs13302914 | C | T | . | PASS | . | GT:GQ | 1/1:0.839897 |
1 | 872952 | rs76723341 | C | T | . | PASS | . | GT:GQ | 0/0:0.803325 |
1 | 878331 | GSA-rs148327885 | C | T | . | PASS | . | GT:GQ | 0/0:0.422401 |
1 | 879911 | GSA-rs143853699 | G | A | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 881627 | rs2272757 | G | A | . | PASS | . | GT:GQ | 0/1:0.841812 |
1 | 884767 | GSA-rs67274836 | G | A | . | PASS | . | GT:GQ | 0/1:0.763616 |
1 | 888659 | rs3748597 | T | C | . | PASS | . | GT:GQ | 1/1:0.88395 |
1 | 889238 | GSA-rs3828049 | G | A | . | PASS | . | GT:GQ | 0/0:0.73819 |
1 | 891277 | GSA-rs77608078 | C | T | . | PASS | . | GT:GQ | 0/0:0.45426 |
1 | 894573 | exm2264981 | G | A | . | PASS | . | GT:GQ | 0/1:0.884253 |
1 | 897564 | rs13303229 | T | C | . | PASS | . | GT:GQ | 1/1:0.785132 |
1 | 900730 | rs3935066 | G | A | . | PASS | . | GT:GQ | 0/1:0.819186 |
1 | 903321 | rs6669800 | G | A | . | PASS | . | GT:GQ | 1/1:0.724197 |
1 | 904752 | rs35241590 | T | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 910473 | rs28561399 | G | A | . | PASS | . | GT:GQ | 0/0:0.692576 |
1 | 911101 | GSA-rs3748588 | C | T | . | PASS | . | GT:GQ | 0/0:0.775759 |
1 | 914749 | GSA-rs186101910 | C | T | . | PASS | . | GT:GQ | 0/0:0.829802 |
1 | 917640 | rs41285816 | G | A | . | PASS | . | GT:GQ | 0/0:0.797152 |
1 | 918573 | rs2341354 | A | G | . | PASS | . | GT:GQ | 0/1:0.797628 |
1 | 919419 | rs6605059 | T | C | . | PASS | . | GT:GQ | 0/1:0.833271 |
1 | 919501 | rs4970414 | G | T | . | PASS | . | GT:GQ | 0/1:0.863209 |
1 | 919855 | rs116781904 | G | A | . | PASS | . | GT:GQ | 0/0:0.920867 |
1 | 919927 | GSA-rs61770779 | G | A | . | PASS | . | GT:GQ | 0/0:0.843464 |
1 | 949472 | rs202075563 | G | A | . | PASS | . | GT:GQ | 0/0:0.862104 |
1 | 949491 | rs148041041 | G | A | . | PASS | . | GT:GQ | 0/0:0.509424 |
1 | 957898 | rs2799064 | G | T | . | PASS | . | GT:GQ | 0/0:0.856532 |
1 | 959509 | rs28591569 | T | G | . | PASS | . | GT:GQ | 0/0:0.693086 |
1 | 974894 | rs3121578 | C | T | . | PASS | . | GT:GQ | 0/1:0.935316 |
1 | 978642 | rs199563268 | G | A | . | PASS | . | GT:GQ | 0/0:0.474509 |
1 | 978762 | rs138288952 | G | A | . | PASS | . | GT:GQ | 0/0:0.774468 |
1 | 978804 | rs144164397 | C | T | . | PASS | . | GT:GQ | 0/0:0.377943 |
1 | 978974 | rs79016973 | G | A | . | PASS | . | GT:GQ | 0/0:0.84322 |
1 | 979397 | rs143324306 | G | A | . | PASS | . | GT:GQ | 0/0:0.825384 |
1 | 979748 | rs113288277 | A | T | . | PASS | . | GT:GQ | 0/0:0.705457 |
1 | 980824 | seq-rs112039851 | G | C | . | PASS | . | GT:GQ | ./.:0.0 |
1 | 980868 | rs146243145 | G | A | . | PASS | . | GT:GQ | 0/0:0.868118 |
1 | 981139 | rs200684031 | G | A | . | PASS | . | GT:GQ | 0/0:0.500205 |
1 | 981244 | rs202061838 | G | A | . | PASS | . | GT:GQ | 0/0:0.508203 |
1 | 982968 | rs149268246 | C | T | . | PASS | . | GT:GQ | 0/0:0.502549 |
1 | 983005 | rs149762107 | G | A | . | PASS | . | GT:GQ | 0/0:0.881114 |
1 | 983040 | rs148948883 | G | A | . | PASS | . | GT:GQ | 0/0:0.489731 |
1 | 983243 | rs142620337 | C | T | . | PASS | . | GT:GQ | 0/0:0.833537 |
1 | 984971 | GSA-rs111818381 | G | A | . | PASS | . | GT:GQ | 0/0:0.804288 |
1 | 985460 | rs2275811 | T | C | . | PASS | . | GT:GQ | 0/0:0.789715 |
1 | 985905 | rs143143061 | C | T | . | PASS | . | GT:GQ | 0/0:0.865967 |
1 | 986165 | rs145444272 | G | A | . | PASS | . | GT:GQ | 0/0:0.490224 |
1 | 986918 | rs72900459 | C | T | . | PASS | . | GT:GQ | 0/0:0.391159 |
1 | 986963 | rs145116277 | C | T | . | PASS | . | GT:GQ | 0/0:0.879131 |
1 | 987253 | GSA-rs113261977 | C | T | . | PASS | . | GT:GQ | 0/0:0.816305 |
1 | 988902 | GSA-rs74223856 | C | A | . | PASS | . | GT:GQ | 0/0:0.864488 |
1 | 990417 | rs2465136 | T | C | . | PASS | . | GT:GQ | 0/0:0.908515 |
1 | 998395 | rs7526076 | A | G | . | PASS | . | GT:GQ | 0/1:0.930196 |
1 | 1004331 | GSA-rs113592356 | C | T | . | PASS | . | GT:GQ | 0/0:0.761946 |
1 | 1018704 | rs9442372 | A | G | . | PASS | . | GT:GQ | 0/1:0.828984 |
1 | 1022223 | GSA-rs115723010 | G | A | . | PASS | . | GT:GQ | 0/0:0.555626 |
1 | 1022423 | GSA-rs114326054 | G | A | . | PASS | . | GT:GQ | 0/0:0.753926 |
1 | 1023114 | GSA-rs61766340 | G | A | . | PASS | . | GT:GQ | 0/0:0.7728 |
1 | 1023788 | rs12132100 | C | T | . | PASS | . | GT:GQ | 0/0:0.823209 |
1 | 1026428 | GSA-rs116334314 | G | A | . | PASS | . | GT:GQ | 0/0:0.890004 |
1 | 1026913 | GSA-rs115662838 | C | T | . | PASS | . | GT:GQ | 0/0:0.734707 |
1 | 1027888 | GSA-rs77334480 | C | T | . | PASS | . | GT:GQ | 0/0:0.862744 |
1 | 1030374 | rs12731175 | G | A | . | PASS | . | GT:GQ | 0/0:0.945677 |
1 | 1031540 | rs9651273 | A | G | . | PASS | . | GT:GQ | 0/1:0.778961 |
1 | 1040026 | rs6671356 | T | C | . | PASS | . | GT:GQ | 0/0:0.878835 |
1 | 1045331 | GSA-rs147606383 | G | A | . | PASS | . | GT:GQ | 0/0:0.697128 |
1 | 1045606 | rs12080505 | A | C | . | PASS | . | GT:GQ | 0/0:0.332331 |
1 | 1054091 | GSA-rs61766344 | C | T | . | PASS | . | GT:GQ | 0/0:0.734926 |
1 | 1062638 | rs9442373 | C | A | . | PASS | . | GT:GQ | 0/0:0.817336 |
1 | 1065296 | rs4072537 | T | C | . | PASS | . | GT:GQ | 1/1:0.798382 |
1 | 1065726 | GSA-rs11260598 | T | C | . | PASS | . | GT:GQ | 0/0:0.907904 |
1 | 1068883 | rs61766346 | G | A | . | PASS | . | GT:GQ | 0/0:0.922105 |
1 | 1070467 | rs139475585 | G | A | . | PASS | . | GT:GQ | 0/0:0.918811 |
1 | 1072181 | rs141230226 | C | T | . | PASS | . | GT:GQ | 0/0:0.876326 |
1 | 1079198 | rs11260603 | T | C | . | PASS | . | GT:GQ | 0/0:0.794936 |
1 | 1079261 | GSA-rs116661896 | G | A | . | PASS | . | GT:GQ | 0/0:0.755679 |
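As a sanity check independent of Hail, the expected discordance between the two tables above can be sketched in plain Python. This is only an illustration: the function name `genotype_concordance` and the dict layout are hypothetical, and the sample data is a handful of position/GT pairs copied from the two tables, not the full files.

```python
def genotype_concordance(test, truth):
    """Compare GT strings at shared positions, skipping no-calls ('./.')."""
    shared = sorted(pos for pos in test if pos in truth)
    # Only positions called in both datasets count toward the rate.
    called = [p for p in shared if test[p] != "./." and truth[p] != "./."]
    discordant = [p for p in called if test[p] != truth[p]]
    if not called:
        return discordant, None
    rate = (len(called) - len(discordant)) / float(len(called))
    return discordant, rate

# A few position -> GT pairs copied from the two tables in this thread.
test97 = {58814: "./.", 1030374: "0/0", 1031540: "0/0",
          1065296: "0/1", 1079198: "0/0", 1079261: "0/1"}
truth = {58814: "./.", 1030374: "0/0", 1031540: "0/1",
         1065296: "1/1", 1079198: "0/0", 1079261: "0/0"}

discordant, rate = genotype_concordance(test97, truth)
print(discordant)  # [1031540, 1065296, 1079261]
print(rate)        # 0.4
```

On this small subset three of the five called positions disagree, so a correct concordance run over the real files should not come back as 100%.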
The summary printout indicates that every call is a no call in your dataset.
Try this:
vdsNA24143Test97.genotypes_table().select(['v', 's', 'g']).show()
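For reference, here is a rough sketch of how the 5x5 summary list can be read. It assumes the state ordering given in the Hail 0.1 concordance docs (0 = no data, 1 = no call, 2 = hom ref, 3 = het, 4 = hom var; first index is the left dataset, second is the right). The helper `describe_summary` is a hypothetical name, not part of Hail, and restricting the rate to indices 2-4 is a choice made for this sketch, not necessarily how Hail computes its logged percentage.

```python
# Genotype states indexing the summary list (assumed from the Hail 0.1 docs).
STATES = ["no data", "no call", "hom ref", "het", "hom var"]

def describe_summary(summary):
    """Return (nonzero_cells, called_concordance) for a 5x5 summary list."""
    nonzero = {
        (STATES[i], STATES[j]): int(n)
        for i, row in enumerate(summary)
        for j, n in enumerate(row)
        if n
    }
    # Restrict to cells where both datasets have a called genotype.
    called = sum(summary[i][j] for i in range(2, 5) for j in range(2, 5))
    agree = sum(summary[i][i] for i in range(2, 5))
    concordance = agree / float(called) if called else None
    return nonzero, concordance

# The summary list from this thread (Python 2 displays the counts as 0L, 100L).
summary = [[0, 0, 0, 0, 0],
           [0, 100, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]

nonzero, concordance = describe_summary(summary)
print(nonzero)      # {('no call', 'no call'): 100}
print(concordance)  # None: no site is called in both datasets
```

All 100 observations land in the no call / no call cell, which is why nothing is counted as discordant.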
I believe this is happening because your GQ field is invalid according to the VCF 4.2 spec: https://samtools.github.io/hts-specs/VCFv4.2.pdf
GQ should be an integer, but your values are floating-point. Hail 0.1 is pretty much hard-coded for GATK-like VCFs, which has caused problems for other sources of data. We automatically filter certain invalid data arrangements, and this must be one of them.
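A quick way to confirm this before loading into Hail is to scan the VCF for GQ values that do not parse as integers. The sketch below is plain Python with no Hail dependency; `non_integer_gq` is a hypothetical helper, and it assumes a standard tab-delimited VCF body (the tables in this thread are shown with `|` separators only for display). The second example line, with an integer GQ of 27, is made up for contrast.

```python
def non_integer_gq(vcf_lines):
    """Yield (chrom, pos, gq) for each sample field whose GQ is not an integer."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:GQ
        if "GQ" not in keys:
            continue
        gq_index = keys.index("GQ")
        for sample in fields[9:]:
            values = sample.split(":")
            if gq_index >= len(values):
                continue  # trailing FORMAT fields may be omitted
            gq = values[gq_index]
            if gq == ".":
                continue  # missing value is allowed
            try:
                int(gq)
            except ValueError:
                yield fields[0], fields[1], gq

lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA24143",
    # Row taken from this thread: floating-point GQ.
    "1\t726912\tGSA-1:726912\tA\tG\t.\tPASS\t.\tGT:GQ\t0/0:0.271291",
    # Hypothetical row with an integer GQ, for contrast.
    "1\t727841\tGSA-rs116587930\tG\tA\t.\tPASS\t.\tGT:GQ\t0/0:27",
]

bad = list(non_integer_gq(lines))
print(bad)  # [('1', '726912', '0.271291')]
```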
We’ll be releasing a beta version of Hail 0.2 in ~4-6 weeks. In Hail 0.2, the genotype schema is totally flexible and this should load fine.
You’re correct - they are all no calls:
vdsNA24143Test97.genotypes_table().select(['v', 's', 'g']).show()
+--------------+---------+----------------+
| v            | s       | g              |
+--------------+---------+----------------+
| Variant      | String  | Genotype       |
+--------------+---------+----------------+
| 1:58814:G:A  | NA24143 | ./.:.:.:.:PL=. |
| 1:565508:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:567092:T:C | NA24143 | ./.:.:.:.:PL=. |
| 1:726912:A:G | NA24143 | ./.:.:.:.:PL=. |
| 1:727841:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:752721:A:G | NA24143 | ./.:.:.:.:PL=. |
| 1:756268:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:759036:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:794332:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:801536:T:G | NA24143 | ./.:.:.:.:PL=. |
+--------------+---------+----------------+
showing top 10 rows
Thank you for the insight - looking forward to Hail 0.2 then
We’ll be posting on this forum when the new version is ready for community use. It’s a lot better than 0.1, so I’m excited too!
Hi tpoterba & Hail Team - Is there an updated timeframe for the Hail 0.2 release?
I think we’re nervous about heavily advertising right now, because we have a lot to do before the official stable 0.2 release (months away), but the 0.2 beta version is definitely ready for use! Most Broad users have moved over to it now.
The docs are here: https://www.hail.is/docs/devel/
The 0.2 tutorial is a good place to start, as well as the overview page (thanks, Jackie!). Let us know here or on Gitter if you have questions!