Concordance command added

Added a method for computing genotype concordance between datasets. Read about it here: https://hail.is/hail/hail.VariantDataset.html#hail.VariantDataset.concordance

Regarding the Hail concordance command: if I provide a sample truth set and check genotype concordance against a replicate of that sample (knowing they have some discordance, since the files were handmade to test the Hail concordance command), can you explain why the results are consistently 100%?

from hail import HailContext
hc = HailContext()
vdsNA24143TestTruthset = hc.read('/illumina/runs/rbdata/binfo/BioFX_pipeline/data/test/testSplitTruthset.vds')
vdsNA24143Test97 = hc.read('/illumina/runs/rbdata/binfo/BioFX_pipeline/data/test/splitTest97.vds')
vdsNA24143Test99 = hc.read('/illumina/runs/rbdata/binfo/BioFX_pipeline/data/test/splitTest99.vds')
summary, samples, variants = vdsNA24143Test97.concordance(vdsNA24143TestTruthset)
2018-01-24 08:20:09 Hail: INFO: Found 1 overlapping samples
Left: 1 total samples
Right: 1 total samples
2018-01-24 08:20:10 Hail: INFO: Summary of inner join concordance:
Total observations: 100
Total concordant observations: 100
Total concordance: 100.00%

Thank you!

Can you print the summary list? The INFO message only reports the inner-join concordance, so maybe the discordance is in the outer join.

summary
[[0L, 0L, 0L, 0L, 0L], [0L, 100L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L], [0L, 0L, 0L, 0L, 0L]]

Example of the VCF (minus leading header lines) before conversion to vdsNA24143Test97:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA24143
1 58814 GSA-rs114420996 G A . PASS . GT:GQ ./.:0.0
1 565508 GSA-rs9283150 G A . PASS . GT:GQ ./.:0.0
1 567092 GSA-rs9326622 T C . PASS . GT:GQ ./.:0.0
1 726912 GSA-1:726912 A G . PASS . GT:GQ 0/0:0.271291
1 727841 GSA-rs116587930 G A . PASS . GT:GQ 0/0:0.614155
1 752721 rs3131972 A G . PASS . GT:GQ ./.:0.0
1 756268 rs12567639 G A . PASS . GT:GQ 1/1:0.314226
1 759036 GSA-rs114525117 G A . PASS . GT:GQ 0/0:0.639129
1 794332 rs12127425 G A . PASS . GT:GQ 0/0:0.337806
1 801536 GSA-rs79373928 T G . PASS . GT:GQ 0/0:0.83396
1 807512 rs10751454 A G . PASS . GT:GQ 1/1:0.519144
1 815421 GSA-rs72888853 T C . PASS . GT:GQ ./.:0.0
1 830181 rs28444699 A G . PASS . GT:GQ 0/0:0.361144
1 830731 GSA-1:830731 T C . PASS . GT:GQ ./.:0.0
1 834830 GSA-rs116452738 G A . PASS . GT:GQ 0/0:0.865604
1 835092 GSA-rs72631887 T G . PASS . GT:GQ 0/0:0.787375
1 838555 rs4970383 C A . PASS . GT:GQ 0/0:0.7561
1 838665 rs28678693 T C . PASS . GT:GQ 0/0:0.807575
1 840753 rs4970382 T C . PASS . GT:GQ 0/1:0.760915
1 846808 GSA-rs4475691 C T . PASS . GT:GQ 0/0:0.940848
1 851390 GSA-rs72631889 G T . PASS . GT:GQ 0/1:0.890878
1 854250 rs7537756 A G . PASS . GT:GQ 0/0:0.8313
1 861808 rs13302982 A G . PASS . GT:GQ 1/1:0.904124
1 863130 GSA-rs376747791 A G . PASS . GT:GQ 0/0:0.622487
1 866893 rs2880024 T C . PASS . GT:GQ 0/1:0.814281
1 868404 rs13302914 C T . PASS . GT:GQ 1/1:0.839897
1 872952 rs76723341 C T . PASS . GT:GQ 0/0:0.803325
1 878331 GSA-rs148327885 C T . PASS . GT:GQ 0/0:0.422401
1 879911 GSA-rs143853699 G A . PASS . GT:GQ ./.:0.0
1 881627 rs2272757 G A . PASS . GT:GQ 0/1:0.841812
1 884767 GSA-rs67274836 G A . PASS . GT:GQ 0/1:0.763616
1 888659 rs3748597 T C . PASS . GT:GQ 1/1:0.88395
1 889238 GSA-rs3828049 G A . PASS . GT:GQ 0/0:0.73819
1 891277 GSA-rs77608078 C T . PASS . GT:GQ 0/0:0.45426
1 894573 exm2264981 G A . PASS . GT:GQ 0/1:0.884253
1 897564 rs13303229 T C . PASS . GT:GQ 1/1:0.785132
1 900730 rs3935066 G A . PASS . GT:GQ 0/1:0.819186
1 903321 rs6669800 G A . PASS . GT:GQ 1/1:0.724197
1 904752 rs35241590 T C . PASS . GT:GQ ./.:0.0
1 910473 rs28561399 G A . PASS . GT:GQ 0/0:0.692576
1 911101 GSA-rs3748588 C T . PASS . GT:GQ 0/0:0.775759
1 914749 GSA-rs186101910 C T . PASS . GT:GQ 0/0:0.829802
1 917640 rs41285816 G A . PASS . GT:GQ 0/0:0.797152
1 918573 rs2341354 A G . PASS . GT:GQ 0/1:0.797628
1 919419 rs6605059 T C . PASS . GT:GQ 0/1:0.833271
1 919501 rs4970414 G T . PASS . GT:GQ 0/1:0.863209
1 919855 rs116781904 G A . PASS . GT:GQ 0/0:0.920867
1 919927 GSA-rs61770779 G A . PASS . GT:GQ 0/0:0.843464
1 949472 rs202075563 G A . PASS . GT:GQ 0/0:0.862104
1 949491 rs148041041 G A . PASS . GT:GQ 0/0:0.509424
1 957898 rs2799064 G T . PASS . GT:GQ 0/0:0.856532
1 959509 rs28591569 T G . PASS . GT:GQ 0/0:0.693086
1 974894 rs3121578 C T . PASS . GT:GQ 0/1:0.935316
1 978642 rs199563268 G A . PASS . GT:GQ 0/0:0.474509
1 978762 rs138288952 G A . PASS . GT:GQ 0/0:0.774468
1 978804 rs144164397 C T . PASS . GT:GQ 0/0:0.377943
1 978974 rs79016973 G A . PASS . GT:GQ 0/0:0.84322
1 979397 rs143324306 G A . PASS . GT:GQ 0/0:0.825384
1 979748 rs113288277 A T . PASS . GT:GQ 0/0:0.705457
1 980824 seq-rs112039851 G C . PASS . GT:GQ ./.:0.0
1 980868 rs146243145 G A . PASS . GT:GQ 0/0:0.868118
1 981139 rs200684031 G A . PASS . GT:GQ 0/0:0.500205
1 981244 rs202061838 G A . PASS . GT:GQ 0/0:0.508203
1 982968 rs149268246 C T . PASS . GT:GQ 0/0:0.502549
1 983005 rs149762107 G A . PASS . GT:GQ 0/0:0.881114
1 983040 rs148948883 G A . PASS . GT:GQ 0/0:0.489731
1 983243 rs142620337 C T . PASS . GT:GQ 0/0:0.833537
1 984971 GSA-rs111818381 G A . PASS . GT:GQ 0/0:0.804288
1 985460 rs2275811 T C . PASS . GT:GQ 0/0:0.789715
1 985905 rs143143061 C T . PASS . GT:GQ 0/0:0.865967
1 986165 rs145444272 G A . PASS . GT:GQ 0/0:0.490224
1 986918 rs72900459 C T . PASS . GT:GQ 0/0:0.391159
1 986963 rs145116277 C T . PASS . GT:GQ 0/0:0.879131
1 987253 GSA-rs113261977 C T . PASS . GT:GQ 0/0:0.816305
1 988902 GSA-rs74223856 C A . PASS . GT:GQ 0/0:0.864488
1 990417 rs2465136 T C . PASS . GT:GQ 0/0:0.908515
1 998395 rs7526076 A G . PASS . GT:GQ 0/1:0.930196
1 1004331 GSA-rs113592356 C T . PASS . GT:GQ 0/0:0.761946
1 1018704 rs9442372 A G . PASS . GT:GQ 0/1:0.828984
1 1022223 GSA-rs115723010 G A . PASS . GT:GQ 0/0:0.555626
1 1022423 GSA-rs114326054 G A . PASS . GT:GQ 0/0:0.753926
1 1023114 GSA-rs61766340 G A . PASS . GT:GQ 0/0:0.7728
1 1023788 rs12132100 C T . PASS . GT:GQ 0/0:0.823209
1 1026428 GSA-rs116334314 G A . PASS . GT:GQ 0/0:0.890004
1 1026913 GSA-rs115662838 C T . PASS . GT:GQ 0/0:0.734707
1 1027888 GSA-rs77334480 C T . PASS . GT:GQ 0/0:0.862744
1 1030374 rs12731175 G A . PASS . GT:GQ 0/0:0.945677
1 1031540 rs9651273 A G . PASS . GT:GQ 0/0:0.778961
1 1040026 rs6671356 T C . PASS . GT:GQ 0/0:0.878835
1 1045331 GSA-rs147606383 G A . PASS . GT:GQ 0/0:0.697128
1 1045606 rs12080505 A C . PASS . GT:GQ 0/0:0.332331
1 1054091 GSA-rs61766344 C T . PASS . GT:GQ 0/0:0.734926
1 1062638 rs9442373 C A . PASS . GT:GQ 0/0:0.817336
1 1065296 rs4072537 T C . PASS . GT:GQ 0/1:0.798382
1 1065726 GSA-rs11260598 T C . PASS . GT:GQ 0/0:0.907904
1 1068883 rs61766346 G A . PASS . GT:GQ 0/0:0.922105
1 1070467 rs139475585 G A . PASS . GT:GQ 0/0:0.918811
1 1072181 rs141230226 C T . PASS . GT:GQ 0/0:0.876326
1 1079198 rs11260603 T C . PASS . GT:GQ 0/0:0.794936
1 1079261 GSA-rs116661896 G A . PASS . GT:GQ 0/1:0.755679

Example of the truth set VCF, test.truthset.vcf (minus leading header lines), before conversion to vdsNA24143TestTruthset:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA24143
1 58814 GSA-rs114420996 G A . PASS . GT:GQ ./.:0.0
1 565508 GSA-rs9283150 G A . PASS . GT:GQ ./.:0.0
1 567092 GSA-rs9326622 T C . PASS . GT:GQ ./.:0.0
1 726912 GSA-1:726912 A G . PASS . GT:GQ 0/0:0.271291
1 727841 GSA-rs116587930 G A . PASS . GT:GQ 0/0:0.614155
1 752721 rs3131972 A G . PASS . GT:GQ ./.:0.0
1 756268 rs12567639 G A . PASS . GT:GQ 1/1:0.314226
1 759036 GSA-rs114525117 G A . PASS . GT:GQ 0/0:0.639129
1 794332 rs12127425 G A . PASS . GT:GQ 0/0:0.337806
1 801536 GSA-rs79373928 T G . PASS . GT:GQ 0/0:0.83396
1 807512 rs10751454 A G . PASS . GT:GQ 1/1:0.519144
1 815421 GSA-rs72888853 T C . PASS . GT:GQ ./.:0.0
1 830181 rs28444699 A G . PASS . GT:GQ 0/0:0.361144
1 830731 GSA-1:830731 T C . PASS . GT:GQ ./.:0.0
1 834830 GSA-rs116452738 G A . PASS . GT:GQ 0/0:0.865604
1 835092 GSA-rs72631887 T G . PASS . GT:GQ 0/0:0.787375
1 838555 rs4970383 C A . PASS . GT:GQ 0/0:0.7561
1 838665 rs28678693 T C . PASS . GT:GQ 0/0:0.807575
1 840753 rs4970382 T C . PASS . GT:GQ 0/1:0.760915
1 846808 GSA-rs4475691 C T . PASS . GT:GQ 0/0:0.940848
1 851390 GSA-rs72631889 G T . PASS . GT:GQ 0/1:0.890878
1 854250 rs7537756 A G . PASS . GT:GQ 0/0:0.8313
1 861808 rs13302982 A G . PASS . GT:GQ 1/1:0.904124
1 863130 GSA-rs376747791 A G . PASS . GT:GQ 0/0:0.622487
1 866893 rs2880024 T C . PASS . GT:GQ 0/1:0.814281
1 868404 rs13302914 C T . PASS . GT:GQ 1/1:0.839897
1 872952 rs76723341 C T . PASS . GT:GQ 0/0:0.803325
1 878331 GSA-rs148327885 C T . PASS . GT:GQ 0/0:0.422401
1 879911 GSA-rs143853699 G A . PASS . GT:GQ ./.:0.0
1 881627 rs2272757 G A . PASS . GT:GQ 0/1:0.841812
1 884767 GSA-rs67274836 G A . PASS . GT:GQ 0/1:0.763616
1 888659 rs3748597 T C . PASS . GT:GQ 1/1:0.88395
1 889238 GSA-rs3828049 G A . PASS . GT:GQ 0/0:0.73819
1 891277 GSA-rs77608078 C T . PASS . GT:GQ 0/0:0.45426
1 894573 exm2264981 G A . PASS . GT:GQ 0/1:0.884253
1 897564 rs13303229 T C . PASS . GT:GQ 1/1:0.785132
1 900730 rs3935066 G A . PASS . GT:GQ 0/1:0.819186
1 903321 rs6669800 G A . PASS . GT:GQ 1/1:0.724197
1 904752 rs35241590 T C . PASS . GT:GQ ./.:0.0
1 910473 rs28561399 G A . PASS . GT:GQ 0/0:0.692576
1 911101 GSA-rs3748588 C T . PASS . GT:GQ 0/0:0.775759
1 914749 GSA-rs186101910 C T . PASS . GT:GQ 0/0:0.829802
1 917640 rs41285816 G A . PASS . GT:GQ 0/0:0.797152
1 918573 rs2341354 A G . PASS . GT:GQ 0/1:0.797628
1 919419 rs6605059 T C . PASS . GT:GQ 0/1:0.833271
1 919501 rs4970414 G T . PASS . GT:GQ 0/1:0.863209
1 919855 rs116781904 G A . PASS . GT:GQ 0/0:0.920867
1 919927 GSA-rs61770779 G A . PASS . GT:GQ 0/0:0.843464
1 949472 rs202075563 G A . PASS . GT:GQ 0/0:0.862104
1 949491 rs148041041 G A . PASS . GT:GQ 0/0:0.509424
1 957898 rs2799064 G T . PASS . GT:GQ 0/0:0.856532
1 959509 rs28591569 T G . PASS . GT:GQ 0/0:0.693086
1 974894 rs3121578 C T . PASS . GT:GQ 0/1:0.935316
1 978642 rs199563268 G A . PASS . GT:GQ 0/0:0.474509
1 978762 rs138288952 G A . PASS . GT:GQ 0/0:0.774468
1 978804 rs144164397 C T . PASS . GT:GQ 0/0:0.377943
1 978974 rs79016973 G A . PASS . GT:GQ 0/0:0.84322
1 979397 rs143324306 G A . PASS . GT:GQ 0/0:0.825384
1 979748 rs113288277 A T . PASS . GT:GQ 0/0:0.705457
1 980824 seq-rs112039851 G C . PASS . GT:GQ ./.:0.0
1 980868 rs146243145 G A . PASS . GT:GQ 0/0:0.868118
1 981139 rs200684031 G A . PASS . GT:GQ 0/0:0.500205
1 981244 rs202061838 G A . PASS . GT:GQ 0/0:0.508203
1 982968 rs149268246 C T . PASS . GT:GQ 0/0:0.502549
1 983005 rs149762107 G A . PASS . GT:GQ 0/0:0.881114
1 983040 rs148948883 G A . PASS . GT:GQ 0/0:0.489731
1 983243 rs142620337 C T . PASS . GT:GQ 0/0:0.833537
1 984971 GSA-rs111818381 G A . PASS . GT:GQ 0/0:0.804288
1 985460 rs2275811 T C . PASS . GT:GQ 0/0:0.789715
1 985905 rs143143061 C T . PASS . GT:GQ 0/0:0.865967
1 986165 rs145444272 G A . PASS . GT:GQ 0/0:0.490224
1 986918 rs72900459 C T . PASS . GT:GQ 0/0:0.391159
1 986963 rs145116277 C T . PASS . GT:GQ 0/0:0.879131
1 987253 GSA-rs113261977 C T . PASS . GT:GQ 0/0:0.816305
1 988902 GSA-rs74223856 C A . PASS . GT:GQ 0/0:0.864488
1 990417 rs2465136 T C . PASS . GT:GQ 0/0:0.908515
1 998395 rs7526076 A G . PASS . GT:GQ 0/1:0.930196
1 1004331 GSA-rs113592356 C T . PASS . GT:GQ 0/0:0.761946
1 1018704 rs9442372 A G . PASS . GT:GQ 0/1:0.828984
1 1022223 GSA-rs115723010 G A . PASS . GT:GQ 0/0:0.555626
1 1022423 GSA-rs114326054 G A . PASS . GT:GQ 0/0:0.753926
1 1023114 GSA-rs61766340 G A . PASS . GT:GQ 0/0:0.7728
1 1023788 rs12132100 C T . PASS . GT:GQ 0/0:0.823209
1 1026428 GSA-rs116334314 G A . PASS . GT:GQ 0/0:0.890004
1 1026913 GSA-rs115662838 C T . PASS . GT:GQ 0/0:0.734707
1 1027888 GSA-rs77334480 C T . PASS . GT:GQ 0/0:0.862744
1 1030374 rs12731175 G A . PASS . GT:GQ 0/0:0.945677
1 1031540 rs9651273 A G . PASS . GT:GQ 0/1:0.778961
1 1040026 rs6671356 T C . PASS . GT:GQ 0/0:0.878835
1 1045331 GSA-rs147606383 G A . PASS . GT:GQ 0/0:0.697128
1 1045606 rs12080505 A C . PASS . GT:GQ 0/0:0.332331
1 1054091 GSA-rs61766344 C T . PASS . GT:GQ 0/0:0.734926
1 1062638 rs9442373 C A . PASS . GT:GQ 0/0:0.817336
1 1065296 rs4072537 T C . PASS . GT:GQ 1/1:0.798382
1 1065726 GSA-rs11260598 T C . PASS . GT:GQ 0/0:0.907904
1 1068883 rs61766346 G A . PASS . GT:GQ 0/0:0.922105
1 1070467 rs139475585 G A . PASS . GT:GQ 0/0:0.918811
1 1072181 rs141230226 C T . PASS . GT:GQ 0/0:0.876326
1 1079198 rs11260603 T C . PASS . GT:GQ 0/0:0.794936
1 1079261 GSA-rs116661896 G A . PASS . GT:GQ 0/0:0.755679
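
To see what the summary would look like if the GT values were actually read, the two excerpts can be tallied outside Hail. The sketch below is a hypothetical plain-Python cross-check, not Hail's implementation: it assumes the Hail 0.1 category order (0 = no data, 1 = no call, 2 = hom ref, 3 = het, 4 = hom var), keys variants on (chrom, pos, ref, alt), ignores GQ entirely, and handles only diploid, biallelic GT values.

```python
# Hypothetical cross-check (not Hail's implementation): rebuild a 5x5
# concordance matrix directly from two single-sample VCF excerpts.
# Category order is assumed to match the Hail 0.1 concordance docs.
NO_DATA, NO_CALL, HOM_REF, HET, HOM_VAR = range(5)


def gt_category(gt):
    """Map a diploid, biallelic GT string such as '0/1' or '1|1' to a category."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return NO_CALL
    n_alt = sum(1 for allele in alleles if allele != "0")
    return (HOM_REF, HET, HOM_VAR)[n_alt]


def parse_calls(lines):
    """Return {(chrom, pos, ref, alt): category} for the first sample column."""
    calls = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        # Whitespace split accepts both real tab-separated VCF lines and the
        # space-separated excerpts pasted above.
        fields = line.split()
        chrom, pos, _id, ref, alt = fields[:5]
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        calls[(chrom, pos, ref, alt)] = gt_category(sample.get("GT", "./."))
    return calls


def concordance_matrix(left_lines, right_lines):
    """matrix[i][j] counts variants with category i on the left and j on the right."""
    left, right = parse_calls(left_lines), parse_calls(right_lines)
    matrix = [[0] * 5 for _ in range(5)]
    for key in set(left) | set(right):  # outer join on the variant key
        matrix[left.get(key, NO_DATA)][right.get(key, NO_DATA)] += 1
    return matrix


left = ["1 1031540 rs9651273 A G . PASS . GT:GQ 0/0:0.778961"]
right = ["1 1031540 rs9651273 A G . PASS . GT:GQ 0/1:0.778961"]
print(concordance_matrix(left, right)[HOM_REF][HET])  # 1
```

Run on the two complete excerpts above (Test97 as left, truth set as right), this should put 97 of the 100 sites on the diagonal (9 of them no-call/no-call), with the three discordant sites at 1:1031540, 1:1065296 and 1:1079261, presumably the discordance the splitTest97 file name refers to.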

The summary printout indicates that every genotype in both datasets was read as a no-call: all 100 observations sit in summary[1][1], the (no call, no call) cell.
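
To make that reading explicit, here is a small plain-Python sketch that labels the printed summary. The category order (0 = no data, 1 = no call, 2 = hom ref, 3 = het, 4 = hom var, with the left dataset on the first index) is an assumption taken from the Hail 0.1 concordance docs, and describe_summary is a hypothetical helper, not part of Hail.

```python
# Category labels assumed to follow the Hail 0.1 concordance docs; check the
# docs linked at the top of this thread before relying on this ordering.
CATEGORIES = ["NoData", "NoCall", "HomRef", "Het", "HomVar"]


def describe_summary(summary):
    """Report non-zero cells and a few totals for a 5x5 concordance summary."""
    cells = {
        (CATEGORIES[i], CATEGORIES[j]): n
        for i, row in enumerate(summary)
        for j, n in enumerate(row)
        if n
    }
    # Inner join: the variant is present in both datasets (index 0 excluded).
    inner = sum(summary[i][j] for i in range(1, 5) for j in range(1, 5))
    inner_concordant = sum(summary[i][i] for i in range(1, 5))
    # Called genotypes only: both sides are HomRef, Het or HomVar.
    called = sum(summary[i][j] for i in range(2, 5) for j in range(2, 5))
    called_concordant = sum(summary[i][i] for i in range(2, 5))
    return cells, inner, inner_concordant, called, called_concordant


# The summary printed above, with Python 2 long literals (0L) written as ints.
summary = [[0, 0, 0, 0, 0],
           [0, 100, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
cells, inner, inner_ok, called, called_ok = describe_summary(summary)
print(cells)  # {('NoCall', 'NoCall'): 100}
print(inner, inner_ok, called, called_ok)  # 100 100 0 0
```

Counting the (NoCall, NoCall) cell as concordant reproduces the 100 / 100 in the INFO log, while both called-genotype totals are zero, which suggests the 100% says nothing about actual genotype agreement.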

Try this:
vdsNA24143Test97.genotypes_table().select(['v', 's', 'g']).show()

I believe this is happening because your GQ field is invalid according to the VCF 4.2 spec: https://samtools.github.io/hts-specs/VCFv4.2.pdf

GQ should be an integer, but your values are floating-point. Hail 0.1 is pretty much hard-coded for GATK-like VCFs, which has caused problems for other sources of data. We automatically filter certain invalid data arrangements, and this must be one of them.
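
A quick way to confirm this before import is to scan the VCF for GQ values that do not parse as integers. A minimal sketch; non_integer_gq is a hypothetical helper, and it checks only the GQ type, not the rest of the VCF 4.2 spec.

```python
def non_integer_gq(lines):
    """Yield (chrom, pos, sample_index, value) for each GQ that is not an integer."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        # Whitespace split handles tab-separated VCFs and the space-separated
        # excerpts above; use split("\t") if any field may contain spaces.
        fields = line.split()
        keys = fields[8].split(":")
        if "GQ" not in keys:
            continue
        gq = keys.index("GQ")
        for sample_index, sample in enumerate(fields[9:]):
            values = sample.split(":")
            if gq >= len(values) or values[gq] == ".":
                continue  # trailing subfields may be dropped; "." means missing
            try:
                int(values[gq])
            except ValueError:
                yield fields[0], fields[1], sample_index, values[gq]


excerpt = [
    "1 58814 GSA-rs114420996 G A . PASS . GT:GQ ./.:0.0",
    "1 726912 GSA-1:726912 A G . PASS . GT:GQ 0/0:0.271291",
]
for hit in non_integer_gq(excerpt):
    print(hit)
# ('1', '58814', 0, '0.0')
# ('1', '726912', 0, '0.271291')
```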

We’ll be releasing a beta version of Hail 0.2 in ~4-6 weeks. In Hail 0.2, the genotype schema is totally flexible and this should load fine.
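
If waiting for 0.2 is not an option, one possible (untested) workaround for Hail 0.1 is to remove the GQ subfield before conversion so that only GT is left. The sketch below does that in plain Python; drop_format_key and strip_key_from_vcf are hypothetical helpers, and it is an assumption that Hail 0.1 then reads the GT values normally.

```python
def drop_format_key(line, key="GQ"):
    """Remove one FORMAT key and the matching subfield of every sample column.

    Header lines, lines without sample columns, lines without the key, and
    lines where the key is the only FORMAT entry are returned unchanged.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")  # real VCF data lines are tab-separated
    if len(fields) < 10:
        return line  # no FORMAT/sample columns to edit
    keys = fields[8].split(":")
    if key not in keys or len(keys) == 1:
        return line
    idx = keys.index(key)
    for col in range(8, len(fields)):  # FORMAT column plus every sample column
        parts = fields[col].split(":")
        fields[col] = ":".join(p for i, p in enumerate(parts) if i != idx)
    return "\t".join(fields) + newline


def strip_key_from_vcf(src_path, dst_path, key="GQ"):
    """Copy an uncompressed VCF, dropping `key` from FORMAT and sample columns."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(drop_format_key(line, key))


row = "1\t726912\tGSA-1:726912\tA\tG\t.\tPASS\t.\tGT:GQ\t0/0:0.271291\n"
print(repr(drop_format_key(row)))
# '1\t726912\tGSA-1:726912\tA\tG\t.\tPASS\t.\tGT\t0/0\n'
```

This discards the GQ values entirely and only handles uncompressed VCFs. The ##FORMAT=<ID=GQ,...> header line can stay, since a declared but unused FORMAT key is still valid.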

You’re correct - they are all no calls:

vdsNA24143Test97.genotypes_table().select(['v', 's', 'g']).show()
+--------------+---------+----------------+
| v            | s       | g              |
+--------------+---------+----------------+
| Variant      | String  | Genotype       |
+--------------+---------+----------------+
| 1:58814:G:A  | NA24143 | ./.:.:.:.:PL=. |
| 1:565508:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:567092:T:C | NA24143 | ./.:.:.:.:PL=. |
| 1:726912:A:G | NA24143 | ./.:.:.:.:PL=. |
| 1:727841:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:752721:A:G | NA24143 | ./.:.:.:.:PL=. |
| 1:756268:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:759036:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:794332:G:A | NA24143 | ./.:.:.:.:PL=. |
| 1:801536:T:G | NA24143 | ./.:.:.:.:PL=. |
+--------------+---------+----------------+
showing top 10 rows

Thank you for the insight - looking forward to Hail 0.2 then :slightly_smiling_face:

We’ll be posting on this forum when the new version is ready for community use. It’s a lot better than 0.1, so I’m excited too!


Hi tpoterba & Hail Team - Is there an updated timeframe for the Hail 0.2 release?

I think we’re nervous about heavily advertising it right now because we have a lot to do before the official stable 0.2 release (months away), but the 0.2 beta is definitely ready for use! Most Broad users have moved over to it now.

The docs are here: https://www.hail.is/docs/devel/

The 0.2 tutorial is a good place to start, as is the overview page (thanks, Jackie!). Let us know here or on Gitter if you have questions!