Fixed overflow bug in 0.2 chi_squared_test / contingency_table_test, present since June 25, 2018

jbloom · August 22, 2018, 7:51pm

We just discovered and fixed an overflow bug in chi_squared_test. The bug has been in the 0.2 development branch since commit 01875754aa921fab22e1c4e8f11949e21fdc884c, which was merged on June 25, 2018.

If you ran chi_squared_test with counts a, b, c, and d on a version built from that commit or later, then:

- If a * d or b * c exceeded 2,147,483,647 (the largest 32-bit signed integer), the odds ratio is wrong. For example, a minimal table with an incorrect odds ratio is (46341, 1, 1, 46341).
- If (a + b) * (c + d) * (b + d) * (a + c) exceeded 2,147,483,647, the p-value is wrong. For example, two minimal tables with an incorrect p-value are (108, 108, 108, 107) and (216, 0, 0, 215).
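If you want to check whether a particular table falls under either condition, the two tests above can be sketched roughly as follows. This is only an illustration: `OverflowCheck`, `oddsRatioAffected`, and `pValueAffected` are hypothetical names, not part of Hail, and the products are computed with `BigInt` so the check itself cannot overflow.

```scala
object OverflowCheck {
  // Largest value a 32-bit signed Int can hold: 2,147,483,647.
  private val IntMax = BigInt(Int.MaxValue)

  /** True if a * d or b * c exceeds Int.MaxValue (odds ratio condition). */
  def oddsRatioAffected(a: Int, b: Int, c: Int, d: Int): Boolean =
    BigInt(a) * BigInt(d) > IntMax || BigInt(b) * BigInt(c) > IntMax

  /** True if the product of the four marginal totals exceeds Int.MaxValue (p-value condition). */
  def pValueAffected(a: Int, b: Int, c: Int, d: Int): Boolean = {
    val rows = (BigInt(a) + BigInt(b)) * (BigInt(c) + BigInt(d))
    val cols = (BigInt(b) + BigInt(d)) * (BigInt(a) + BigInt(c))
    rows * cols > IntMax
  }
}
```

For instance, `OverflowCheck.oddsRatioAffected(46341, 1, 1, 46341)` and `OverflowCheck.pValueAffected(108, 108, 108, 107)` both return `true`, matching the minimal examples listed above.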

Tables with total count below 431 were unaffected.
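The 431 threshold can be checked with a short calculation. For a fixed total n, the row totals sum to n, so their product is largest when they are as equal as possible, and the same holds for the column totals. The sketch below (with the hypothetical names `ThresholdCheck` and `maxDenominator`) computes that largest possible denominator. The products a * d and b * c are bounded by the much smaller quantity (n / 2)^2, so the denominator is the term that overflows first.

```scala
object ThresholdCheck {
  /** Largest possible value of (a + b) * (c + d) * (b + d) * (a + c) for total count n. */
  def maxDenominator(n: Int): BigInt = {
    val half = BigInt(n / 2)
    // Largest product of two non-negative totals that sum to n.
    val best = half * (BigInt(n) - half)
    // Row totals and column totals can reach this maximum at the same time.
    best * best
  }
}
```

Here `ThresholdCheck.maxDenominator(430)` is 2,136,750,625, which still fits in a 32-bit signed integer, while `ThresholdCheck.maxDenominator(431)` is 2,156,673,600, which does not.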

The bug also applied to contingency_table_test when every cell count was at least min_cell_count, since in that case the function calls chi_squared_test.

The fisher_exact_test is unrelated and unaffected.

Here is the problematic Scala implementation:

    val ad = a * d
    val bc = (b * c).toDouble
    val oddsRatio = ad / bc
    val det = ad - bc
    val chiSquare = (det * det * (a + b + c + d)) / ((a + b) * (c + d) * (b + d) * (a + c))
    val pValue = chiSquaredTail(chiSquare, 1)
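In the snippet above, `a * d`, `b * c`, and the four-factor denominator are all evaluated as `Int` products, so they wrap around before any conversion to `Double` happens. One way to avoid this, shown here as a minimal sketch and not necessarily the change that was actually committed, is to widen each count to `Double` before multiplying. `ChiSquaredSketch` and `stats` are hypothetical names. The p-value step is left out because `chiSquaredTail` is not reproduced here; it would take the returned chi-square statistic as in the original code.

```scala
object ChiSquaredSketch {
  /** Returns (oddsRatio, chiSquare), using Double arithmetic throughout. */
  def stats(a: Int, b: Int, c: Int, d: Int): (Double, Double) = {
    // Widen to Double before multiplying so that no Int product can wrap around.
    val ad = a.toDouble * d
    val bc = b.toDouble * c
    val oddsRatio = ad / bc
    val det = ad - bc
    val n = a.toDouble + b + c + d
    val denom = (a.toDouble + b) * (c.toDouble + d) * (b.toDouble + d) * (a.toDouble + c)
    val chiSquare = det * det * n / denom
    (oddsRatio, chiSquare)
  }
}
```

With this version, `ChiSquaredSketch.stats(46341, 1, 1, 46341)._1` is 2147488281.0, and the chi-square statistic for (108, 108, 108, 107) is positive. Very large products may still be rounded in `Double`, but they no longer wrap to negative values.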
