Simple rare variant burden testing with Fisher exact test

Hail has a general way of performing burden tests within a regression framework (logistic regression burden tests).

However, in rare variant analysis the counts are often so low that some regression methods may not even converge, and Fisher's exact test can be used instead. I wrote a simple Python function that does just that. Given a filtered VDS, a grouping variant annotation, and a list of binary phenotypes, it performs a case-control burden test for each phenotype and returns a keytable with all comparisons as well as carrier counts and variant IDs.
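Under the hood, Hail's `fet()` runs Fisher's exact test on the 2x2 table of carriers vs non-carriers in cases and controls. For intuition, here is a stdlib-only sketch of the two-sided test via hypergeometric enumeration (the function name is mine, not Hail's):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    For a burden test: a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def p_table(x):
        # Hypergeometric probability of a table with x in the top-left cell,
        # keeping all margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_table(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Two-sided p-value: sum over all tables at least as unlikely as the observed one.
    return sum(p_table(x) for x in range(lo, hi + 1)
               if p_table(x) <= p_obs * (1 + 1e-9))
```

For example, 8 carriers among 10 cases vs 1 carrier among 6 controls gives p ≈ 0.035, matching what `fet(8, 2, 1, 6)`-style inputs would yield.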

from functools import reduce  # reduce is not a builtin on Python 3

def do_burden(vds, pheno_list, group_by_anno, pheno_root="sa.pheno"):
    """
    Fisher's exact burden test, counting each individual at most once per element of group_by_anno.
    For e.g. an LoF burden test, first filter the VDS to LoF variants and call this function
    with the filtered VDS.

    :param vds: filtered Hail VariantDataset
    :param pheno_list: list of binary phenotype names under pheno_root
    :param group_by_anno: variant annotation to group by, e.g. "va.gene"
    :param pheno_root: sample annotation root holding the phenotypes
    :return: keytable with all comparison columns and carrier and variant IDs
    """
    res = []
    for pheno in pheno_list:
        p = pheno.split(".")[-1]
        # One row per (group, sample): a carrier flag plus the variant IDs carried.
        g_kt = vds.aggregate_by_key(
            'group=' + group_by_anno + ', sample=s, pheno=' + pheno_root + '.' + pheno,
            'carrier = g.map(g => if (g.isCalledNonRef) 1 else 0).collect().max, '
            'vars = if (g.filter(g => g.isCalledNonRef).count() > 0) '
            's.map(s => s).collect()[0] + "[" + '
            'g.filter(g => g.isCalledNonRef)'
            '.map(g => v.contig + ":" + v.start + ":" + v.ref + ":" + v.alt)'
            '.collect().mkString(",") + "]" '
            'else NA: String')
        # Aggregate carrier counts separately for cases and controls.
        case_carriers = g_kt.filter('pheno').aggregate_by_key(
            'group = group',
            p + '_case_carriers = carrier.collect().sum, '
            + p + '_case_n = carrier.count(), '
            + p + '_case_vars = vars.filter(v => !isMissing(v)).collect().mkString(";")')
        control_carriers = g_kt.filter('!pheno').aggregate_by_key(
            'group = group',
            p + '_control_carriers = carrier.collect().sum, '
            + p + '_control_n = carrier.count(), '
            + p + '_control_vars = vars.filter(v => !isMissing(v)).collect().mkString(";")')
        case_carriers = case_carriers.select(
            ['group', p + '_case_carriers', p + '_case_n', p + '_case_vars']).key_by(['group'])
        control_carriers = control_carriers.select(
            ['group', p + '_control_carriers', p + '_control_n', p + '_control_vars']).key_by(['group'])
        burdens = case_carriers.join(control_carriers, how='outer')
        # Fisher's exact test on the 2x2 table: carriers vs non-carriers in cases and controls.
        b = burdens.annotate(
            p + '_assoc = fet(' + p + '_case_carriers.toInt, '
            '(' + p + '_case_n - ' + p + '_case_carriers).toInt, '
            + p + '_control_carriers.toInt, '
            '(' + p + '_control_n - ' + p + '_control_carriers).toInt)')
        res.append(b)

    if len(pheno_list) > 1:
        # Outer-join the per-phenotype keytables on "group".
        res = reduce(lambda x, y: x.join(y, how='outer'), res)
    else:
        res = res[0]

    return res.flatten()