Compute imputation quality R2


I am using DS (imputed dosage) from VCF files for a GWAS (hl.logistic_regression_rows). I can generate the GWAS output and write it with .export(file_name). These are the columns in the header of my output file:
locus alleles AF rsid beta standard_error z_stat p_value fit

I’d like to add a few columns to the output file. The first is the imputation quality, R2, defined as [observed dosage variance] / [expected dosage variance]. The others are the sample sizes of cases and controls. Could you give some example code for computing per-variant quantities like these? Then I should be able to compute R2 and the sample sizes myself.
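To make the R2 definition concrete when only DS is available: one common choice (the MaCH-style Rsq) estimates the alternate-allele frequency as p = mean(DS) / 2 and takes the expected dosage variance under Hardy-Weinberg equilibrium to be 2p(1 - p). The plain-Python sketch below assumes that convention and uses the population variance of the observed dosages; the function name `dosage_r2` is hypothetical and only illustrates the arithmetic for a single variant.

```python
def dosage_r2(dosages):
    """Hypothetical helper: observed DS variance / HWE-expected variance 2p(1-p).

    Assumes MaCH-style Rsq: p = mean(DS) / 2, population variance of DS.
    Returns None when R2 is undefined (no data or monomorphic variant).
    """
    ds = [d for d in dosages if d is not None]  # drop missing dosages
    if not ds:
        return None
    mean = sum(ds) / len(ds)
    p = mean / 2                                 # estimated alt-allele frequency
    expected = 2 * p * (1 - p)                   # dosage variance expected under HWE
    if expected == 0:
        return None                              # monomorphic: R2 undefined
    observed = sum((d - mean) ** 2 for d in ds) / len(ds)  # population variance
    return observed / expected


print(dosage_r2([0, 1, 2, 1]))     # well-separated dosages
print(dosage_r2([1.0, 1.0, 1.0]))  # completely uninformative dosages
```

Well-imputed variants give values near 1, while dosages that all sit at the mean (no information beyond the allele frequency) give 0.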


Do you have the GP field, or just DS? Hail has a built-in aggregator, hl.agg.info_score, that computes the IMPUTE info score from GP.

You can also “pass through” row fields to the resulting table from hl.logistic_regression_rows using the pass_through argument:

import hail as hl

mt = mt.annotate_rows(r2 = ???,  # your R2 expression goes here
                      n_case = hl.agg.count_where(hl.is_defined(mt.DS) & mt.is_case),
                      n_control = hl.agg.count_where(hl.is_defined(mt.DS) & ~mt.is_case))
gwas_result = hl.logistic_regression_rows(test='wald',
                                          y=mt.is_case,
                                          x=...,
                                          covariates=[1.0],  # intercept
                                          pass_through=['r2', 'n_case', 'n_control'])