PVE calculation

Hi all, is there a quick way to calculate PVE (percent variance explained) and include it in the result table, besides locus, alleles, beta, p-value, etc?

You should be able to write a result.annotate statement to compute it from the existing statistics:

PVE = [2*(beta^2)*MAF*(1-MAF)] / [2*(beta^2)*MAF*(1-MAF) + ((se(beta))^2)*2*N*MAF*(1-MAF)]

source
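For reference, here is a minimal plain-Python sketch of the formula above, useful for sanity-checking values before wiring it into an annotate call. The function name pve and the example numbers are made up for illustration; they are not from Hail or from this thread.

```python
def pve(beta, se, maf, n):
    """Variance explained by one variant, following the formula above.

    beta: effect size estimate
    se:   standard error of beta
    maf:  minor allele frequency (strictly between 0 and 1)
    n:    sample size
    """
    # Numerator of the formula: 2 * beta^2 * MAF * (1 - MAF)
    genetic = 2 * beta**2 * maf * (1 - maf)
    # Second term of the denominator: se(beta)^2 * 2 * N * MAF * (1 - MAF)
    residual = se**2 * 2 * n * maf * (1 - maf)
    return genetic / (genetic + residual)


# Illustrative (made-up) summary statistics; the result is roughly 0.0244.
print(pve(beta=0.1, se=0.02, maf=0.3, n=1000))
```

Because the function only uses arithmetic operators, the same expression should in principle carry over to a result.annotate(PVE = ...) call on whichever fields hold beta and its standard error in your result table, though that is untested here. Note also that, as written, the factor 2*MAF*(1-MAF) appears in every term of the formula, so it cancels and the value equals beta^2 / (beta^2 + N*se(beta)^2).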

In the end I did something like:

result_lm1 = hl.linear_mixed_regression_rows(data_model1.GT.n_alt_alleles(), lm1)
maf_table = data_model1.annotate_rows(MAF = hl.min(data_model1.variant_qc.AF)).rows()
result_lm1_ann = result_lm1.annotate(MAF = maf_table[result_lm1.locus, result_lm1.alleles].MAF)

etc, but looks like it takes ages to annotate results with the MAF. Am I missing something here?

Hail is lazy which means that we build a list of the operations and do not execute it until you “observe” the value. You can observe values with count, write, hl.export_vcf, aggregate, etc.

Because of that, it’s hard to comment on the performance without seeing the full script. Can you share it?

This also means that if you split a matrix table like:

mt1 = mt.annotate_rows(...)
mt2 = mt.annotate_rows(...)

And then try to join them:

mt1 = mt1.annotate_rows(... mt2.rows()[mt1.row_key])

You’ll do all the work to produce mt twice. It looks like you might have this pattern. hl.linear_mixed_regression_rows has the pass_through argument to help address this issue.

Finally, Hail’s hl.linear_mixed_regression_rows is not as fast as other LMM methods like BOLT-LMM or SAIGE. Have you tried using those instead?

Thanks, I think the pass_through argument is all I need, did not notice that! I can calculate MAF at the start, use the argument to include it in the result table, and then use annotate_rows. Also, I’m impatient, it did not take ages, after all.
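The plan described above could look roughly like the sketch below. The wrapper name lmm_rows_with_maf is hypothetical, and it assumes the matrix table already has a variant_qc row field (as in the earlier snippet) and that model is an already-fitted linear mixed model like lm1.

```python
def lmm_rows_with_maf(mt, model):
    """Run the LMM row regression and carry MAF through to the result table.

    mt:    a Hail MatrixTable with GT entries and a variant_qc row field
           (assumption, mirroring data_model1 in the thread)
    model: a fitted linear mixed model (lm1 in the thread)
    """
    # Imported inside the function so the sketch can be loaded without Hail.
    import hail as hl

    # Compute MAF once, on the same matrix table that feeds the regression.
    mt = mt.annotate_rows(MAF=hl.min(mt.variant_qc.AF))

    # pass_through copies the named row field into the result table,
    # so no join back to mt.rows() is needed afterwards.
    return hl.linear_mixed_regression_rows(
        mt.GT.n_alt_alleles(), model, pass_through=['MAF'])
```

Usage would then be something like result_lm1 = lmm_rows_with_maf(data_model1, lm1), after which result_lm1.MAF is available for any further annotate step without recomputing the matrix table.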
