PVE calculation

esebesty · February 4, 2022, 9:10pm

Hi all, is there a quick way to calculate PVE (percent variance explained) and include it in the result table, besides locus, alleles, beta, p-value, etc?

tpoterba · February 4, 2022, 9:14pm

Should be able to write a result.annotate statement to compute it from the existing statistics:

[2*(beta^2)*MAF*(1-MAF)]/[2*(beta^2)*MAF*(1-MAF)+((se(beta))^2)*2*N*MAF*(1-MAF)]

source
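For anyone who wants to sanity-check the arithmetic outside Hail first, here is a minimal plain-Python sketch of the formula above. The function names `pve` and `pve_simplified` are made up for illustration. Note that the factor 2*MAF*(1-MAF) appears in every term, so for 0 < MAF < 1 it cancels and the expression reduces to beta^2 / (beta^2 + N*se^2); the second function shows that form.

```python
def pve(beta, se, maf, n):
    """Proportion of variance explained (0 to 1), term by term as in the quoted formula."""
    genetic = 2 * beta**2 * maf * (1 - maf)       # 2 * beta^2 * MAF * (1 - MAF)
    residual = se**2 * 2 * n * maf * (1 - maf)    # se(beta)^2 * 2 * N * MAF * (1 - MAF)
    return genetic / (genetic + residual)         # undefined when MAF is 0 or 1


def pve_simplified(beta, se, n):
    """Same quantity after cancelling the common factor 2 * MAF * (1 - MAF)."""
    return beta**2 / (beta**2 + n * se**2)


# Hypothetical example values, not taken from any real analysis.
print(pve(beta=0.1, se=0.02, maf=0.3, n=1000))        # approximately 0.0244
print(pve_simplified(beta=0.1, se=0.02, n=1000))      # approximately 0.0244
```

The same arithmetic should carry over to a result.annotate expression, but which fields hold the effect size, its standard error, and the sample size depends on the regression method used, so check the result table's schema rather than assuming the names.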

esebesty · March 2, 2022, 9:22pm

I finally did something like:

result_lm1 = hl.linear_mixed_regression_rows(data_model1.GT.n_alt_alleles(), lm1)
maf_table = data_model1.annotate_rows(MAF = hl.min(data_model1.variant_qc.AF)).rows()
result_lm1_ann = result_lm1.annotate(MAF = maf_table[result_lm1.locus, result_lm1.alleles].MAF)

etc., but it looks like annotating the results with the MAF takes ages. Am I missing something here?

danking · March 2, 2022, 10:34pm

Hail is lazy, which means that we build a list of the operations and do not execute it until you “observe” the value. You can observe values with count, write, hl.export_vcf, aggregate, etc.

Because of that, it’s hard to comment on the performance without seeing the full script. Can you share it?

This also means that if you split a matrix table like:

mt1 = mt.annotate(...)
mt2 = mt.annotate(...)

And then try to join them:

mt1 = mt1.annotate_rows(... mt2.rows()[mt1.row_key])

You’ll do all the work to produce mt twice. It looks like you might have this pattern. hl.linear_mixed_regression_rows has the pass_through argument to help address this issue.
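A rough sketch of what the pass_through approach could look like for the snippet above, assuming a Hail version in which hl.linear_mixed_regression_rows is available and that hl.variant_qc has already populated variant_qc.AF. The wrapper name `regress_with_maf` and its parameters are hypothetical; only the Hail calls already mentioned in this thread are used.

```python
def regress_with_maf(mt, model):
    """Run the LMM once and carry MAF into the result table via pass_through.

    `mt` is a MatrixTable with GT and variant_qc.AF; `model` is a fitted
    linear mixed model (both are assumptions taken from the snippet above).
    """
    import hail as hl  # deferred so this sketch can be loaded without Hail installed

    # Compute MAF as a row field before the regression instead of on a
    # separate copy of the dataset.
    mt = mt.annotate_rows(MAF=hl.min(mt.variant_qc.AF))

    # pass_through copies the named row fields into the result table, so there
    # is no join back against mt.rows() and the pipeline is not evaluated twice.
    return hl.linear_mixed_regression_rows(
        mt.GT.n_alt_alleles(), model, pass_through=['MAF']
    )
```

With something like this, the result table should already carry a MAF field next to the regression statistics, and any derived quantity such as PVE can be added with a single annotate on that table.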

Finally, Hail’s hl.linear_mixed_regression_rows is not as fast as other LMM methods like BOLT-LMM or SAIGE. Have you tried using those instead?

esebesty · March 3, 2022, 4:50pm

Thanks, I think the pass_through argument is all I need; I did not notice that! I can calculate MAF at the start, use the argument to include it in the result table, and then use annotate_rows. Also, I was just impatient: it did not take ages after all.
