Hi all, is there a quick way to calculate PVE (percent variance explained) and include it in the result table alongside locus, alleles, beta, p-value, etc.?
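You should be able to write a `result.annotate` statement to compute it from the existing statistics:

$$\mathrm{PVE} = \frac{2\beta^2\,\mathrm{MAF}(1-\mathrm{MAF})}{2\beta^2\,\mathrm{MAF}(1-\mathrm{MAF}) + \mathrm{se}(\beta)^2 \cdot 2N\,\mathrm{MAF}(1-\mathrm{MAF})}$$

where β is the effect estimate, se(β) its standard error, and N the sample size.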
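A minimal sketch of that formula as a plain Python function. The name `pve` and its argument names are illustrative, not part of Hail. Because it only uses arithmetic operators, the same function should in principle also accept Hail numeric expressions, but that is an assumption and is not exercised here.

```python
def pve(beta, se, maf, n):
    """Proportion of variance explained by one variant (hypothetical helper).

    Literal transcription of
        2*beta^2*MAF*(1-MAF) / (2*beta^2*MAF*(1-MAF) + se^2 * 2*N*MAF*(1-MAF))
    using only +, -, *, / and **.
    """
    het = 2 * maf * (1 - maf)            # genotype variance term 2*MAF*(1-MAF)
    explained = beta ** 2 * het          # numerator
    return explained / (explained + se ** 2 * n * het)


print(round(pve(0.5, 0.1, 0.2, 1000), 4))  # 0.0244
```

On the regression output this might look like `result.annotate(PVE=pve(result.beta, result.standard_error, result.MAF, n))`, assuming the table has `beta` and `standard_error` fields, a `MAF` field carried over from the matrix table, and `n` is the number of samples in the fit. Note also that, as written, the factor 2·MAF·(1−MAF) multiplies every term, so it cancels algebraically and the expression reduces to β²/(β² + N·se(β)²); the sketch keeps MAF only to mirror the formula.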
In the end I did something like:

```python
import hail as hl

result_lm1 = hl.linear_mixed_regression_rows(data_model1.GT.n_alt_alleles(), lm1)
maf_table = data_model1.annotate_rows(MAF=hl.min(data_model1.variant_qc.AF)).rows()
# Look up MAF by the result table's own row key (locus, alleles)
result_lm1_ann = result_lm1.annotate(MAF=maf_table[result_lm1.locus, result_lm1.alleles].MAF)
```

etc., but annotating the results with the MAF seems to take ages. Am I missing something here?
Hail is lazy, which means that we build up a list of operations and do not execute them until you “observe” a value. You can observe values with `count`, `write`, `hl.export_vcf`, `aggregate`, etc.
Because of that, it’s hard to comment on the performance without seeing the full script. Can you share it?
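A toy illustration of lazy evaluation in plain Python (this is not Hail's API): building a plan only records functions, and the expensive read happens when the plan is observed. `read_source` and `lazy_map` are hypothetical names.

```python
calls = []  # one entry per time the (expensive) source is actually read


def read_source():
    calls.append("read")
    return [1, 2, 3]


def lazy_map(recipe, fn):
    # Return a new recipe that applies fn to recipe's output; nothing runs yet.
    return lambda: [fn(x) for x in recipe()]


plan = lazy_map(lazy_map(read_source, lambda x: x * 10), lambda x: x + 1)
print(len(calls))  # 0: building the plan did no work
print(plan())      # [11, 21, 31]: observing the plan runs every step
print(len(calls))  # 1
```

Observing the same plan again would read the source a second time, because nothing is cached between observations in this toy model.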
This also means that if you split a matrix table like:

```python
mt1 = mt.annotate_rows(...)
mt2 = mt.annotate_rows(...)
```

and then try to join them:

```python
mt1 = mt1.annotate_rows(... mt2.rows()[mt1.row_key])
```
You’ll do all the work to produce `mt` twice. It looks like you might have this pattern. `hl.linear_mixed_regression_rows` has the `pass_through` argument, which carries extra row fields from the input matrix table into the result table, to help address this issue.
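To make that cost concrete, here is a toy model in plain Python, not Hail code: each “dataset” is a zero-argument recipe that replays its whole lineage when called, roughly like a lazy plan. `expensive_source`, `joined` and `one_pass` are hypothetical names, and `one_pass` only mimics the effect of `pass_through` (computing the extra field in the same pass), not its implementation.

```python
calls = {"source": 0}  # how many times the expensive input has been produced


def expensive_source():
    # Stand-in for everything that produces `mt` (import, QC, filtering, ...).
    calls["source"] += 1
    return {("1:100", "A/G"): 0.2, ("1:200", "C/T"): 0.7}  # row key -> alt AF


def mt1():
    # Branch 1 (e.g. regression results): replays the source when called.
    return {key: {"beta": 0.1} for key in expensive_source()}


def mt2():
    # Branch 2 (e.g. MAF): replays the source again when called.
    return {key: {"MAF": min(af, 1 - af)} for key, af in expensive_source().items()}


def joined():
    # Split-then-join: both sides replay the lineage, so the source runs twice.
    left, right = mt1(), mt2()
    return {key: {**left[key], **right[key]} for key in left}


def one_pass():
    # Pass-through style: carry MAF along with the results in a single pass.
    return {
        key: {"beta": 0.1, "MAF": min(af, 1 - af)}
        for key, af in expensive_source().items()
    }


calls["source"] = 0
a = joined()
print(calls["source"])  # 2

calls["source"] = 0
b = one_pass()
print(calls["source"])  # 1
print(a == b)           # True
```

Both routes give the same table; the difference is only how many times the upstream work is repeated.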
Finally, Hail’s `hl.linear_mixed_regression_rows` is not as fast as dedicated LMM tools such as BOLT-LMM or SAIGE. Have you tried using those instead?
Thanks, I think the `pass_through` argument is all I need; I hadn't noticed it! I can calculate MAF at the start, use the argument to include it in the result table, and then compute PVE with `annotate` on the result table. Also, I’m just impatient: it did not take ages after all.