Hi
I was looking over the GWAS tutorial and was wondering what the [1]
index means in
mt = mt.filter_rows(mt.variant_qc.AF[1] > 0.01)
Thanks
The AF field produced by variant_qc is an array with “one element per allele, including the reference”. AF[1] selects the allele frequency of the first alternate allele (AF[0] would be the reference). The dataset used in the tutorial consists only of biallelic variants, so we know this is the only alternate allele.
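To make the indexing concrete, here is a plain-Python sketch (not Hail's implementation) of an AF array laid out as described: one frequency per allele, reference at index 0, computed over called alleles only. It assumes diploid calls written as allele-index pairs; the site names and genotypes are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes, n_alleles):
    """One frequency per allele, including the reference at index 0.

    genotypes: diploid calls as (allele, allele) index pairs; None is a missing
    call and is skipped, so frequencies are over called alleles only.
    """
    counts = Counter()
    for gt in genotypes:
        if gt is not None:
            counts.update(gt)  # counts each allele index in the pair
    total = sum(counts.values())
    if total == 0:
        return None  # no called alleles, so the frequencies are undefined
    return [counts[i] / total for i in range(n_alleles)]


# Biallelic site: allele 0 is the reference, allele 1 the only alternate.
af = allele_frequencies([(0, 0), (0, 1), (0, 0), (1, 1), None], n_alleles=2)
print(af)  # [0.625, 0.375]: af[0] is the reference, af[1] the alternate

# Hypothetical sites, kept only if the alternate allele frequency exceeds 1%,
# in the spirit of filter_rows(mt.variant_qc.AF[1] > 0.01).
sites = {
    "site_a": [(0, 0)] * 24 + [(0, 1)],  # 1 alt allele out of 50 -> 0.02
    "site_b": [(0, 0)] * 25,             # no alt alleles -> 0.0
}
kept = [name for name, gts in sites.items() if allele_frequencies(gts, 2)[1] > 0.01]
print(kept)  # ['site_a']
```

On a multiallelic site the same array would simply be longer, and AF[1] would still refer only to the first alternate allele.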
@tpoterba how would one run something similar when AF isn’t defined for every entry?
I tried:
chip_vars_mt = chip_vars_mt.annotate_cols(test =
hl.agg.filter(hl.is_defined(chip_vars_mt.AF), hl.agg.mean(chip_vars_mt.AF[1])))
chip_vars_mt.col.test.show(1)
But get the following error:
Hail version: 0.2.34-914bd8a10ca2
Error summary: HailException: array index out of bounds: index=1, length=1
----------
Python traceback:
File "<ipython-input-131-84c67a4862a4>", line 2, in <module>
hl.agg.filter(hl.is_defined(chip_vars_mt.AF), hl.agg.mean(chip_vars_mt.AF[1])))
Thoughts?
Missing values propagate in Hail: if AF is missing, AF[1] is just going to be missing too, so missing entries are not what raises the error. The problem is that AF has length 1, and you’re trying to get the second element.
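The distinction between a missing array and a too-short array can be mimicked in plain Python. This is a sketch of the behaviour described in the reply above, not Hail code: `hail_like_index`, `filtered_mean`, and the AF values are hypothetical, and skipping missing results before averaging is assumed to mirror how hl.agg.mean treats missing values.

```python
def hail_like_index(arr, i):
    """Index an array the way the reply describes Hail behaving."""
    # A missing array (None) gives a missing element; missingness propagates.
    if arr is None:
        return None
    # A present but too-short array is an error, not a missing value.
    if i >= len(arr):
        raise IndexError(f"array index out of bounds: index={i}, length={len(arr)}")
    return arr[i]


def filtered_mean(arrays, keep, index):
    """Mean of arr[index] over the arrays where keep(arr) is true.

    Missing results are skipped before averaging (assumed to mirror hl.agg.mean).
    """
    picked = [hail_like_index(a, index) for a in arrays if keep(a)]
    picked = [x for x in picked if x is not None]
    return sum(picked) / len(picked) if picked else None


def is_defined(arr):
    return arr is not None


def non_empty(arr):
    return arr is not None and len(arr) > 0


# Hypothetical AF values: two length-1 arrays and one missing array.
af_values = [[0.25], None, [0.75]]

try:
    filtered_mean(af_values, is_defined, 1)  # mirrors AF[1] behind is_defined
except IndexError as err:
    print(err)  # array index out of bounds: index=1, length=1

print(filtered_mean(af_values, non_empty, 0))  # 0.5, mirrors AF[0] behind length() > 0
```

The is_defined guard only removes missing arrays; it says nothing about length, so the first length-1 array still fails on index 1.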
The solution I needed:
chip_vars_mt = chip_vars_mt.annotate_cols(mean_clone_size_by_col =
hl.agg.filter(chip_vars_mt.AF.length() > 0, hl.agg.mean(chip_vars_mt.AF[0])))
(I was wrong about which array element I needed.)
I think it would be nice to have this explicitly defined in the variant_qc documentation as well. Looking at the “Rare variant analysis” section of the GWAS tutorial, it was unclear to me why AF[0] was being used to group minor allele frequency bins while AF[1] was used for GWAS.
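On a biallelic site the two entries are complementary (the AF array sums to one, so AF[0] = 1 - AF[1]), which means either index can express a frequency cutoff, but neither one is the minor allele frequency by itself. A plain-Python sketch (not Hail, and not necessarily what the tutorial does) of taking the MAF as the smaller entry and binning it; the bin edges and AF arrays are hypothetical.

```python
def minor_allele_frequency(af):
    """MAF of a biallelic site from its AF array [ref, alt]: the smaller entry."""
    return min(af)


def maf_bin(maf, edges=(0.01, 0.05)):
    """Assign a MAF to a bin; the edges here are hypothetical."""
    if maf < edges[0]:
        return "rare"
    if maf < edges[1]:
        return "low-frequency"
    return "common"


# Hypothetical biallelic AF arrays [AF[0], AF[1]].
for af in ([0.995, 0.005], [0.03, 0.97], [0.6, 0.4]):
    maf = minor_allele_frequency(af)
    print(af, maf, maf_bin(maf))
```

The second array is the case where indexing matters: there the reference is the minor allele, so binning on AF[1] alone would call the site common even though its MAF is 0.03.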