Variant qc AF field index meaning


I was looking over the GWAS tutorial and was wondering what the [1] index means in this line:

mt = mt.filter_rows(mt.variant_qc.AF[1] > 0.01)


The AF produced by variant_qc is an array with “one element per allele, including the reference”. AF[1] selects the allele frequency of the first alternate allele (AF[0] is the reference). The dataset used in the tutorial consists of only biallelic variants, so we know this is the only alternate allele.
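To make the indexing concrete, here is a small plain-Python illustration (not Hail code, and not how variant_qc is implemented). The helper name allele_frequencies and the toy genotypes are made up for this example; it only shows why position 0 is the reference and position 1 is the first alternate allele.

```python
# Plain-Python sketch: per-allele frequencies for a single variant.
# `allele_frequencies` is a hypothetical helper for illustration only.

def allele_frequencies(genotypes, n_alleles):
    """Return one frequency per allele, reference first."""
    counts = [0] * n_alleles
    for gt in genotypes:
        if gt is None:          # skip missing calls
            continue
        for allele in gt:       # each call carries two allele indices
            counts[allele] += 1
    total = sum(counts)
    if total == 0:              # no called genotypes at all
        return None
    return [c / total for c in counts]

# Toy biallelic site: 0 = reference allele, 1 = alternate allele.
genotypes = [(0, 0), (0, 1), (1, 1), None, (0, 0)]
af = allele_frequencies(genotypes, n_alleles=2)

ref_freq = af[0]  # frequency of the reference allele
alt_freq = af[1]  # frequency of the first (here, only) alternate allele
keep_variant = alt_freq > 0.01  # same kind of threshold as the tutorial filter
```

For a biallelic site the two entries sum to 1, so filtering on AF[1] is filtering on the alternate allele frequency.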

@tpoterba how would one run something similar when AF isn’t defined for every entry?

I tried:

chip_vars_mt = chip_vars_mt.annotate_cols(test = 
                                          hl.agg.filter(hl.is_defined(chip_vars_mt.AF), hl.agg.mean(chip_vars_mt.AF[1])))

But get the following error:

Hail version: 0.2.34-914bd8a10ca2
Error summary: HailException: array index out of bounds: index=1, length=1
Python traceback:
  File "<ipython-input-131-84c67a4862a4>", line 2, in <module>
    hl.agg.filter(hl.is_defined(chip_vars_mt.AF), hl.agg.mean(chip_vars_mt.AF[1])))


Missing values propagate in Hail: if AF is missing, AF[1] is just going to be missing too, so the is_defined filter is not what matters here. The problem is that AF is defined but has length 1, and you’re trying to get the second element (index 1).
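The distinction between "missing" and "too short" can be sketched in plain Python (an analogy only, not Hail's implementation). The helper mean_of_element and the sample entries are hypothetical; None stands in for a missing AF, and the guard on length plays the role of a length check in the aggregation filter.

```python
from statistics import mean

# Plain-Python analogy: None models a missing array, a short list models
# an array that is defined but has no element at the requested index.

def mean_of_element(arrays, index):
    """Mean of a[index] over arrays that are present and long enough."""
    values = [a[index] for a in arrays if a is not None and len(a) > index]
    return mean(values) if values else None

entries = [[0.25], None, [0.75], []]

# Filtering only on "is defined" still fails, because [0.25] has no index 1.
try:
    naive = mean(a[1] for a in entries if a is not None)
except IndexError:
    naive = "index out of bounds"

mean_first = mean_of_element(entries, 0)   # uses [0.25] and [0.75]
mean_second = mean_of_element(entries, 1)  # nothing qualifies
```

Checking the length against the index being read, rather than only checking that the array is defined, is what avoids the out-of-bounds error.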


The solution I needed:

chip_vars_mt = chip_vars_mt.annotate_cols(mean_clone_size_by_col = 
                                          hl.agg.filter(chip_vars_mt.AF.length() > 0, hl.agg.mean(chip_vars_mt.AF[0])))

I was wrong about which entry was needed: the value I wanted is AF[0], not AF[1].