Variant qc AF field index meaning

Hi

I was looking over the GWAS tutorial and was wondering what the [1] index means in

mt = mt.filter_rows(mt.variant_qc.AF[1] > 0.01)

Thanks

The AF produced by variant_qc is an array with “one element per allele, including the reference”. AF[1] selects the allele frequency of the first alternate allele (AF[0] would be the reference). The dataset used in the tutorial contains only biallelic variants, so we know this is the only alternate allele.
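
To make the indexing concrete, here is a small plain-Python sketch of a per-allele frequency array at one biallelic site. It is only an illustration of the "one element per allele, reference first" layout, not Hail's implementation; the function name `allele_frequencies` and the sample calls are made up for the example, and it assumes at least one non-missing call.

```python
from collections import Counter


def allele_frequencies(calls, n_alleles):
    """Return one frequency per allele, with the reference allele at index 0.

    `calls` is a list of genotype calls, each a tuple of allele indices
    (0 = reference, 1 = first alternate, ...) or None for a missing call.
    Hypothetical helper for illustration; assumes at least one non-missing call.
    """
    counts = Counter(
        allele for call in calls if call is not None for allele in call
    )
    total = sum(counts.values())
    return [counts[i] / total for i in range(n_alleles)]


# Four diploid samples at a biallelic site.
calls = [(0, 0), (0, 1), (1, 1), (0, 0)]
af = allele_frequencies(calls, n_alleles=2)

print(af[0])  # reference allele frequency: 0.625
print(af[1])  # first (and only) alternate allele frequency: 0.375
```

At a biallelic site the two elements sum to 1, so filtering on AF[1] > 0.01 keeps variants whose alternate allele is not too rare.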

@tpoterba How would one run something similar when AF isn’t defined for every entry?

I tried:

chip_vars_mt = chip_vars_mt.annotate_cols(test = 
                                          hl.agg.filter(hl.is_defined(chip_vars_mt.AF), hl.agg.mean(chip_vars_mt.AF[1])))
chip_vars_mt.col.test.show(1)

But I get the following error:

Hail version: 0.2.34-914bd8a10ca2
Error summary: HailException: array index out of bounds: index=1, length=1
----------
Python traceback:
  File "<ipython-input-131-84c67a4862a4>", line 2, in <module>
    hl.agg.filter(hl.is_defined(chip_vars_mt.AF), hl.agg.mean(chip_vars_mt.AF[1])))

Thoughts?

Missing values propagate in Hail: if AF is missing, AF[1] is just going to be missing too. The problem is that AF has length 1, and you’re trying to get the second element (index 1).
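
The two failure modes can be told apart with a plain-Python analogy, where None stands in for a missing value and a short list stands in for a length-1 AF array. This is a sketch of the idea only, not Hail code; the helper names and the numbers are invented for the example.

```python
def element_or_none(array, index):
    """Missing in, missing out; a too-short array still raises an error."""
    if array is None:
        return None  # a missing array yields a missing element
    return array[index]  # raises IndexError if index >= len(array)


def mean_of_element(arrays, index):
    """Mean of arrays[i][index], skipping missing and too-short arrays."""
    values = [a[index] for a in arrays if a is not None and len(a) > index]
    return sum(values) / len(values) if values else None


# Made-up per-entry AF arrays: two of length 1, one missing, one empty.
entries = [[0.25], None, [0.75], []]

missing_case = element_or_none(None, 1)  # None, no error
mean_first = mean_of_element(entries, 0)  # 0.5
mean_second = mean_of_element(entries, 1)  # None: no array has an index 1

try:
    element_or_none([0.25], 1)
    out_of_bounds = False
except IndexError:
    out_of_bounds = True  # analogous to "index=1, length=1"
```

Checking for missingness alone does not protect against the out-of-bounds case; the guard has to compare the array length with the index being requested.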

@tpoterba

The solution I needed:

chip_vars_mt = chip_vars_mt.annotate_cols(mean_clone_size_by_col = 
                                          hl.agg.filter(chip_vars_mt.AF.length() > 0, hl.agg.mean(chip_vars_mt.AF[0])))

I was wrong about which element was needed.

I think it would be nice to have this explicitly defined in the variant_qc documentation as well. Looking at the “Rare variant analysis” section of the GWAS tutorial, it was unclear to me why AF[0] was being used to group minor allele frequency bins while AF[1] was used for GWAS.