Hi, I started to look into Hail this week and I am really amazed at how fast it is.
I want to calculate the MAC for all of my sites (> 100 million), but I am struggling with this.
I have an INFO/AC entry, which is an array with as many entries as there are ALT alleles.
I tried to get the max value like this, but without success:
mt = mt.annotate_rows(MAC = hl.agg.max(mt.info.AC))
What failed about this? I’d expect this to work fine.
Separately, what is the definition of “MAC” at a multiallelic site? Is it the allele count of the alt allele with the most observations? The fewest? The sum of all alts?
The error code I get is the following:
TypeError: max: parameter 'expr': expected expression of type int32 or int64 or float32 or float64, found <ArrayNumericExpression of type array<int32>>
Somehow the function doesn’t expect an array.
My understanding of MAC at multiallelic sites is that it is defined as the count of the alt allele with the most observations, similar to bcftools (see “bcftools +fill-tags calculates unexpected MAF for multiallelic variants”, Issue #1313, samtools/bcftools on GitHub).
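To make the three candidate definitions from the question above concrete, here is a small plain-Python sketch. The allele counts and variable names are made up for illustration and do not come from Hail or bcftools:

```python
# Made-up INFO/AC values for one multiallelic site: one count per ALT allele.
ac = [12, 3, 40]

mac_most_observed = max(ac)   # count of the most frequently observed ALT allele
mac_least_observed = min(ac)  # count of the least frequently observed ALT allele
mac_all_alts = sum(ac)        # total count across all ALT alleles

print(mac_most_observed, mac_least_observed, mac_all_alts)
```

The definition I am after is the first one, the count of the most frequently observed ALT allele.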
Ahhh… This is the wrong max function. There are two max functions in Hail: hl.max takes the maximum value of an array, while hl.agg.max is an aggregator that takes the max value along some aggregated axis. If you wanted to take the max value of some entry field (like GQ) per row, using hl.agg.max is correct, but here we want hl.max.
Indeed, it’s working now, thanks!