It seems quite easy to filter individuals by the presence or absence of a single variant, but I haven’t found a way to do the same for haplotypes.
I tried to create a two-SNP haplotype-based annotation on my samples, but I couldn’t find a simple way of expressing it as a single logical expression.
Here’s what I tried, but it doesn’t seem possible to get to a specific set of entries for an individual.
Is there an intuitive solution to this problem that I’m missing? (There is of course the option to do a series of filters to achieve the same result. Or maybe setting bitwise flags on the entries…)
Doing this kind of lookup-and-aggregate-at-the-same-time is a somewhat tricky thing to implement in a nice way in a distributed setting, but you can get around that by doing something like the following:
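One way the idea could look, as a plain-Python sketch rather than Hail code: filter the entries down to the two loci of interest, then collect each sample's genotype at those loci. The sample IDs, locus strings, and helper name are made up for illustration. In Hail the same shape would presumably be an `hl.agg.filter(...)` aggregation inside `mt.annotate_cols(...)`, but that mapping is an assumption, not something confirmed in this thread.

```python
from collections import defaultdict

# Hypothetical toy data: one entry per (sample, locus); value = alt-allele count.
entries = [
    ("s1", "1:1000", 0), ("s1", "1:2000", 1), ("s1", "1:3000", 2),
    ("s2", "1:1000", 1), ("s2", "1:2000", 1), ("s2", "1:3000", 0),
    ("s3", "1:1000", 2), ("s3", "1:2000", 0), ("s3", "1:3000", 1),
]

SNP_A, SNP_B = "1:1000", "1:2000"  # the two loci of interest (made up)


def two_snp_genotypes(entries, snp_a, snp_b):
    """Return {sample: (n_alt at snp_a, n_alt at snp_b)}; None where missing."""
    per_sample = defaultdict(lambda: {snp_a: None, snp_b: None})
    for sample, locus, n_alt in entries:
        if locus in (snp_a, snp_b):            # "lookup": keep only the two loci
            per_sample[sample][locus] = n_alt  # "aggregate": store per sample
    return {s: (g[snp_a], g[snp_b]) for s, g in per_sample.items()}


genos = two_snp_genotypes(entries, SNP_A, SNP_B)

# Example annotation: does the sample carry an alt allele at both SNPs?
carries_both = {
    s: a is not None and b is not None and a > 0 and b > 0
    for s, (a, b) in genos.items()
}
print(genos)
print(carries_both)
```

Note that with unphased genotypes, carrying an alt allele at both SNPs does not by itself show the two alleles sit on the same haplotype, so an annotation like `carries_both` is only a first filter.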
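Your idea above of setting bitwise flags is also a really clever way to do it: you could annotate each locus with a locus_weight and then take hl.agg.sum(mt.GT.n_alt_alleles() * mt.locus_weight) per sample, which avoids writing so many match statements. The weights need to be chosen so that every genotype combination gives a distinct sum, e.g. 1 and 10. With 10 and 20, hom-alt at the first locus and het at the second would both sum to 20.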
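To make the weighting idea concrete, here is a small plain-Python sketch (not Hail) of the weighted sum, with a check that a given set of weights maps every combination of alt-allele counts (0, 1, or 2 per locus) to a distinct code. The function names and locus strings are hypothetical.

```python
from itertools import product


def haplotype_code(n_alt_by_locus, weight_by_locus):
    """Weighted sum of alt-allele counts over the weighted loci."""
    return sum(n_alt_by_locus[locus] * w for locus, w in weight_by_locus.items())


def codes_are_unique(weights):
    """True if every combination of 0/1/2 alt alleles gives a distinct sum."""
    codes = [
        sum(n * w for n, w in zip(combo, weights))
        for combo in product(range(3), repeat=len(weights))
    ]
    return len(set(codes)) == len(codes)


weights = {"1:1000": 1, "1:2000": 10}    # made-up loci; base-10 "digits"
sample = {"1:1000": 1, "1:2000": 2}      # het at the first SNP, hom-alt at the second
code = haplotype_code(sample, weights)   # 1*1 + 2*10

print(code)
print(codes_are_unique([1, 10]))   # each locus gets its own decimal digit
print(codes_are_unique([10, 20]))  # (2, 0) and (0, 1) both sum to 20
```

Any weights that keep the per-locus contributions from overlapping will work; powers of 3 (1, 3, 9, ...) give the most compact codes, while powers of 10 are easier to read off by eye.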