How can I filter out small regions that
contain multiple variants which are likely
false positives (e.g. after homopolymer stretches)?
Is it possible to use hl.linalg.utils.locus_windows(ds.locus,3)
for this purpose? Or do you recommend another approach?
If I use locus_windows in the following way, the
window length is a numpy.ndarray:
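import hail as hl
# locus_windows returns a (starts, stops) tuple of row-index arrays
tmp = hl.linalg.utils.locus_windows(ds.locus, 3)
tmp = tmp[1] - tmp[0]  # stops - starts: number of rows in each window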
In [34]: type(tmp)
Out[34]: numpy.ndarray
How can I annotate the rows of a Hail MatrixTable
with these values?
Tim
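
To make the window lengths in the question concrete, here is a minimal pure-Python sketch of what stops - starts counts. It assumes that a window holds every variant on the same contig whose position is within the given radius of the current one, with an inclusive start index and an exclusive stop index. The function name window_bounds and the positions are hypothetical, chosen only for illustration, and the sketch handles a single contig; it is not Hail's implementation.

```python
from bisect import bisect_left, bisect_right


def window_bounds(positions, radius):
    """Return (starts, stops) index lists for sorted positions on ONE contig.

    The window of variant i is positions[starts[i]:stops[i]], i.e. every
    variant whose position lies within `radius` of positions[i].
    """
    starts = [bisect_left(positions, p - radius) for p in positions]
    stops = [bisect_right(positions, p + radius) for p in positions]
    return starts, stops


# Hypothetical variant positions on a single contig, already sorted.
positions = [100, 101, 103, 250, 400, 402]

starts, stops = window_bounds(positions, 3)

# Number of variants in each window, including the variant itself.
counts = [stop - start for start, stop in zip(starts, stops)]
print(counts)  # [3, 3, 3, 1, 2, 2]
```

Under this reading, a variant with a count above some chosen threshold sits in a dense cluster and is a candidate for filtering. The first three variants here each have two neighbours within 3 bp.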