Runs of homozygosity

Hi hail team!

It would be super helpful for QC to have a metric that measures contiguous runs of homozygosity. This would be useful in our relatedness inference to differentiate between technical artifacts and samples with potentially higher levels of consanguinity, and it’d also be interesting in the rare disease space as a quick way to find samples with evidence of uniparental disomy.

Let me know if you need any more details to consider this request, and thank you all for your hard work!

Is there a definition of this metric that you can point us to? I bet we can implement this with a scan and aggregation in Python.

I started looking into methods a bit. plink --homozyg is a tried-and-true scanning-window method, but it looks like there are some other options, including bcftools (which looks like an HMM) and this R package. I’m not sure if there’s a comparison of these anywhere.
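
To make the "scan" idea concrete, here is a rough single-sample sketch in plain Python. It is not what plink --homozyg or bcftools actually do (no sliding window, no HMM, no tolerance for heterozygous calls inside a run); it just walks position-sorted calls on one chromosome and reports maximal stretches of homozygous genotypes. The function name, the input encoding (number of alt alleles, or None for a missing call), and the two thresholds are all assumptions made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical helper, not part of Hail or PLINK.
# calls: (position, n_alt_alleles) pairs for ONE sample on ONE chromosome,
# sorted by position. n_alt_alleles is 0 or 2 for homozygous calls,
# 1 for heterozygous calls, and None for a missing call.
def find_roh(
    calls: List[Tuple[int, Optional[int]]],
    min_snps: int = 3,
    min_length_bp: int = 1000,
) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_snps) for each run of homozygous calls."""
    runs = []
    start = end = None
    n_snps = 0

    def close_run():
        # Keep the current run only if it passes both thresholds.
        if start is not None and n_snps >= min_snps and end - start >= min_length_bp:
            runs.append((start, end, n_snps))

    for pos, gt in calls:
        if gt is None:
            continue  # assumption: missing calls neither extend nor break a run
        if gt in (0, 2):
            if start is None:
                start, n_snps = pos, 0
            end = pos
            n_snps += 1
        else:
            # Any heterozygous call ends the current run (a strict simplification).
            close_run()
            start = None

    close_run()  # flush a run that reaches the end of the chromosome
    return runs


calls = [
    (100, 0), (200, 2), (300, None), (400, 0), (500, 1),
    (600, 2), (700, 2), (800, 1),
    (900, 0), (1000, 0), (1100, 2), (1200, 2),
]
print(find_roh(calls, min_snps=3, min_length_bp=200))
```

A real metric would probably need to allow a few heterozygous or missing calls per run and account for marker density, which is where the windowed and HMM approaches differ.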