Identity by Descent Estimation

The following is the proposed documentation for the identity-by-descent estimation command. I’m not sure how to expose the output. A sample-by-sample matrix is very large, so we probably should not store it in the global annotations. If the data were sharded by sample, we could store each row with its sample.

I’m not yet familiar with the vocabulary of the community, so someone should check that I’ve appropriately warned about bi-allelic data and LD pruning.


ibd

Compute an estimate of identity-by-descent for each pair of samples. Conceptually, this command’s output is a symmetric, sample-by-sample matrix. The implementation is based on the IBD algorithm described in the PLINK paper.

This command assumes the dataset is bi-allelic. It does not perform LD pruning, and linkage disequilibrium among the input variants may distort the estimates.
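
To make the pairwise computation concrete, here is a minimal Python sketch of identity-by-state (IBS) counting, the per-pair tally that PLINK-style IBD estimation starts from. It is an illustration only, not this command's implementation. The function names, the genotype encoding (number of alternate alleles, 0, 1, or 2, with None for a missing call), and the dictionary output are all assumptions. It relies on the bi-allelic assumption and visits each unordered pair once, since the matrix is symmetric.

```python
from itertools import combinations


def ibs_counts(g1, g2):
    """Count variants at which two samples share 0, 1, or 2 alleles by state."""
    counts = [0, 0, 0]  # counts[k] = number of variants with IBS state k
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip variants where either call is missing
        counts[2 - abs(a - b)] += 1
    return counts


def pairwise_ibs(genotypes):
    """genotypes: dict mapping sample id -> list of calls, one per variant."""
    return {
        (s1, s2): ibs_counts(genotypes[s1], genotypes[s2])
        for s1, s2 in combinations(sorted(genotypes), 2)
    }


# Hypothetical toy data: three samples, five bi-allelic variants.
genotypes = {
    "A": [0, 1, 2, 2, None],
    "B": [0, 1, 0, 1, 2],
    "C": [2, 1, 2, 0, 0],
}
result = pairwise_ibs(genotypes)
```

For n samples this yields n(n-1)/2 entries instead of n² cells, which may be relevant to the question of how to expose the output.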

Usage

  • -m | --minor-allele-frequency <expr>: an expression in the Hail expression language giving the minor allele frequency of the current variant, v. The expression may also refer to the variant annotations, va. It is evaluated once for each variant. If no expression is given, the minor allele frequency is calculated from the dataset.
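
When no expression is supplied, the frequency has to come from the genotype calls themselves. The Python sketch below shows one way that default could be computed for a single bi-allelic variant. It is not the command's code. The function name and the encoding (alternate-allele count per sample, None for a missing call) are assumptions, as is the choice to report the rarer allele's frequency rather than the alternate allele's frequency.

```python
def minor_allele_frequency(calls):
    """Estimate the minor allele frequency of one bi-allelic variant."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return None  # no called genotypes, so the frequency is undefined
    # Each called sample contributes two alleles at this variant.
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


# Hypothetical calls for one variant across six samples (one missing).
maf = minor_allele_frequency([0, 1, 0, None, 2, 0])
```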

Examples

... ibd --minor-allele-frequency 'va.mafs[v]'

If the --minor-allele-frequency expression evaluates to NA, I currently trigger an error and ask the user to fix the expression. Is this a sensible response?
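
For discussion, the fail-fast behavior described here might look like the following Python sketch. Everything in it is hypothetical: `checked_maf`, `evaluate_maf`, the exception type, the variant string, and the message wording are placeholders, and None stands in for NA.

```python
class MissingFrequencyError(ValueError):
    """Raised when the user's expression yields NA for a variant."""


def checked_maf(variant, evaluate_maf):
    """Evaluate the user's expression for one variant and reject NA results."""
    maf = evaluate_maf(variant)
    if maf is None:
        raise MissingFrequencyError(
            f"--minor-allele-frequency evaluated to NA for variant {variant}; "
            "fix the expression so it yields a value for every variant"
        )
    return maf


# Hypothetical lookup standing in for the evaluated expression.
mafs = {"1:1000:A:T": 0.12}
ok = checked_maf("1:1000:A:T", mafs.get)
```

Naming the offending variant in the message lets the user find the gap in their annotations without rerunning the command.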