Goal:
I am trying to compare variant types across two databases to decide which one is worth the subscription.
Description:
To do this, I want to know which database contains more variants for a particular gene, which types of variants are present in each database, and so on.
I have converted the VCF files from both databases to Hail matrix tables. Usually there is a field that gives the variant type. In my case, however, I don't have any annotations that say whether a particular variant is an insertion, a deletion, an indel, etc.
Possible solutions:
- Annotate the variants with the variant type from open-source databases
- Use a programmatic approach to find the variant type
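For the programmatic option, here is a minimal sketch of what I have in mind, written in plain Python rather than Hail. The function name `classify_variant`, the labels it returns, and the rule itself are my own assumptions, not an existing API: it assumes each ALT allele is looked at separately (multi-allelic sites already split) and that indels follow the usual VCF convention of sharing the first base with REF.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Guess the variant type from one REF/ALT pair (hypothetical helper)."""
    ref, alt = ref.upper(), alt.upper()
    bases = set("ACGTN")

    if alt == "*":
        return "star"  # allele removed by an upstream deletion
    if not ref or not alt or not set(ref) <= bases or not set(alt) <= bases:
        return "symbolic"  # e.g. <DEL>, <INS>, breakends
    if ref == alt:
        return "unknown"  # not actually a variant

    if len(ref) == len(alt):
        # Same length: count the positions that differ.
        mismatches = sum(r != a for r, a in zip(ref, alt))
        return "snp" if mismatches == 1 else "mnp"

    if len(ref) < len(alt):
        # ALT is longer: insertion if the shared first base and the rest of
        # REF are both preserved in ALT (assumed convention).
        if ref[0] == alt[0] and alt.endswith(ref[1:]):
            return "insertion"
        return "complex"

    # REF is longer: deletion if the first base and the rest of ALT survive.
    if ref[0] == alt[0] and ref.endswith(alt[1:]):
        return "deletion"
    return "complex"


def is_indel(ref: str, alt: str) -> bool:
    """Treat 'indel' as insertion or deletion (assumed definition)."""
    return classify_variant(ref, alt) in {"insertion", "deletion"}
```

In a matrix table the same comparison would have to be applied to the REF and ALT entries of the `alleles` field for each row, which is the part I am unsure how to express in Hail.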
I found these tools, which might be useful for insertions/deletions:
- tseemann/snippy (GitHub): Rapid haploid variant calling and core genome alignment
- UMCUGenetics/MutationalPatterns (GitHub): get_indel_context.R, an R script taking width into account
- Illumina/manta (GitHub): README.md
Any suggestions on how I can achieve this in Hail?