My name is Krishna, and I am currently working as a project student in Clinical Genetics, studying rare disease variants in patient exome and whole-genome samples.
I have generated a VCF file for each patient sample following the GATK Best Practices workflow: adapter marking, alignment, duplicate marking, base quality score recalibration (BQSR), variant calling with HaplotypeCaller, and hard filtering.
I can already annotate these VCF files one at a time using the VEP (Variant Effect Predictor) Docker container with custom databases such as gnomAD exomes, gnomAD genomes, COSMIC mutations, dbSNP, CADD, and others. I have these resources for both the GRCh37 and GRCh38 reference genomes (VEP cache v106 in each case).
I would now like to run the same VEP Docker-based annotation through Hail so that it scales to many samples.
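To make the goal concrete, here is a minimal sketch of what I think the Hail side might look like. It is based on my reading of the `hl.vep` documentation and may well be wrong. The helper names, the `vep_command` value, and all paths are hypothetical, the `__OUTPUT_FORMAT_FLAG__` placeholder is my understanding of the config format, and the `json_schema` string is something I would still need to take from a Hail example config.

```python
import json


def build_vep_config(vep_command, assembly, cache_dir, fasta, json_schema):
    """Build the dict I believe hl.vep expects as its JSON config (an assumption)."""
    command = [
        vep_command,               # hypothetical: whatever launches VEP on each worker
        "--format", "vcf",
        "__OUTPUT_FORMAT_FLAG__",  # as I understand it, Hail swaps this for --json or --vcf
        "--cache", "--offline",
        "--dir_cache", cache_dir,
        "--assembly", assembly,
        "--fasta", fasta,
        "--no_stats",
        "-o", "STDOUT",
    ]
    return {"command": command, "env": {}, "vep_json_schema": json_schema}


def write_vep_config(config, path):
    """Save the config dict as a JSON file that can be passed to hl.vep."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)


def annotate_vcf(vcf_path, config_path, reference_genome="GRCh38"):
    """Import one VCF into Hail and run VEP on it (needs a working Hail and VEP setup)."""
    import hail as hl  # imported here so the config helpers work without Hail installed

    # force_bgz only matters for .gz inputs, which I assume are block-gzipped
    mt = hl.import_vcf(vcf_path, reference_genome=reference_genome, force_bgz=True)
    return hl.vep(mt, config=config_path)
```

The idea would be to call `build_vep_config` once per assembly (GRCh37 or GRCh38), save it with `write_vep_config`, and then call `annotate_vcf` on each VCF. What I do not understand is how the `command` entry should invoke VEP inside the Docker container, and how to add my custom databases to it.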
I have read the documentation on using VEP with Hail, but I could not work out how to set it up. Could you please clarify the process and guide me through it?
Thank you in advance for your help.