Hi All,
I am trying to annotate BioBank data. The data is split by chromosome, i.e., there is a set of PLINK files for each chromosome. My plan is to consolidate each chromosome’s data, annotate it with VEP, and then run further downstream analyses on the annotated data.

I am running into a strange issue with the gene_symbol field that VEP produces: VEP assigns a gene_symbol to variants at loci that are not associated with any gene. In these cases the annotation for a gene extends significantly beyond the range Ensembl gives for that gene, and according to Ensembl the affected regions are not part of any gene at all.

What I need is a legitimate gene_symbol for every variant in my data, as per Ensembl, or NaN if the variant is at a locus that is not associated with any gene. Has anyone faced this before? Is there another annotation service I could use for this?
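
To make the behaviour I expect concrete, here is a minimal sketch of the rule I have in mind: a variant gets a gene symbol only if its position falls inside that gene's start/end range, and NaN otherwise. The gene names and coordinates below are made up for illustration (they are not real Ensembl records), and `expected_symbol` is a hypothetical helper, not part of VEP or any library.

```python
import math

# Hypothetical gene intervals: (chromosome, start, end, symbol).
# Coordinates are invented for illustration, not real Ensembl data.
GENES = [
    ("1", 1000, 5000, "GENE_A"),
    ("1", 8000, 12000, "GENE_B"),
]


def expected_symbol(chrom, pos, genes=GENES):
    """Return the symbol of the first gene whose [start, end] range
    contains pos on the given chromosome, or NaN if none does."""
    for g_chrom, start, end, symbol in genes:
        if g_chrom == chrom and start <= pos <= end:
            return symbol
    return math.nan


# One variant inside GENE_A, one between the two genes, and one on the
# last base of GENE_B.
variants = [("1", 1500), ("1", 6500), ("1", 12000)]
annotated = [(chrom, pos, expected_symbol(chrom, pos)) for chrom, pos in variants]
for row in annotated:
    print(row)
```

This simplification returns only the first matching gene, so it ignores overlapping genes and strand, but it shows the output I am after: the variant between the two genes should come back as NaN rather than with the symbol of a nearby gene.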