I am new to the VCF file format and bioinformatics. I already referred to the post here, but I guess the solution was discussed in
I was following a few tutorials online and found out that a VCF file has to be annotated to perform GWAS analysis.
As we have VCF files, how can I add annotation information to them?
q1) Where can I get the annotation information? I know there are multiple resources. Could the veterans here help me understand when to use which resource?
q2) How do I link the annotation information to a VCF file? How can I do that using hail? I see that the hail tutorial makes use of a text file.
q3) How do I know whether the text file that the tutorial used would be useful for our VCF file as well?
I’ll answer each question individually.
This is a scientific question, not a technical one, so it’s not exactly in our wheelhouse. Many Hail users use the Variant Effect Predictor (VEP) through hail with hl.vep. To use this easily, you’ll need to be running on Google Cloud.
It depends on what form the annotations are in. If they're a text file, it's easy to import that with hl.import_table and join. I'll answer when we have specifics.
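As a rough illustration of the import-and-join approach, here is a minimal sketch. It assumes a tab-separated annotation file with a header row and a column named "variant" holding strings like 1:12345:A:T; that column name, the file layout, the reference genome, and both helper function names are assumptions for this example, not something taken from your data. Adjust them once you know what your annotation file looks like.

```python
def format_variant(contig, position, ref, alt):
    """Build a 'contig:position:ref:alt' string, the form hl.parse_variant reads."""
    return f"{contig}:{position}:{ref}:{alt}"


def annotate_rows_from_table(vcf_path, annotations_path, reference_genome="GRCh37"):
    """Join a text file of per-variant annotations onto the rows of a VCF.

    Hypothetical helper: assumes the text file has a 'variant' column.
    """
    # Imported here so format_variant stays usable without Hail installed.
    import hail as hl

    mt = hl.import_vcf(vcf_path, reference_genome=reference_genome)

    # impute=True lets Hail guess column types instead of reading all as strings.
    ht = hl.import_table(annotations_path, impute=True)

    # Split the 'variant' string into locus and alleles, then key the table
    # by those fields so it can be joined against the matrix table rows.
    ht = ht.transmute(**hl.parse_variant(ht.variant, reference_genome=reference_genome))
    ht = ht.key_by("locus", "alleles")

    # Each row picks up the matching annotation record (missing if no match).
    return mt.annotate_rows(annotation=ht[mt.locus, mt.alleles])
```

After the join, the text file's columns should be available under mt.annotation, and rows without a matching line in the file get a missing value there.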
The tutorial file contains fake data; don't use it.
@tpoterba - One follow-up question. Let's say I have hail installed on our Ubuntu server instead of Google Cloud. Is it difficult to use hl.vep for annotating our VCF files?
Is there any step-by-step hail tutorial on VCF file annotation using VEP?
VEP is notoriously difficult to install, and you’d have to set up all the executables and databases to work with Hail in a specific way. This is an activity that will be pretty difficult, to the point where it’s something we don’t really have the bandwidth to support.
If you’re using Google Dataproc or if you’ve set up the local installation correctly, you just run:
mt = hl.vep(mt)
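For context, the steps around that call might look like the sketch below. The function name, the reference genome, and the optional config argument are assumptions for illustration. On a Dataproc cluster started with VEP support the config can usually be left out, while a local installation would need a configuration file pointing at your own VEP setup.

```python
def annotate_with_vep(vcf_path, config=None, reference_genome="GRCh37"):
    """Import a VCF and add a 'vep' row field using hl.vep.

    Hypothetical helper: only works where VEP is already set up for Hail.
    """
    # Imported inside the function so merely defining it does not need Hail.
    import hail as hl

    mt = hl.import_vcf(vcf_path, reference_genome=reference_genome)

    # config=None tells Hail to look up the VEP configuration itself;
    # pass a path to a configuration file for a local installation.
    mt = hl.vep(mt, config)
    return mt
```

Once this has run, something like mt.rows().select("vep").show(5) should let you look at the first few annotated variants.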