Vep config hailctl dataproc

Hello,

I am attempting to create a customized config file for vep to run from a dataproc cluster created using hailctl.

The example config file given in the Hail documentation (Hail | Genetics) works for me, but I would like to change the annotation in two ways. First, I want a top-level field for gene_symbol (right now it is nested under transcript_consequences and returns NA for most variants). Second, I want a field listing all of the possible consequences of the variant. These correspond to the --symbol and --summary flags in VEP, which are mutually exclusive with the --everything flag used in this example.

How do I need to change the config file to make these changes?
Thank you!

I think you’ll have an easier time extracting the gene_symbol afterwards:

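import hail as hl

ht = hl.vep(...)  # by default the VEP output lands in a struct field named `vep`
ht = ht.annotate(
    # Collect the distinct, defined gene symbols across all transcript consequences.
    gene_symbols = hl.set(ht.vep.transcript_consequences.map(
        lambda tc: tc.gene_symbol
    ).filter(lambda x: hl.is_defined(x)))
)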

Note that a given variant can reside in zero or more genes.
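
To make the "zero or more genes" point concrete, here is a rough plain-Python analogy of what that expression computes per variant. This is not Hail code: the function name and the toy records are made up for illustration, and the handling of a missing array reflects my understanding of how Hail propagates missingness rather than anything stated above.

```python
# Plain-Python analogy of the Hail expression (hypothetical helper, not a Hail API).
# Each variant carries a list of transcript consequences; gene_symbol may be None.

def gene_symbols(transcript_consequences):
    """Return the set of defined gene symbols for one variant."""
    if transcript_consequences is None:
        # Assumption: in Hail, a missing array would give a missing set.
        return None
    return {
        tc["gene_symbol"]
        for tc in transcript_consequences
        if tc.get("gene_symbol") is not None  # mirrors hl.is_defined(x)
    }

# Toy records invented for illustration only.
variants = {
    "no_gene": [],
    "one_gene": [{"gene_symbol": "GENE_A"}, {"gene_symbol": "GENE_A"}, {"gene_symbol": None}],
    "two_genes": [{"gene_symbol": "GENE_A"}, {"gene_symbol": "GENE_B"}],
}

result = {name: gene_symbols(tcs) for name, tcs in variants.items()}
```

Because the result is a set, repeated symbols from multiple transcripts of the same gene collapse to one entry, and a variant with no defined symbols ends up with an empty set.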

Thanks for the tip! I have tried this, but I also need to add fields to the vep annotation. Is there a way to request additional flags using the config file?
Thanks!

The config file describes the output of VEP. Unless you plan to change the VEP executable to produce a different kind of output, you shouldn’t modify the config. EDIT: I see there are some flags. I’m not sure how to use those, but I recommend doing the following.

If you want the annotation to be under the vep annotation, do this:

ht = hl.vep(...)
ht = ht.annotate(
    # Rebuild the vep struct with an extra gene_symbols field.
    vep = ht.vep.annotate(
        gene_symbols = hl.set(ht.vep.transcript_consequences.map(
            lambda tc: tc.gene_symbol
        ).filter(lambda x: hl.is_defined(x)))
    )
)

Ah, okay. Thank you!