Annotate variants using amino acid position intervals

ERees · November 3, 2017, 3:46pm

Hi, I’d like to annotate variants based on amino acid coordinates, for example if I have the following table

AA_start	AA_end	Gene	Domain
179	191	SCN5A	Cytoplasmic
275	401	SCN2A	Pore-loop

I’d like to annotate variants with ‘Domain’, if the amino acid changed (e.g. va.vep.transcript_consequences.protein_start) falls within AA_start and AA_end and ‘Gene’ matches that in the variant annotation.

If you need any more info, please let me know!

Cheers,
Elliott

danking · November 10, 2017, 6:18pm

You can achieve this with a KeyTable.

Suppose you have some vds containing VEP and gene annotations and a KeyTable named domain with the schema you depicted in the table.

# ensure domain is keyed appropriately
domain = domain.key_by(['AA_start', 'AA_end', 'gene']
# extract the variants from a VDS
variants = vds.variant_table()

domain_annotated_variants = (variants.annotate([
  'AA_start = va.vep.transcript_consequences.protein_start',
  'AA_end = va.vep.transcript_consequences.protein_end',
  'gene = va.gene']) 
  # remove irrelevant data
  .drop('va')
  # use the same keys as the domain key table
  .key_by(['AA_start', 'AA_end', 'gene'])
  # cross-reference the two tables with join
  .join(domain)
  # drop irrelevant data again
  .drop(['AA_start', 'AA_end', 'gene'])
)

Now the schema of domain_annotated_variants is: { v: Variant, domain: String }. It’s now easy to bring the domain annotation into the original VDS:

domain_annotated_vds = vds.annotate_variants_table(domain_annotated_variants.key_by('v'))

This extends your variant annotation schema with one field: domain of type String.

NB: I have assumed that there is exactly one domain for each amino-acid-coordinate and gene pair. If that’s not the case you can pass product = True to annotate_variants_table. This changes the type of the variant annotation domain to Array[String] and it now contains every matching domain, instead of an arbitrary matching domain.

Topic		Replies	Views
Add Variant Annotations to a VDS based on a KeyTable Updates	0	856	January 31, 2017
Error during VEP annotation (gene_symbol field) of BioBank data Hail Query & hailctl	0	288	July 11, 2023
Variant annotation in MatrixTable Hail Query & hailctl	13	780	July 3, 2020
Annotate_variants_table using keytable generated from spark dataframe Help [0.1]	2	731	October 8, 2017
Annotate variants with genes Hail Query & hailctl	9	479	May 22, 2023

Annotate variants using amino acid position intervals

Related topics