I have a MatrixTable of variants whose rows are annotated with Nirvana (previously with VEP; the resulting structure is the same, so VEP may be the easier example to use here).
VEP annotates each row with an array of transcript consequences. My goal is to add transcript and gene start and stop positions to each transcript consequence, based on the transcript id and gene id in the transcript consequence struct.
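For this, the start/stop positions need to be available as lookups keyed by transcript ID and gene ID. Below is a minimal pure-Python sketch of building such lookups from GTF text lines (e.g. a GENCODE GTF). It assumes the standard 9-column, tab-separated GTF layout with `key "value";` attributes and no `;` inside quoted values. `build_coordinate_lookups`, `_parse_attributes` and the `strip_version` flag are hypothetical helpers, not part of Hail or gffutils. GENCODE IDs carry a version suffix (e.g. `ENSG00000223972.5`), so `strip_version=True` is there in case the IDs in the consequences are unversioned.

```python
from typing import Dict, Iterable, Tuple

Coords = Tuple[int, int]  # (start, end), 1-based inclusive as written in the GTF


def _parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'.

    Hypothetical helper; assumes no ';' inside quoted values.
    """
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # trailing ';' produces an empty item
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def build_coordinate_lookups(
    lines: Iterable[str], strip_version: bool = False
) -> Tuple[Dict[str, Coords], Dict[str, Coords]]:
    """Return (transcript_id -> coords, gene_id -> coords) from GTF lines.

    Hypothetical helper; only 'transcript' and 'gene' records are used.
    """
    transcripts: Dict[str, Coords] = {}
    genes: Dict[str, Coords] = {}

    def key(raw_id: str) -> str:
        # Optionally drop a '.N' version suffix so IDs match unversioned ones.
        return raw_id.split(".")[0] if strip_version else raw_id

    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GTF record
        feature, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = _parse_attributes(cols[8])
        if feature == "transcript":
            transcripts[key(attrs["transcript_id"])] = (start, end)
        elif feature == "gene":
            genes[key(attrs["gene_id"])] = (start, end)
    return transcripts, genes
```

The result is two plain Python dicts, which (unlike a gffutils database object) are simple data that can be passed into Hail.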
I use a map function to create a new transcript consequence containing only the needed values, and to calculate some new fields as well. A minimal example:
mt = mt.annotate_rows(new_transcript_consequences=hl.map(lambda x: hl.struct(transcript_id=x.transcript_id, gene_id=x.gene_id, gene_symbol=x.gene_symbol, ...), mt['vep']['transcript_consequences']))
I know you include Gencode in your experimental functions, but I was not able to figure out how to leverage those, so I imported my own Gencode GFF with the gffutils Python package and wanted to use that in the map function. Something like:
mt = mt.annotate_rows(new_transcript_consequences=hl.map(lambda x: hl.struct(transcript_id=x.transcript_id, transcript_start=gencode_db[x.transcript_id].start, ...
That, of course, does not work, and I suspect it is a naive approach; I still struggle to fully understand Hail and its expressions.
So my question is: what is the intended approach? Is there built-in Hail functionality that I overlooked that could help, or how can I use Gencode (or another source) to annotate each transcript consequence struct based on its gene ID and transcript ID?
Thank you for any advice!