Annotating nested structs based on the struct field value

souckmi · August 21, 2023, 9:38am

Hello,
I have a MatrixTable of variants whose rows are annotated with Nirvana. I previously used VEP, which produces the same structure, so it may be easier to use VEP as the example here.

VEP annotates each row with an array of transcript consequences. My goal is to add transcript and gene start and stop positions to each transcript consequence, based on the transcript id and gene id in the transcript consequence struct.

I use a map function to create a new transcript consequence containing only the needed values and to calculate some new fields as well. A minimal example is below:

mt = mt.annotate_rows(new_transcript_consequences=hl.map(lambda x:
                                          hl.struct(transcript_id=x.transcript_id,
                                                    gene_id=x.gene_id,
                                                    gene_symbol=x.gene_symbol,
                                                    ...),
                                               mt['vep']['transcript_consequences']))

I know you include Gencode in your experimental functions, but I was not able to figure out how to leverage those, so I imported my own Gencode GFF with the gffutils Python package and wanted to use that in the map function. Something like:

mt = mt.annotate_rows(new_transcript_consequences=hl.map(lambda x:
                                          hl.struct(transcript_id=x.transcript_id,
                                                    transcript_start=gencode_db[x.transcript_id].start,
                                                    ...

That, of course, does not work, and I guess it is a really naive approach, since I still struggle to fully understand Hail and its expressions.
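To make the intent concrete, here is a plain-Python sketch (no Hail involved) of the lookup I am after. The `gencode_lookup` dict, the `annotate_consequences` helper, and all IDs and coordinates are made up for illustration; in the real data the consequences are a Hail array of structs, not Python dicts.

```python
# Hypothetical lookup: transcript_id -> start/stop. All values are invented.
gencode_lookup = {
    "ENST_A": {"start": 100, "stop": 900},
    "ENST_B": {"start": 1500, "stop": 2400},
}


def annotate_consequences(consequences, lookup):
    """Return new consequence dicts with transcript_start/transcript_stop added.

    Transcripts that are missing from the lookup get None for both fields.
    """
    annotated = []
    for csq in consequences:
        coords = lookup.get(csq["transcript_id"], {})
        annotated.append({
            "transcript_id": csq["transcript_id"],
            "gene_id": csq["gene_id"],
            "transcript_start": coords.get("start"),
            "transcript_stop": coords.get("stop"),
        })
    return annotated


# Toy stand-in for the transcript consequences of a single row.
row_consequences = [
    {"transcript_id": "ENST_A", "gene_id": "ENSG_X"},
    {"transcript_id": "ENST_C", "gene_id": "ENSG_Y"},  # not in the lookup
]
result = annotate_consequences(row_consequences, gencode_lookup)
```

What I am missing is how to express this kind of per-element lookup inside a Hail expression.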

So my question is: what is the intended approach here? Is there built-in Hail functionality that I overlooked that could help, or how can I use Gencode (or anything else) to annotate each transcript consequence struct based on its gene id and transcript id?

Thank you for any advice!
