How to parse CSQ (VEP) field inside Hail 0.2?

Rost · December 21, 2019, 9:22pm

Forgive me for the long post.
I want to calculate the MAC grouped by gene (mt.sample_qc.n_singleton()) for several subsets of variants. For the filtering I must use specific fields (for example “Consequence”, “CANONICAL” and “Gene”) of the canonical transcripts. How can I do this in Hail 0.2? I tried hl.annotate_cols() with fields extracted from the transcripts that passed mt.info.CSQ.filter(), but I did not succeed at any stage.
This is a typical CSQ string (VEP annotation) for each transcript per variant:

[“C|missense_variant|MODERATE|AGRN|ENSG00000188157|Transcript|ENST0000037…”]

This is my hard-coded logic, which did not work:

consequence = 1
canonical = 23
gene = 4
correct_transcript = hl.eval(mt.info.CSQ.filter(lambda x: all([hl.eval(hl.eval(x.split(‘|’))[consequence]) == ‘missense_variant’, hl.eval(hl.eval(x.split(‘|’))[canonical]) == ‘YES’)))
mt = annotate_rows(gene_name = hl.eval(correct_transcript[gene]))
hl.agg.group_by(mt.gene_name, hl.agg.sum(mt.sample_qc.n_singleton))

mt.describe()
----------------------------------------
Row fields:
…
‘info’: struct {
…
CSQ: array<str>
}
----------------------------------------
mt.info.CSQ
<ArrayExpression of type array<str>>

The first issue is that it is not clear how to handle an ArrayExpression of StringExpressions.
How do I split a StringExpression (I suppose) by a separator and extract a specific field from it?
It’s not clear how to apply ArrayExpression.filter(), or an alternative, in this case.
Do I necessarily need to create an additional row field (hl.annotate_rows()) with the values parsed from the CSQ field?
How do I exclude variants from the dataset if correct_transcript doesn’t pass the filter condition (None or empty [])?
Can I use hl.eval() once for a sequence of conversions, or should I hard-code nested hl.eval(hl.eval(...)) for each operation?
Maybe it can be done more simply?

tpoterba · December 23, 2019, 1:53pm

Dealing with VEP schemas is one of the most difficult problems to solve in Hail (or outside of Hail, I’m sure) so you should not be concerned about having trouble with this!

One of the first points to make is that hl.eval is something that won’t appear in most pipelines that process tables or matrix tables – it’s used in our docs to print the results of simple expressions for teaching purposes, and I think that’s probably a bad idea. Instead, you’ll be passing expressions into annotate_rows, which knows how to apply those transformations to every row of a table / MatrixTable.

Here’s my attempt to write the VEP parser you’ve described:

consequence = 1
canonical = 23
gene = 4
split_csq = mt.info.CSQ  # it looks like this is already an array?
split_transcripts = split_csq.map(lambda x: x.split('|'))

kept_transcripts = split_transcripts.filter(
    lambda tx: (tx[consequence] == 'missense_variant') & (tx[canonical] == 'YES'))

mt = mt.annotate_rows(kept_transcripts = kept_transcripts)
# there can be more than one canonical transcript. If you want each to be treated separately, we can use explode. Otherwise, can use `head()`
mt = mt.explode_rows(mt.kept_transcripts) # option 1
mt = mt.annotate_rows(kept_transcripts = mt.kept_transcripts.head()) # option 2

# it looks like you want number of singleton variants per gene. This is just an aggregation of the rows.
mt_rows = mt.rows()
gene_ht = mt_rows.group_by(mt_rows.kept_transcripts.gene) \
    .aggregate(n_singleton = hl.agg.count_where(mt_rows.variant_qc.AC[1] == 1))
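
As a plain-Python illustration of what the map/split/filter chain above does to one variant's CSQ array: this is not Hail code, the CSQ strings are made up and padded to 24 fields so that index 23 exists, and `make_csq` is a hypothetical helper. Note that Python's `str.split` takes a literal delimiter, which is not necessarily how Hail's `split` treats it.

```python
# Plain-Python model of split -> filter on one variant's CSQ array.
# Field positions follow the thread: 1 = Consequence, 4 = Gene, 23 = CANONICAL.
CONSEQUENCE, GENE, CANONICAL = 1, 4, 23


def make_csq(consequence, gene_id, canonical):
    """Build a made-up 24-field CSQ string (illustration only)."""
    fields = [""] * 24
    fields[0] = "C"
    fields[CONSEQUENCE] = consequence
    fields[GENE] = gene_id
    fields[CANONICAL] = canonical
    return "|".join(fields)


csq = [
    make_csq("missense_variant", "ENSG00000188157", "YES"),
    make_csq("missense_variant", "ENSG00000188157", ""),
    make_csq("synonymous_variant", "ENSG00000188157", "YES"),
]

# Counterpart of .map(lambda x: x.split(...)): one list of fields per transcript.
split_transcripts = [s.split("|") for s in csq]

# Counterpart of .filter(...): keep canonical missense transcripts only.
kept_transcripts = [
    tx for tx in split_transcripts
    if tx[CONSEQUENCE] == "missense_variant" and tx[CANONICAL] == "YES"
]

print(len(kept_transcripts), kept_transcripts[0][GENE])
```

A variant with no matching transcript ends up with an empty list here, which corresponds to the empty arrays the Hail filter produces.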

Rost · December 23, 2019, 5:52pm

Thank you!

I assume that Hail’s “split” takes a regex, so special characters have to be escaped here:
split_csq.map(lambda x: x.split('\|')).
How do I omit the empty lists that I get from split_transcripts.filter()?

locus	alleles	
locus<GRCh37>	array<str>	array<array<str>>
1:957584	["C","A"]	[]
1:957593	["C","T"]	[["T","missense_variant","MODERATE","AGRN","ENSG00000188157","Transcript"...
1:957604	["G","A"]	[]
1:957624	["G","A"]	[["A","missense_variant","MODERATE","AGRN","ENSG00000188157","Transcript"...
1:957633	["T","C"]	[["C","missense_variant","MODERATE","AGRN","ENSG00000188157","Transcript"...

I suppose mt_rows.kept_transcripts doesn’t contain a “gene” field, so I need to annotate_rows() first.

danking · December 23, 2019, 6:09pm

You may find the overview a more accessible and broad introduction to Hail than the reference documentation.

Yes, the pipe should be escaped.
Table.filter removes rows of a table, try filtering on hl.len(table.kept_transcripts[0]) > 0
kept_transcripts is a field of a MatrixTable, or, after you call .rows(), a field of a Table, but as you’ve noticed it’s an array of arrays of strings. If these inner arrays have interesting structure rather than being arbitrary length collections of values, then you should consider converting the inner arrays to Hail Structs:

t.annotate(kept_transcripts=t.kept_transcripts.map(lambda x:
    hl.struct(alt=x[0], type=x[1], severity=x[2], gene=x[3], tid=x[4], ...)))
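
On the escaping point: a small sketch of why an unescaped pipe misbehaves as a regex delimiter. Python's `re` module (3.7 or later) stands in for the regex engine behind Hail's `split`, which is an assumption. The two engines can differ in edge cases such as leading and trailing empty strings, so this only illustrates the principle.

```python
import re

csq = "C|missense_variant|MODERATE|AGRN|ENSG00000188157|Transcript"

# Unescaped: "|" is regex alternation between two empty patterns, so it
# matches the empty string at every position instead of the pipe character.
unescaped = re.split("|", csq)

# Escaped: a literal pipe, giving one element per CSQ field.
escaped = re.split(r"\|", csq)

print(escaped)
print(len(unescaped), "pieces without escaping")
```

In Python source a raw string such as `r"\|"` (or `"\\|"`) keeps the backslash intact without relying on how unknown escape sequences are handled.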

Rost · December 23, 2019, 10:42pm

It’s still not clear to me how I should apply hl.len(table.kept_transcripts[0]) > 0 to an ArrayExpression of ArrayExpressions.

# I made a bool-mask for next filtering:
(hl.len(kept_transcripts) > 0).show(5)

locus	alleles	
locus<GRCh37>	array<str>	bool
1:957624	["G","A"]	false
1:957640	["C","T"]	true
1:957677	["C","T"]	false
1:957742	["C","T"]	true
1:957743	["C","G"]	false

# Even so, I don't know how to apply it. I tried to use your solution,
# but I still get unfiltered data:
kept_transcripts.filter(lambda tx: hl.len(tx) > 0)

locus	alleles	
locus<GRCh37>	array<str>	array<array<str>>
1:957624	["G","A"]	[]
1:957640	["C","T"]	[["T","synonymous_variant","LOW","AGRN","ENSG00000188157","Transcript","E...
1:957677	["C","T"]	[]
1:957742	["C","T"]	[["T","synonymous_variant","LOW","AGRN","ENSG00000188157","Transcript","E...
1:957743	["C","G"]	[]

danking · December 24, 2019, 4:01pm

OK, so, Tim used a somewhat advanced feature of Hail: expressions. I think the subtle difference between expressions and MatrixTables/Tables is causing confusion.

An expression is a hail value or a combination of hail values. Some simple examples:

x = hl.literal(3) # literal converts a python value to a hail value
a_str = hl.literal("abc")
y = x * 2 # python values are automatically converted in many cases
len_of_a_str = hl.len(a_str)

A key aspect of Hail is that expressions can be keyed. A keyed expression is an element of a Table or MatrixTable.

t = hl.utils.range_table(10) # a table with a field idx with values 0 to 9
t # a Table
t.idx # a Table-keyed expression

You can show a Table or an expression. The difference is more clear when a Table has many columns:

import hail as hl
t = hl.utils.range_table(10) # a table with a field idx with values 0 to 9
t = t.annotate(idx_sqr = t.idx * t.idx,
               idx_cube = t.idx * t.idx * t.idx)
t.show()
t.idx.show()
t.idx_sqr.show()
(t.idx_sqr < 9).show()
t.describe()

Run the above. You’ll see that Table.show shows all the fields on a table. Expression.show shows the expressed value and the key, if it exists. t.idx is its own key, so it is printed alone. t.idx_sqr's key is t.idx, so both are printed. Same for the boolean expression.

MatrixTables generalize a Table into two dimensions: there is both a row-key and a column-key. Row-keyed values are called “row fields”. Column-keyed values are called “column fields”. Row and column keyed values are called “entry fields”.

mt = hl.utils.range_matrix_table(3, 3)
mt = mt.annotate_entries(product = mt.row_idx * mt.col_idx,
                         sum = mt.row_idx + mt.col_idx)
mt = mt.annotate_rows(row_idx_sqr = mt.row_idx * mt.row_idx)
mt = mt.annotate_cols(col_idx_sqr = mt.col_idx * mt.col_idx)
mt.show()
mt.col_idx_sqr.show()
mt.row_idx_sqr.show()
mt.product.show()
(mt.product == mt.sum).show()

Run the above. You’ll see again how expressions with a variety of keys are displayed.

Ok, let’s rewrite Tim’s code to avoid saving expressions in python variables. We’ll only save table or matrix tables in python variables. I think this will avoid confusion.

consequence = 1
canonical = 23
gene = 4

mt = mt.annotate_rows(kept_transcripts =
    mt.info.CSQ.map(
        lambda x: x.split(r'\|')  # split takes a regex, so the pipe is escaped
    ).filter(
        lambda tx: (tx[consequence] == 'missense_variant') & (tx[canonical] == 'YES')
    )
)
# there can be more than one canonical transcript. If you want each to be treated separately, we can use explode. Otherwise, can use `head()`
mt = mt.explode_rows(mt.kept_transcripts) # option 1
mt = mt.annotate_rows(kept_transcripts = mt.kept_transcripts.head()) # option 2

# it looks like you want number of singleton variants per gene. This is just an aggregation of the rows.
mt_rows = mt.rows()
gene_ht = mt_rows.group_by(mt_rows.kept_transcripts.gene) \
    .aggregate(n_singleton = hl.agg.count_where(mt_rows.variant_qc.AC[1] == 1))

OK, so, let’s remove the last line and replace it with this:

mt_rows = mt_rows.filter(hl.len(mt_rows.kept_transcripts) >= 1)
mt_rows = mt_rows.annotate(kept_transcripts =
    mt_rows.kept_transcripts.map(
        lambda x: hl.struct(alt=x[0], type=x[1], severity=x[2], gene=x[3], tid=x[4], rest=x[5:])
    )
)

I filter to rows that have at least one transcript. Then I convert each transcript in the kept_transcripts array (a field of the Table mt_rows) to a struct that has some friendly names for its elements. All the elements that I chose not to name I kept as an array and gave the name rest.

Ok, one more thing, there can be multiple transcripts per locus and alleles because genes can overlap. So let’s copy every row for each transcript it has, converting the kept_transcripts array-of-structs field to a struct field.

mt_rows = mt_rows.explode(mt_rows.kept_transcripts)

Now we can group_by:

gene_ht = mt_rows.group_by(mt_rows.kept_transcripts.gene) \
    .aggregate(n_singleton = hl.agg.count_where(mt_rows.variant_qc.AC[1] == 1))
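
The filter, explode and group_by steps above can be modelled in plain Python to see what each one does to the rows. This is a sketch with made-up data, not Hail code: the variants, allele counts and the gene name `GENE_B` are invented for illustration.

```python
from collections import Counter

# Made-up rows: (variant, alt allele count, kept transcripts).
# Each transcript is a dict with a friendly name, like the structs above.
rows = [
    ("1:957593:C:T", 1, [{"gene": "AGRN"}]),
    ("1:957624:G:A", 1, [{"gene": "AGRN"}, {"gene": "GENE_B"}]),
    ("1:957633:T:C", 5, [{"gene": "AGRN"}]),
    ("1:957584:C:A", 1, []),  # no kept transcript
]

# filter: drop rows whose kept_transcripts array is empty
rows = [r for r in rows if len(r[2]) >= 1]

# explode: one output row per (variant, transcript) pair
exploded = [(variant, ac, tx) for variant, ac, txs in rows for tx in txs]

# group_by gene, then count the rows where AC == 1
n_singleton = Counter(tx["gene"] for _, ac, tx in exploded if ac == 1)

print(dict(n_singleton))
```

One difference from a grouped aggregation: a gene whose variants are all non-singletons is simply absent from this `Counter`, rather than appearing with a count of 0.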

Rost · December 30, 2019, 9:03pm

Thank you! Everything is working fine!
I have a last optional issue.
Can I make one object (mt) which contains several different results of filtering? For example:

kept_transcripts_synonymic
kept_transcripts_missense,
kept_transcripts_LoF

tpoterba · December 31, 2019, 9:57pm

yes, absolutely - you can add multiple lines as shown here:

mt = mt.annotate_rows(kept_transcripts_synonymous =
    mt.info.CSQ.map(
        lambda x: x.split(r'\|')
    ).filter(
        lambda tx: (tx[consequence] == 'synonymous_variant') & (tx[canonical] == 'YES')
    ),
    kept_transcripts_missense =
    mt.info.CSQ.map(
        lambda x: x.split(r'\|')
    ).filter(
        lambda tx: (tx[consequence] == 'missense_variant') & (tx[canonical] == 'YES')
    ),
    ...
)
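
When the categories differ only in which consequence terms they accept, the repeated filter can be generated from a mapping instead of being copied. A plain-Python sketch of that idea follows. The `categories` mapping (in particular which terms count as LoF), the `make_tx` helper and the transcript records are assumptions for illustration, not part of Hail or VEP.

```python
CONSEQUENCE, CANONICAL = 1, 23


def make_tx(consequence, canonical):
    """Made-up 24-field transcript record (illustration only)."""
    tx = [""] * 24
    tx[CONSEQUENCE] = consequence
    tx[CANONICAL] = canonical
    return tx


transcripts = [
    make_tx("synonymous_variant", "YES"),
    make_tx("missense_variant", "YES"),
    make_tx("missense_variant", ""),
    make_tx("stop_gained", "YES"),
]

# Hypothetical mapping from a short label to the consequence terms it covers.
categories = {
    "synonymous": {"synonymous_variant"},
    "missense": {"missense_variant"},
    "LoF": {"stop_gained", "frameshift_variant"},
}

# One filtered list per category, all built from the same predicate.
kept = {
    f"kept_transcripts_{label}": [
        tx for tx in transcripts
        if tx[CONSEQUENCE] in terms and tx[CANONICAL] == "YES"
    ]
    for label, terms in categories.items()
}

print({name: len(txs) for name, txs in kept.items()})
```

Exact matching on the Consequence field misses transcripts whose field joins several terms with `&` (for example `intron_variant&non_coding_transcript_variant`), so a real filter may need to split that field first.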

Rost · March 3, 2021, 11:43pm

Hi, Hail!
It’s been a long time since I used this type of annotation (as a raw tstr object). Since then I have learned how to run VEP inside Hail 0.2 and parse the annotation as described in the JSON schema. Now I need to select the elements of every field in mt.vep.transcript_consequences according to the canonical transcript ID. Is there a natural, easy solution for that case?

In [39]: mt.vep.transcript_consequences.canonical.show(3)
+---------------+------------+--------------------------------------------+
| locus         | alleles    | <expr>                                     |
+---------------+------------+--------------------------------------------+
| locus<GRCh37> | array<str> | array<int32>                               |
+---------------+------------+--------------------------------------------+
| 1:904165      | ["G","A"]  | [1,NA,NA,1,NA,NA,NA,NA,NA,1,NA,1]          |
| 1:909917      | ["G","A"]  | [NA,NA,NA,1,1,NA,NA,NA,NA,NA,NA,1,1,NA,NA] |
| 1:986963      | ["C","T"]  | [1,1,NA,NA,NA,NA,NA,1,NA,NA]               |
+---------------+------------+--------------------------------------------+

tpoterba · March 4, 2021, 2:05pm

You can do something like:

canonical_transcript_csq = mt.vep.transcript_consequences\
    .filter(lambda tc: tc.canonical == 1)

And then use that however you need.

Rost · March 4, 2021, 3:38pm

Thanks! That works!

Rost · March 5, 2021, 10:00pm

Well, but what should I do when I need to select one annotation by most_severe_consequence?
I tried to do it as you suggested above, but that only works for a comparison with an int.
So I decided that I need something like which in R, to find the index of each matching element:

mt.vep.transcript_consequences.consequence_terms\
      .index(lambda x: x.contains(mt.vep.most_severe_consequence))

How do I apply those indices to each field in mt.vep.transcript_consequences, or is there a more natural way to do this?

danking · March 6, 2021, 4:45pm

The gnomAD team has written a little function to do just this, as I understand it: gnomad_methods/vep.py at 8da14b2f438b3ca98b65658ad83cb65a449e7c46 · broadinstitute/gnomad_methods · GitHub
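
Independently of that function, the selection can be phrased as a filter on whole elements rather than a search for indices. Below is a plain-Python sketch of the idea on a made-up stand-in for one variant's `vep` struct; the `transcript_id` values are invented. In Hail the analogous expression would presumably be a `.filter` on `mt.vep.transcript_consequences` whose predicate calls `tc.consequence_terms.contains(mt.vep.most_severe_consequence)`, which is untested here.

```python
# Made-up stand-in for one variant's vep struct (illustration only).
vep = {
    "most_severe_consequence": "missense_variant",
    "transcript_consequences": [
        {"transcript_id": "TX_A", "canonical": 1,
         "consequence_terms": ["missense_variant"]},
        {"transcript_id": "TX_B", "canonical": None,
         "consequence_terms": ["intron_variant", "non_coding_transcript_variant"]},
        {"transcript_id": "TX_C", "canonical": None,
         "consequence_terms": ["missense_variant", "splice_region_variant"]},
    ],
}

# Filter whole elements instead of computing indices: every field of a
# matching transcript stays together.
most_severe = [
    tc for tc in vep["transcript_consequences"]
    if vep["most_severe_consequence"] in tc["consequence_terms"]
]

# Optionally put canonical transcripts first among the matches
# (False sorts before True, and the sort is stable).
canonical_first = sorted(most_severe, key=lambda tc: tc["canonical"] != 1)

print([tc["transcript_id"] for tc in canonical_first])
```

Because the filter keeps complete elements, there is no need to apply an index to each field separately.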

shao · February 9, 2023, 3:31pm

I tried to use this to parse the CSQ column using

split_csq.map(lambda x: x.split('\|')).

However, the result wasn’t split on ‘|’:

split_csq = mt.info.CSQ
split_transcripts = split_csq.map(lambda x: x.split('|'))
split_transcripts.show()
locus	alleles	
locus<GRCh38>	array<str>	array<array<str>>
chr1:12672	["C","T"]	[["T","|","n","o","n","_","c","o","d","i","n","g","_","t","r","a","n","s","c","r","i","p","t","_","e","x","o","n","_","v","a","r","i","a","n","t","|","M","O","D","I","F","I","E","R","|"...
chr1:12755	["G","A"]	[["A","|","i","n","t","r","o","n","_","v","a","r","i","a","n","t","&","n","o","n","_","c","o","d","i","n","g","_","t","r","a","n","s","c","r","i","p","t","_","v","a","r","i","a","n","t","|","M","O","D","I","F","I","E","R","|"...
chr1:12783	["G","A"]	[["A","|","i","n","t","r","o","n","_","v","a","r","i","a","n","t","&","n","o","n","_","c","o","d","i","n","g","_","t","r","a","n","s","c","r","i","p","t","_","v","a","r","i","a","n","t","|","M","O","D","I","F","I","E","R","|"...
chr1:12807	["C","T"]	[["T","|","i","n","t","r","o","n","_","v","a","r","i","a","n","t","&","n","o","n","_","c","o","d","i","n","g","_","t","r","a","n","s","c","r","i","p","t","_","v","a","r","i","a","n","t","|","M","O","D","I","F","I","E","R","|"...
,"|","|","|","|","|","|","|","|","|","|","|","|",""],["T","|","d","o","w","n","s","t","r","e","a","m","_","g","e","n","e","_","v","a","r","i","a","n","t","|","M","O","D","I","F","I","E","R","|","M","I","R","6","8","5","9","-","1","|","E","N","S","G","0","0","0","0","0","2","7","8","2","6","7","|","T","r","a","n","s","c","r","i","p","t","|","E","N","S","T","0","0","0","0","0","6","1","9","2","1","6","|","m","i","R","N","A","|","|","|","|","|","|","|","|","|","|","r","s","6","2","6","3","5","2","8","5","|","1","|","4","5","6","2","|","-","1","|","|","S","N","V","|","1","|","H","G","N","C","|","H","G","N","C",":","5","0","0","3","9","|","Y","E","S","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","0",".","2","1","2","4","|","0",".","3","1","2","4","|","0",".","0","8","5","2","3","|","0",".","1","6","0","6","|","0",".","1","9","3","5","|","0",".","1","8","3","4","|","0",".","2","8","1","1","|","0",".","1","4","5","8","|","0",".","1","1","5","|","0",".","2","1","2","8","|","0",".","1","8","3","7","|","0",".","3","1","2","4","|","g","n","o","m","A","D","g","_","A","F","R","|","|","|","|","|","|","|","|","|","|","|","|","|",""],["T","|","r","e","g","u","l","a","t","o","r","y","_","r","e","g","i","o","n","_","v","a","r","i","a","n","t","|","M","O","D","I","F","I","E","R","|","|","|","R","e","g","u","l","a","t","o","r","y","F","e","a","t","u","r","e","|","E","N","S","R","0","0","0","0","1","1","6","4","7","4","5","|","e","n","h","a","n","c","e","r","|","|","|","|","|","|","|","|","|","|","r","s","6","2","6","3","5","2","8","5","|","1","|","|","|","|","S","N","V","|","1","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","|","0",".","2","1","2","4","|","0",".","3","1","2","4","|","0",".","0","8","5","2","3","|","0",".","1","6","0","6","|","0",".","1","9","3","5","|","0",".","1","8","3","4","|","0",".","2","8","1","
1","|","0",".","1","4","5","8","|","0",".","1","1","5","|","0",".","2","1","2","8","|","0",".","1","8","3","7","|","0",".","3","1","2","4","|","g","n","o","m","A","D","g","_","A","F","R","|","|","|","|","|","|","|","|","|","|","|","|","|",""]]
chr1:12894	["G","T"]

danking · February 9, 2023, 7:39pm

In your first code snippet you wrote

split_csq.map(lambda x: x.split('\|'))

But in your second code snippet you wrote

split_transcripts = split_csq.map(lambda x: x.split('|'))

The former is correct, the latter is incorrect: split treats its delimiter as a regular expression, so the pipe has to be escaped. Can you try the former?
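
For anyone wondering why the unescaped form produces the one-character-per-element output shown above: in a regular expression a bare | means "empty pattern OR empty pattern", which matches between every character. The sketch below is only an analogy in plain Python, using the standard re module rather than Hail (Hail uses Java regex syntax, so edge cases such as leading or trailing empty strings may differ). The sample string is abbreviated from the output above.

```python
import re

# Abbreviated CSQ entry, shortened from the output shown above.
csq_entry = "T|intron_variant|MODIFIER|DDX11L1"

# Escaped pipe: matches a literal "|" and yields one item per CSQ field.
fields = re.split(r"\|", csq_entry)

# Unescaped pipe: an alternation of two empty patterns, which matches
# between every character (empty-match splitting needs Python 3.7+).
chars = re.split("|", csq_entry)

print(fields)                    # one element per field
print(len(chars) > len(fields))  # the unescaped split is far more fragmented
```

Writing the delimiter as a raw string (r"\|") also avoids Python's warning about an unrecognized escape sequence in '\|'.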

shao · February 9, 2023, 8:14pm

Ah my bad! Yes, the former worked, thanks so much!

shao · February 14, 2023, 7:59pm

@danking
I would like to annotate each of the fields in the consequence string as its own row field on the matrix table, but when I tried, indexing with 2 doesn’t grab the field at index 2 of the split strings:

mt = mt.annotate_rows(CSQ = split_transcripts)
mt = mt.annotate_rows(consequence = mt.CSQ[2])

danking · February 14, 2023, 9:24pm

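I think you’re confused by nested arrays. The original CSQ field is an array of strings, so after splitting each string you have an array of arrays, and mt.CSQ[2] selects the third transcript rather than a field within a transcript.

You could grab the element at index 2 of every split CSQ entry: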

CSQ2s = mt.CSQ.map(lambda x: x[2])

Is that what you want?
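
The difference between the two kinds of indexing can be mimicked with plain Python lists. This is a sketch of the shape of the data only, not Hail code: the list comprehensions stand in for Hail's .map(...), the two entries are abbreviated from the output earlier in the thread, and the variable names are made up for illustration.

```python
# Two abbreviated CSQ entries for one variant (fields trimmed for brevity).
csq = [
    "T|intron_variant|MODIFIER|DDX11L1",
    "T|downstream_gene_variant|MODIFIER|WASH7P",
]

# Stand-in for split_csq.map(lambda x: x.split('\|')): one list per transcript.
# Python's str.split takes a literal delimiter, so no escaping is needed here.
split_transcripts = [entry.split("|") for entry in csq]

# Indexing the outer list picks a whole transcript, not a field.
second_transcript = split_transcripts[1]

# Mapping over the outer list picks one field from every transcript,
# the stand-in for mt.CSQ.map(lambda x: x[2]).
impacts = [fields[2] for fields in split_transcripts]
consequences = [fields[1] for fields in split_transcripts]

print(second_transcript)
print(impacts)
print(consequences)
```

In these abbreviated entries, index 1 holds the consequence term and index 2 the impact, so the index to use depends on which field is wanted.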
