Hello, Hail beginner here!
Here is the schema of the ‘transcript_consequences’ field inside my vep row field:
transcript_consequences: array<struct {
    allele_num: int32,
    amino_acids: str,
    appris: str,
    biotype: str,
    cadd_phred: float64,
    cadd_raw: float64,
    canonical: int32,
    ccds: str,
    cdna_start: int32,
    cdna_end: int32,
    cds_end: int32,
    cds_start: int32,
    codons: str,
    consequence_terms: array<str>,
    distance: int32,
    domains: array<struct {
        db: str,
        name: str
    }>
I just wanted to see the consequence_terms, so I annotated them as a new row field:
d = d.annotate_rows(consequence_terms=d.vep.transcript_consequences['consequence_terms'])
Now every value in my new ‘consequence_terms’ row field is an array<array<str>> expression, such as [['missense_variant']].
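To make the nesting concrete, here is a plain-Python sketch (not Hail itself) of what that projection does. The two transcripts and their values are made up for illustration. Because transcript_consequences is an array with one struct per transcript, and each struct carries its own array of consequence terms, pulling out that one field gives one inner array per transcript.

```python
# Plain-Python model of a single variant's VEP annotation.
# The data below is hypothetical and only mirrors the schema above.
transcript_consequences = [
    {"biotype": "protein_coding", "consequence_terms": ["missense_variant"]},
    {"biotype": "protein_coding",
     "consequence_terms": ["splice_region_variant", "intron_variant"]},
]

# Projecting one field out of an array of structs keeps the outer array,
# so the result is a list of lists (array<array<str>> in Hail terms).
nested_terms = [tc["consequence_terms"] for tc in transcript_consequences]

print(nested_terms)
```

A variant with a single transcript therefore shows up as one inner array wrapped in an outer array, which matches the double brackets.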
This is causing me a problem, because I want to know whether these samples have a ‘loss of function’ variant.
I have tried:
LoF_mutation = hl.array(['stop_gained', 'frameshift_variant', 'splice_region_variant', 'splice_acceptor_variant','splice_donor_variant', 'missense_variant'])
d = d.annotate_rows(is_LoF = hl.if_else(LoF_mutation.contains(d.consequence_terms), 'Y', 'N'))
But it raises this error:
HailException: no conversion found for contains(, array<str>, array<array<str>>) => bool
Is there any way to convert my array<array<str>> values into something simpler, such as a flat array<str>?
The nesting also seems unnecessary, because every consequence_terms value just has two pairs of brackets around it.
Thanks in advance!
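For reference, the check being asked for can be sketched in plain Python under two assumptions: the nested value is flattened first, and the membership test is applied to each individual term rather than to the whole array at once (the attempt above asks whether a list of strings contains an entire nested array, which is the type mismatch the error reports). The helper names `flatten` and `has_lof` are hypothetical, and the term list is copied from the post as-is. In Hail, `hl.flatten` and the array `.any(...)` method play the analogous roles on expressions, though this sketch does not run Hail.

```python
from itertools import chain

# Term list copied from the post; whether each term really counts as
# loss of function is the poster's choice, not something asserted here.
LOF_TERMS = {
    "stop_gained", "frameshift_variant", "splice_region_variant",
    "splice_acceptor_variant", "splice_donor_variant", "missense_variant",
}

def flatten(nested):
    """Turn a list of lists of terms into one flat list of terms."""
    return list(chain.from_iterable(nested))

def has_lof(nested):
    """Return True if any term in any inner list is in LOF_TERMS."""
    return any(term in LOF_TERMS for term in flatten(nested))

# Hypothetical values shaped like the consequence_terms field.
print(flatten([["missense_variant"]]))
print("Y" if has_lof([["missense_variant"]]) else "N")
print("Y" if has_lof([["intron_variant"], ["synonymous_variant"]]) else "N")
```

Testing term by term also handles variants with several transcripts, where the outer array holds more than one inner array.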