Gnomad_lof_metrics annotations and filtering data

jonlin · December 16, 2020, 9:32am

Hi! I have annotated my data with the gnomad_lof_metrics database, but I am having trouble filtering on it, mainly because the annotation is a dictionary with a gene as the key and an array as the value (if I understand correctly). I can get information on a specific gene with

chr.gnomad_lof_metrics.get('genenamehere').pLI.show()

However, I would like to filter my data based on those gnomAD annotations. For example, I would like to filter out genes that have pLI < 0.90. What would be the best way of doing this?
Here is the structure of the gnomad_lof_metrics annotation:

chr.gnomad_lof_metrics.describe()

Type:
dict<str, array<struct {
transcript: str,
obs_mis: int32,
exp_mis: float64,
oe_mis: float64,
mu_mis: float64,
possible_mis: int32,
obs_mis_pphen: int32,
exp_mis_pphen: float64,
oe_mis_pphen: float64,
possible_mis_pphen: int32,
obs_syn: int32,
exp_syn: float64,
oe_syn: float64,
mu_syn: float64,
possible_syn: int32,
obs_lof: int32,
mu_lof: float64,
possible_lof: int32,
exp_lof: float64,
pLI: float64,
pNull: float64,
pRec: float64,
oe_lof: float64,
oe_syn_lower: float64,
oe_syn_upper: float64,
oe_mis_lower: float64,
oe_mis_upper: float64,
oe_lof_lower: float64,
oe_lof_upper: float64,
constraint_flag: str,
syn_z: float64,
mis_z: float64,
lof_z: float64,
oe_lof_upper_rank: int32,
oe_lof_upper_bin: int32,
oe_lof_upper_bin_6: int32,
n_sites: int32,
classic_caf: float64,
max_af: float64,
no_lofs: int32,
obs_het_lof: int32,
obs_hom_lof: int32,
defined: int32,
p: float64,
exp_hom_lof: float64,
classic_caf_afr: float64,
classic_caf_amr: float64,
classic_caf_asj: float64,
classic_caf_eas: float64,
classic_caf_fin: float64,
classic_caf_nfe: float64,
classic_caf_oth: float64,
classic_caf_sas: float64,
p_afr: float64,
p_amr: float64,
p_asj: float64,
p_eas: float64,
p_fin: float64,
p_nfe: float64,
p_oth: float64,
p_sas: float64,
transcript_type: str,
gene_id: str,
transcript_level: int32,
cds_length: int32,
num_coding_exons: int32,
gene_type: str,
gene_length: int32,
exac_pLI: float64,
exac_obs_lof: int32,
exac_exp_lof: float64,
exac_oe_lof: float64,
brain_expression: str,
chromosome: str,
start_position: int32,
end_position: int32
}>>
Source:
<hail.matrixtable.MatrixTable object>
Index:
['row']

Thanks!

danking · December 16, 2020, 4:43pm

You have to decide how to combine the pLI information across (potentially) many overlapping genes and many transcripts in each gene. If you just want to remove variants where at least one transcript in at least one gene has a pLI < 0.90:

all_transcripts = hl.flatten(mt.gnomad_lof_metrics.values())
transcripts_pLI_status = all_transcripts.map(lambda t: t.pLI < 0.9)
mt = mt.filter_rows(hl.any(transcripts_pLI_status), keep=False)

transcripts_pLI_status is an array of booleans, one per transcript; each value is True when the pLI of that transcript is less than 0.9. hl.any is True if any element of its argument (which must be an array or set) is True.
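
To make the semantics concrete, here is a minimal plain-Python sketch of the same flatten / map / any logic, run on an ordinary dict rather than a Hail expression. The gene names, transcript IDs, and pLI values are made up for illustration, and row_has_low_pli is a hypothetical helper, not part of Hail.

```python
from itertools import chain

# Hypothetical stand-in for one row's gnomad_lof_metrics annotation:
# gene symbol -> list of per-transcript records (only pLI is kept here).
row_annotation = {
    "GENE_A": [{"transcript": "T1", "pLI": 0.99}, {"transcript": "T2", "pLI": 0.95}],
    "GENE_B": [{"transcript": "T3", "pLI": 0.12}],
}


def row_has_low_pli(annotation, threshold=0.9):
    """Return True if any transcript of any gene has pLI below threshold."""
    # Analogue of hl.flatten(...values()): one flat list of transcripts.
    all_transcripts = list(chain.from_iterable(annotation.values()))
    # Analogue of .map(lambda t: t.pLI < 0.9): one boolean per transcript.
    status = [t["pLI"] < threshold for t in all_transcripts]
    # Analogue of hl.any over the booleans.
    return any(status)


rows = {
    "variant_1": row_annotation,
    "variant_2": {"GENE_C": [{"transcript": "T4", "pLI": 0.97}]},
}

# Rows for which the predicate is True (at least one low-pLI transcript).
flagged = [v for v, ann in rows.items() if row_has_low_pli(ann)]
# Rows for which it is False (every transcript has pLI >= 0.9).
unflagged = [v for v, ann in rows.items() if not row_has_low_pli(ann)]
print(flagged, unflagged)
```

In Hail, filter_rows keeps the rows for which the expression is True by default, which corresponds to flagged here; passing keep=False removes those rows instead, which corresponds to unflagged. Swapping any for all would give the stricter rule "every transcript is below the threshold", so the choice depends on how you want to combine transcripts.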

jonlin · December 17, 2020, 2:26pm

Thank you for the reply! However, I still get an error:

all_transcripts = hl.flatten(mt.gnomad_lof_metrics.values())
transcripts_pLI_status = all_transcripts.map(lambda t: t.pLI < 0.9)
mt = mt.filter_rows(hl.any(transcripts_pLI_status))

Traceback (most recent call last):
File "", line 1, in
TypeError: any() missing 1 required positional argument: 'collection'

danking · December 17, 2020, 4:23pm

Heh. Looks like Hail’s any doesn’t follow the Python standard API. I’ll fix that today.

In the meantime, change hl.any(transcripts_pLI_status) to hl.any(lambda x: x, transcripts_pLI_status).
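
The identity lambda works because, with the two-argument signature implied by the error message, any takes a predicate and a collection and reports whether the predicate holds for at least one element. A plain-Python sketch of that behaviour follows; any_with_predicate is a hypothetical helper written for illustration, not Hail's implementation, and the values are made up.

```python
def any_with_predicate(f, collection):
    """Return True if f(x) is truthy for at least one x in collection."""
    return any(f(x) for x in collection)


status = [False, True, False]  # made-up per-transcript booleans

# With the identity function the result matches Python's built-in any().
assert any_with_predicate(lambda x: x, status) == any(status)

# The predicate can also do the comparison itself, skipping the map step.
pli_values = [0.99, 0.42, 0.95]  # made-up pLI values
print(any_with_predicate(lambda p: p < 0.9, pli_values))
```

By the same reasoning, and assuming that two-argument signature, the comparison could probably move into the predicate directly, as in hl.any(lambda t: t.pLI < 0.9, all_transcripts).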

jonlin · December 17, 2020, 5:12pm

Thank you so much, it worked!
