Hi there!
I’m new to Hail, and I was hoping you could help me with the export of a MatrixTable annotation.
I believe I’m doing something relatively common, i.e. counting the number of variants of a certain category in each gene of each sample.
I’d like to export these counts in a flattened way that’s readable in R, but I can’t find a way to flatten the structure to something like:
sampleName gene nMissense nSynon
s1 ABC 1 0
s1 BCE 0 2
s2 ABC 1 2
s2 BCE 0 1
Alternatively, I’d be happy to export a valid JSON file which I could read with rjson or jsonlite in R.
I’ve tried the following:
- aggregate and count each variant category separately, i.e. each annotation in a sample contains a key:value structure where the key is the gene
  in this case, each annotation would have the same keys (i.e. same genes), which I don’t know how to combine
- aggregate and count by gene under a single hl.struct, i.e. the key is the gene, and the values are variant_category:count
  in this way I have everything I need
I’m only giving an example of the second approach, to keep my post as simple as possible.
Annotation step:
annotations = mt.annotate_cols(
    geneCounts = hl.agg.group_by(
        mt.info.SYMBOL,
        hl.struct(
            nTotalSynonymousGene = hl.agg.count_where(
                mt.info.Consequence.contains('synonymous_variant')
                & mt.GT.is_non_ref()
                & hl.is_defined(mt.GT)
            ),
            nTotalMissenseGene = hl.agg.count_where(
                mt.info.Consequence.contains('missense_variant')
                & mt.GT.is_non_ref()
                & hl.is_defined(mt.GT)
            )
        )
    )
)
I then take the MatrixTable’s cols() as a Table, which I try to flatten and export to a TSV (or JSON) file:
annotations.cols().select('geneCounts').flatten().export('path/test_by_gene.json(or tsv)')
The Table has this structure:
s geneCounts
sample1 [{"key":["RRN3P1"],"value":{"nTotalSynonymousGene":0,"nTotalMissenseGene":0}}...]
sample2 [{"key":["RRN3P1"],"value":{"nTotalSynonymousGene":0,"nTotalMissenseGene":0}}...]
sample3 [{"key":["RRN3P1"],"value":{"nTotalSynonymousGene":0,"nTotalMissenseGene":0}}...]
But when I try to flatten, nothing happens and a JSON-like file is exported.
I haven’t succeeded exporting a TSV file.
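To make the target concrete, here is a minimal sketch in plain Python (deliberately not Hail) of the reshaping I’m after. It assumes each sample’s geneCounts has already been parsed into a list of key/value records like the ones in the table above. The function name `flatten_gene_counts` and the example counts are made up for illustration.

```python
import csv
import io


def flatten_gene_counts(sample, gene_counts):
    """Yield one flat row per (sample, gene) pair."""
    for entry in gene_counts:
        counts = entry["value"]
        # "key" is a list of gene symbols, as in the table shown above
        for gene in entry["key"]:
            yield {
                "sampleName": sample,
                "gene": gene,
                "nMissense": counts["nTotalMissenseGene"],
                "nSynon": counts["nTotalSynonymousGene"],
            }


# Made-up data mirroring the nested structure of the exported Table
example = {
    "s1": [
        {"key": ["ABC"], "value": {"nTotalSynonymousGene": 0, "nTotalMissenseGene": 1}},
        {"key": ["BCE"], "value": {"nTotalSynonymousGene": 2, "nTotalMissenseGene": 0}},
    ],
    "s2": [
        {"key": ["ABC"], "value": {"nTotalSynonymousGene": 2, "nTotalMissenseGene": 1}},
        {"key": ["BCE"], "value": {"nTotalSynonymousGene": 1, "nTotalMissenseGene": 0}},
    ],
}

rows = [
    row
    for sample, gene_counts in example.items()
    for row in flatten_gene_counts(sample, gene_counts)
]

# Write the rows as tab-separated text, one line per sample and gene
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["sampleName", "gene", "nMissense", "nSynon"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
tsv_text = buffer.getvalue()
print(tsv_text)
```

What I’m looking for is the equivalent of this inside Hail, so that the long-format table is produced before the export.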
If I read the produced JSON-like file into R, jsonlite gives a formatting error about trailing garbage, while rjson complains that there are unquoted strings.
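My guess, which I haven’t confirmed, is that the file is not a single JSON document at all, but a tab-separated table with a header line in which only the geneCounts column is JSON-encoded. That would explain both complaints: the header and sample names are unquoted strings, and everything after the first value looks like trailing garbage. Under that assumption, a minimal Python sketch that reads such a file line by line could look like this (the helper name `parse_exported_lines` is hypothetical, and the text is a shortened copy of the table above):

```python
import json


def parse_exported_lines(lines, json_column="geneCounts"):
    """Parse tab-separated lines whose `json_column` holds JSON text.

    Assumes the first line is a header and that tabs never occur
    unescaped inside the JSON column.
    """
    lines = iter(lines)
    header = next(lines).rstrip("\n").split("\t")
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record = dict(zip(header, fields))
        # Only this one column is JSON; the others stay plain strings
        record[json_column] = json.loads(record[json_column])
        yield record


# Shortened stand-in for the exported file content
exported = (
    "s\tgeneCounts\n"
    'sample1\t[{"key":["RRN3P1"],"value":{"nTotalSynonymousGene":0,"nTotalMissenseGene":0}}]\n'
    'sample2\t[{"key":["RRN3P1"],"value":{"nTotalSynonymousGene":0,"nTotalMissenseGene":0}}]\n'
)

records = list(parse_exported_lines(exported.splitlines()))
for record in records:
    print(record["s"], record["geneCounts"][0]["key"])
```

If the layout really is like this, the same idea should carry over to R by reading the file as a TSV first and only then parsing the geneCounts column as JSON, row by row.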