Hi.
I appreciate the Hail package because it is super fast.
I would like to take a VCF from gnomAD and keep only a few of its INFO fields. gnomAD has 900+, and I only care about a few of them.
Example code
import hail as hl
import re
chr22mt = hl.read_matrix_table('data/gnomad.genomes.v3.1.2.sites.chr22.mt')
chr22mt.info.select(*[x for x in chr22mt.info.keys() if re.match(r'AF_\S{2,3}$', x)]).describe()
That gets me a much smaller INFO set, which is what I want.
However, assigning the filtered info back to chr22mt fails.
chr22mt.info = chr22mt.info.select(*[x for x in chr22mt.info.keys() if re.match(r'AF_\S{2,3}$', x)]).describe()
--------------------------------------------------------
Type:
struct {
AF_oth: array<float64>,
AF_ami: array<float64>,
AF_sas: array<float64>,
AF_XX: array<float64>,
AF_fin: array<float64>,
AF_XY: array<float64>,
AF_eas: array<float64>,
AF_amr: array<float64>,
AF_afr: array<float64>,
AF_raw: array<float64>,
AF_mid: array<float64>,
AF_asj: array<float64>,
AF_nfe: array<float64>
}
--------------------------------------------------------
Source:
<hail.matrixtable.MatrixTable object at 0x7f6980d417c0>
Index:
['row']
--------------------------------------------------------
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pipeline-user/.local/lib/python3.9/site-packages/hail/table.py", line 117, in __setattr__
raise NotImplementedError(f"'{self.__class__.__name__}' object is not mutable")
NotImplementedError: 'MatrixTable' object is not mutable
I’m clearly missing something, but it doesn’t seem like any of select_rows(), select_cols(), select_entries() would work. How do I filter out fields of INFO from a VCF?
Thank you,
Uri David
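A possible approach, offered as a sketch rather than a definitive answer: since a MatrixTable is immutable, the usual pattern is to build a new MatrixTable and rebind the variable, instead of assigning to chr22mt.info. The example below assumes that annotate_rows() is allowed to overwrite the existing info row field with the narrowed struct. The helper names pick_info_fields and keep_info_fields are hypothetical and not part of Hail.

```python
import re


def pick_info_fields(names, pattern=r'AF_\S{2,3}$'):
    """Return the names that match `pattern` from the start (re.match semantics)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.match(name)]


def keep_info_fields(mt, pattern=r'AF_\S{2,3}$'):
    """Return a NEW MatrixTable whose `info` struct holds only matching fields.

    Assumes `mt` is a Hail MatrixTable with a row field named `info`.
    """
    fields = pick_info_fields(list(mt.info.keys()), pattern)
    # annotate_rows returns a new MatrixTable; the original is left unchanged.
    return mt.annotate_rows(info=mt.info.select(*fields))


# Usage (requires Hail and the dataset from the question):
# import hail as hl
# chr22mt = hl.read_matrix_table('data/gnomad.genomes.v3.1.2.sites.chr22.mt')
# chr22mt = keep_info_fields(chr22mt)   # rebind instead of mutating
# chr22mt.info.describe()
```

Note that describe() only prints and returns None, so it should be called on the result separately and not included in the expression that builds the new info struct.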