nonchev
September 10, 2020, 8:53pm
1
Hello,
I want to keep only the individuals (samples) of interest in a MatrixTable.
I found this in a previous post:
mt.filter_cols(mt.s == 'NA00001')
That works for a single sample, but how can I filter if I have a list of samples, such as:
samples = ['NA00001', 'NA00002']
mt.filter_cols(hl.array(samples).contains(mt.s))
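To see what this expression does, it can help to model it in plain Python. The sketch below is only an analogy, not Hail code: `column_ids` and `filter_cols_sketch` are made-up names for illustration, and Hail evaluates the real expression lazily over the MatrixTable's columns rather than in a Python loop.

```python
samples = ["NA00001", "NA00002"]

# Hypothetical stand-in for the column keys (mt.s) of a MatrixTable.
column_ids = ["NA00001", "NA00002", "NA00003"]


def filter_cols_sketch(column_ids, samples, keep=True):
    """Mimic mt.filter_cols(hl.array(samples).contains(mt.s), keep=keep)."""
    wanted = set(samples)
    # For every column, ask "is this column's ID in the list?".
    # keep=True retains the matching columns, keep=False removes them.
    return [s for s in column_ids if (s in wanted) == keep]


kept = filter_cols_sketch(column_ids, samples)
not_listed = filter_cols_sketch(column_ids, samples, keep=False)
print(kept)
print(not_listed)
```

In other words, `.contains(mt.s)` yields one boolean per column, and `filter_cols` uses that boolean to decide which columns survive.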
Hi,
I'm using exactly this solution, but I get this error:
TypeError: array: parameter 'collection': expected expression of type set or array or dict<('any', 'any')>, found
This is my code:
samples = table['target panel prefix']
mt = mt.filter_cols(hl.array(samples).contains(mt.s))
What can be the reason?
Thanks!
Shiri
samples here isn't a Python list, it's a table field. If you do:
samples = table['target panel prefix'].collect()
this should work, I think.
tpoterba: table field
Thank you so much, it worked!
shuang
October 21, 2021, 1:59pm
6
Hi,
I have a text file that lists all sample IDs, one per line, without a header. It looks like:
11111111
2222222
3333333
AAAAAA
Now I want to remove these samples from my MatrixTable. What should I do?
With Hail v0.1 I did:
to_remove = hc.import_table('outliers.txt', no_header=True).key_by('f0')
vds_filtered = vds.filter_samples_table(to_remove, keep=False)
I am new to v0.2. Based on the answer above, I think it should be something like:
to_remove = hl.import_table('outliers.txt', no_header=True).key_by('f0')
samples_to_remove = to_remove.collect()
filtered_mt = mt.filter_cols(hl.array(samples_to_remove).contains(mt.s), keep=False)
But I am not sure about it, and I do not understand what '.contains(mt.s)' does. Any help?
Thanks a lot!
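One way to approach this, sketched under some assumptions: the ID file is small and sits on the local filesystem, the column key `mt.s` is a string, and the helper names `read_sample_ids` and `remove_samples` are made up for illustration. The IDs are read in plain Python and handed to Hail as a set literal, so every column can be tested for membership.

```python
from pathlib import Path


def read_sample_ids(path):
    """Read one sample ID per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def remove_samples(mt, ids_to_remove):
    """Drop the columns of `mt` whose sample ID is in `ids_to_remove`."""
    # Imported here so read_sample_ids can be used without Hail installed.
    import hail as hl

    # hl.literal turns the Python set into a Hail set expression; the
    # explicit type keeps an empty set from failing type inference.
    to_remove = hl.literal(ids_to_remove, hl.tset(hl.tstr))
    # .contains(mt.s) is evaluated per column; keep=False removes matches.
    return mt.filter_cols(to_remove.contains(mt.s), keep=False)
```

Usage would look like `filtered_mt = remove_samples(mt, read_sample_ids('outliers.txt'))`. Note that calling `.collect()` on a whole table returns row structs rather than plain strings, so a list-based variant would need to collect the field itself, for example `to_remove.f0.collect()`.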
shuang
October 22, 2021, 3:19pm
7
I got it working with:
sample_table = hl.import_table('outlier.txt', no_header=True).key_by('f0')
filt_mt = mt.filter_cols(hl.is_defined(sample_table[mt.col_key]), keep=False)
That solution is exactly what we would have suggested!
There's also a set of anti_join methods, which translate to "remove keys appearing in this other table", that can make this a little more terse:
sample_table = hl.import_table('outlier.txt', no_header=True).key_by('f0')
filt_mt = mt.anti_join_cols(sample_table)
shuang
October 22, 2021, 3:31pm
9
Thanks Tim, and good to know about the ".anti_join_cols" method!