Filter samples from MatrixTable


I want to keep only the individuals (samples) of interest in a MatrixTable.
I found this in a previous post:

mt.filter_cols(mt.s == 'NA00001')

which works for a single sample, but how can I filter if I have a list of samples, such as:

samples = ['NA00001', 'NA00002']



I'm using exactly this solution and I'm getting this error:
TypeError: array: parameter 'collection': expected expression of type set or array or dict<('any', 'any')>, found

This is my code:
samples = table['target panel prefix']
mt = mt.filter_cols(hl.array(samples).contains(mt.s))

What can be the reason?

samples here isn't a Python list, it's a table field. If you do:

samples = table['target panel prefix'].collect()

This should work, I think
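The fix above can be sketched end to end as follows. This is only a minimal sketch, assuming the column key is a string field named `s` as in the question. `as_sample_set` and `keep_samples` are hypothetical helper names, not part of Hail. It also swaps `hl.array` for `hl.literal` with an explicit `set<str>` type, which is a choice made here for illustration rather than something from the posts above.

```python
def as_sample_set(samples):
    """Return sample IDs as a Python set of strings.

    Rejects anything that is not a plain Python collection, for example an
    uncollected Hail table field, which is the situation in the error above.
    """
    if not isinstance(samples, (list, tuple, set, frozenset)):
        raise TypeError(
            f"expected a Python list, tuple or set of sample IDs, "
            f"got {type(samples).__name__}; call .collect() on a Hail field first"
        )
    return {str(s) for s in samples}


def keep_samples(mt, samples):
    """Keep only the columns of `mt` whose ID `mt.s` is in `samples`."""
    import hail as hl  # imported here so the pure-Python helper works without Hail

    # Explicit type so that an empty collection does not break type inference.
    wanted = hl.literal(as_sample_set(samples), 'set<str>')
    return mt.filter_cols(wanted.contains(mt.s))
```

With these helpers, the call from the question would become something like keep_samples(mt, table['target panel prefix'].collect()).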

Thank you so much, it worked!

I have a txt file that lists all sample IDs, without a header. It looks like:


Now I want to remove these samples from my MatrixTable. What should I do?
With Hail v0.1 I would do:

to_remove = hc.import_table('outliers.txt', no_header=True).key_by('f0')
vds_filtered = vds.filter_samples_table(to_remove, keep=False)

I am new to v0.2. According to the answer above, I think it should be something like:

to_remove = hl.import_table('outliers.txt', no_header=True).key_by('f0')
samples_to_remove = to_remove.f0.collect()  # list of ID strings, not a list of row structs
filtered_mt = mt.filter_cols(hl.array(samples_to_remove).contains(mt.s), keep=False)

But I am not sure about it, and I do not understand what '.contains(mt.s)' does. Any help?
Thanks a lot!
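On the '.contains(mt.s)' question: hl.array(ids).contains(mt.s) builds a boolean expression that is evaluated once per column and is true when that column's sample ID is one of ids. filter_cols(..., keep=False) then drops the columns where the expression is true. A plain-Python analogy may help. This is not Hail code, and `filter_cols_analogy` and the sample IDs are made up for illustration:

```python
def filter_cols_analogy(column_ids, listed_ids, keep=True):
    """Mimic filter_cols(collection.contains(mt.s), keep=...) on a plain list."""
    listed = set(listed_ids)
    # (s in listed) plays the role of contains(mt.s), evaluated per column.
    # keep=True retains the matches, keep=False retains the non-matches.
    return [s for s in column_ids if (s in listed) == keep]


columns = ['NA00001', 'NA00002', 'NA00003']
outliers = ['NA00002']

print(filter_cols_analogy(columns, outliers, keep=False))  # ['NA00001', 'NA00003']
print(filter_cols_analogy(columns, outliers, keep=True))   # ['NA00002']
```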

I made it work with:

sample_table = hl.import_table('outlier.txt', no_header=True).key_by('f0')
filt_mt = mt.filter_cols(hl.is_defined(sample_table[mt.col_key]), keep=False)

That solution is exactly what we would have suggested!

There's also a set of anti_join methods, which translate to "remove keys appearing in this other table" and can make this a little more terse:

sample_table = hl.import_table('outlier.txt', no_header=True).key_by('f0')
filt_mt = mt.anti_join_cols(sample_table)
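
One thing the anti-join will not tell you is whether every ID in the file actually matched a column. Below is a rough sketch of a sanity check around it, assuming a local text file with one ID per line and a string column key. `read_sample_ids` and `drop_listed_samples` are hypothetical names, and the plain open() call only works for local paths, not for cloud storage.

```python
def read_sample_ids(path):
    """Read the non-empty, whitespace-stripped lines of a local file as a set."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def drop_listed_samples(mt, path):
    """Remove the columns listed in `path` and report unmatched IDs."""
    import hail as hl  # imported here so read_sample_ids works without Hail

    sample_table = hl.import_table(path, no_header=True).key_by('f0')
    n_before = mt.count_cols()
    filtered = mt.anti_join_cols(sample_table)
    n_removed = n_before - filtered.count_cols()

    n_listed = len(read_sample_ids(path))
    if n_removed != n_listed:
        # Fewer columns removed than IDs listed: some IDs were not in the data.
        print(f"removed {n_removed} columns, but the file lists {n_listed} distinct IDs")
    return filtered
```

The extra count_cols calls trigger computation, so on a large dataset it may be worth running the check once and then using the bare anti-join.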

Thanks Tim, and good to know about the '.anti_join_cols' method!