Individual GT call output handling issue

Hello,

I'm working on outputting the genotype (GT) calls for each sample, but I'm running into problems handling the MatrixTable. To avoid structure conflicts, I've ended up calling .collect() to bring the entire table into memory, which isn't ideal because of the high memory usage.

My current script:

# `col` is a column record from the surrounding code (not shown)
sample = col.s
# Filter the MatrixTable for the specific sample
sample_mt = self.mt.filter_cols(self.mt.s == sample)
# Extract the GT field; handled as a Table this way to avoid a structure conflict
sample_entries_table = sample_mt.entries()
sample_entries_table = sample_entries_table.select('GT')
# Convert the entries (GT field) to a simple array without keys
gt_array = sample_entries_table.collect()

Previously, I used export():

# This outputs extra columns (locus, alleles, and s)
sample_entries_table = sample_mt.entries().select('GT')
sample_entries_table.export(output_file)

I'm not sure whether this is a key-column issue or a problem with how I'm calling these functions, but I suspect there is a more efficient way to extract the individual GT calls without using collect(). Any guidance would be appreciated.

Hi @cchunju8286,

Would you mind sharing what you want to do with the result? Maybe we can give you a better answer with more details.

You can omit the extra fields and globals in Table.export by dropping the key before selecting GT:

>>> ht = sample_mt.entries().key_by().select_globals().select('GT')
>>> ht.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'GT': call 
----------------------------------------
Key: []
----------------------------------------

Hope this helps,