Hello,
I’m working on outputting genotype (GT) calls for each sample, but I’m hitting some issues with handling the MatrixTable. To avoid conflicts, I’ve ended up using .collect() to bring the entire table into memory, which isn’t ideal due to high memory consumption.
My current script:
sample = col.s
# Filter the MatrixTable down to the specific sample
sample_mt = self.mt.filter_cols(self.mt.s == sample)
# Extract the GT field as a Table (handled this way to avoid a structure conflict)
sample_entries_table = sample_mt.entries()
sample_entries_table = sample_entries_table.select('GT')
# Collect the GT entries into a local list (this loads everything into memory)
gt_array = sample_entries_table.collect()
Previously I used export():
# This outputs extra columns (locus, alleles, and s)
sample_entries_table = sample_mt.entries().select('GT')
sample_entries_table.export(output_file)
I’m not sure whether this is a key-column issue or a problem with how I’m calling these functions, but I feel there must be a more efficient way to extract the individual GT calls without using collect(). Any guidance would be appreciated.
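For reference, here is a minimal sketch of one way the export described above might avoid both collect() and the extra key columns. It assumes the extra columns appear because locus, alleles, and s are the keys of the entries table, so select('GT') keeps them. The function name export_gt_only and its parameters are hypothetical; mt is assumed to be a Hail MatrixTable with a GT entry field and a column key s.

```python
def export_gt_only(mt, sample, output_file, header=True):
    """Write one sample's GT calls to a text file, one call per line.

    Assumption: `mt` is a Hail MatrixTable keyed by `s` on columns,
    with an entry field named `GT`. This helper is illustrative only.
    """
    # Keep only the column for the requested sample.
    sample_mt = mt.filter_cols(mt.s == sample)

    # Flatten to a Table with one row per (variant, sample) entry.
    entries = sample_mt.entries()

    # key_by() with no arguments drops the keys (locus, alleles, s),
    # so the following select('GT') leaves GT as the only field.
    gt_only = entries.key_by().select('GT')

    # export() writes from the workers, so nothing is pulled into
    # driver memory the way collect() would.
    gt_only.export(output_file, header=header)
    return gt_only
```

In the script above this would be called as export_gt_only(self.mt, col.s, output_file). If a matrix-style file with one column per sample is acceptable, mt.GT.export(output_file) may also be worth trying, although that layout keeps the locus and alleles columns.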