Best ways to filter Mt down to GT values

Hi all, I’ve been trying to figure out the best way to get GT values out of my Mt. I currently have a Mt filtered down to the loci and participants I am interested in, and I want to extract the GT value for each participant at each locus. The ultimate goal is to get all the GT values into a pandas dataframe.

In a smaller data set of 5 participants, just converting the Mt to a Ht and then using .to_pandas() worked perfectly.

df = mt.make_table().to_pandas()

Now I have a Mt of 2000 participants and this method fails with memory errors.

I’ve tried a couple of other ways but keep running into dead ends. Any tips for approaching this?

Going through disk might make this easier and relieve memory pressure a bit:

import pandas as pd

mt.GT.export('some_path.tsv')  # write the GT entries to a TSV on disk
df = pd.read_table('some_path.tsv')
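
If reading the whole TSV back in one go is still too heavy, one option is to read it in chunks and store each genotype as a small integer instead of a string. This is only a rough sketch and it rests on a few assumptions: that the exported file has `locus` and `alleles` key columns followed by one column per participant, that calls are biallelic and unphased (`0/0`, `0/1`, `1/1`), and that missing calls are written as `NA`. The helper `read_gt_compact` and the `GT_CODES` mapping are made-up names for illustration, not part of Hail or pandas, and the in-memory sample stands in for the real file.

```python
import io
import pandas as pd

# Assumed encoding for biallelic, unphased calls; anything else becomes -1.
GT_CODES = {"0/0": 0, "0/1": 1, "1/1": 2}

def read_gt_compact(source, key_cols=("locus", "alleles"), chunksize=10_000):
    """Read a GT matrix TSV chunk by chunk, storing calls as int8 codes."""
    pieces = []
    for chunk in pd.read_table(source, dtype=str, chunksize=chunksize):
        # Every column that is not a row key is assumed to be a participant.
        sample_cols = [c for c in chunk.columns if c not in key_cols]
        for col in sample_cols:
            # Unrecognised or missing calls map to NaN, then to -1.
            chunk[col] = chunk[col].map(GT_CODES).fillna(-1).astype("int8")
        pieces.append(chunk)
    return pd.concat(pieces, ignore_index=True)

# Tiny in-memory stand-in for 'some_path.tsv' (layout is an assumption).
sample = io.StringIO(
    "locus\talleles\ts1\ts2\n"
    '1:100\t["A","C"]\t0/1\t0/0\n'
    '1:200\t["G","T"]\t1/1\tNA\n'
)
df = read_gt_compact(sample, chunksize=1)
print(df)
```

With the real data you would pass 'some_path.tsv' instead of the sample buffer. The final dataframe still has to fit in memory, but one byte per genotype is a lot smaller than a Python string per genotype.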