Issues grouping by cols and then filtering by GT

If you prefer real-time chat, you can find us at https://hail.zulipchat.com.

If you want a Table containing de novo calls as defined by Kaitlin Samocha, then something like this example:

```python
import hail as hl

# `dataset` is your MatrixTable: one sample per column, rows keyed by locus and alleles
pedigree = hl.Pedigree.read('data/trios.fam')
priors = hl.import_table('data/gnomadFreq.tsv', impute=True)
priors = priors.transmute(**hl.parse_variant(priors.Variant)).key_by('locus', 'alleles')
de_novo_results = hl.de_novo(dataset, pedigree, pop_frequency_prior=priors[dataset.row_key].AF)
```

should indeed produce a table with three keys: locus, alleles, proband_id. hl.de_novo works directly from a MatrixTable with one sample per column. I suppose it's a reasonable request that hl.de_novo also accept a trio matrix; I'll send this feedback to the team.

hl.trio_matrix, on the other hand, is meant for designing your own trio-based methods: it presents the data with one trio per column.

The stack trace from the second error is missing the actual cause. You might try telling Jupyter or IPython to use Minimal or Plain tracebacks:

%xmode Minimal

You can also use https://gist.github.com to post the full stack trace and link it here.

EDIT: One thought: it looks like you might be running Hail on your laptop or a single server rather than a cluster. Unfortunately, by default the JVM uses a very small amount of memory, so you need to explicitly request that the JVM use all the memory on your machine.
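One commonly used way to do this locally is to set the PYSPARK_SUBMIT_ARGS environment variable before Hail starts Spark. A minimal sketch, assuming local mode; the helper name is hypothetical and the 8g value is an assumption, so pick something comfortably below your machine's physical RAM:

```python
import os


def request_driver_memory(memory: str = '8g') -> None:
    """Hypothetical helper: ask Spark to launch its driver JVM with `memory` of heap.

    In local mode the executors run inside the driver JVM, so --driver-memory
    is the setting that matters. It only takes effect if this runs before the
    JVM is launched, i.e. before hl.init() or the first Hail command.
    """
    os.environ['PYSPARK_SUBMIT_ARGS'] = f'--driver-memory {memory} pyspark-shell'


request_driver_memory('8g')  # assumption: 8 GB fits in this machine's RAM
```

After this, `import hail as hl` and call `hl.init()` as usual. If Hail is already running in your notebook, restart the kernel first: a JVM that has already started will not pick up the new setting.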