Genotype matrix in hail 0.2

Dear Hail team,

Given a genomic region, is it possible to output a genotype matrix (M individuals by N SNPs, in which the values are minor allele counts (0, 1, 2)) with hail 0.2?

Thank you very much!
Best regards,

Hi Wei,
Try the following:

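import hail as hl

# Load the VCF as a MatrixTable
mt = hl.import_vcf('src/test/resources/sample.vcf')
# Keep only the variants that fall in the region of interest
mt = hl.filter_intervals(mt, [hl.parse_locus_interval('20:10620000-10650000')])
# Replace each genotype call with its number of alternate alleles (0, 1, or 2)
mt = mt.select_entries(GT = mt.GT.n_alt_alleles())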


You can change the interval(s) and files as necessary.
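
One caveat: n_alt_alleles() counts alternate alleles, which matches the minor allele count only at sites where the alternate allele is the rarer one. Below is a minimal plain-Python sketch of flipping the counts where the alternate allele is the major allele. It is not Hail code; the function name to_minor_allele_counts and the toy data are made up for illustration, and it assumes biallelic, diploid calls with no missing values.

```python
def to_minor_allele_counts(alt_counts):
    """Convert alt allele counts (0/1/2) at one biallelic site to minor allele counts.

    Hypothetical helper for illustration only; assumes diploid calls, no missing data.
    """
    if not alt_counts:
        return []
    n_alleles = 2 * len(alt_counts)           # two alleles per individual
    alt_freq = sum(alt_counts) / n_alleles    # alternate allele frequency at this site
    if alt_freq > 0.5:
        # The alternate allele is the major allele, so count the reference allele instead.
        return [2 - c for c in alt_counts]
    return list(alt_counts)


# Toy data: rows are SNPs, columns are individuals (the layout Hail uses).
alt_matrix = [
    [0, 1, 0, 0],  # alt allele frequency 1/8: counts stay as they are
    [2, 2, 1, 2],  # alt allele frequency 7/8: counts are flipped
]
minor_matrix = [to_minor_allele_counts(row) for row in alt_matrix]

# Transpose to the M individuals by N SNPs layout asked about above.
by_individual = [list(col) for col in zip(*minor_matrix)]
```

If every alternate allele in your region is already the minor allele, the flip is a no-op and the alt counts can be used directly.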

Hi Tim,

Thank you so much! That is exactly what I need.

Best regards,

Hi @tpoterba,

I was trying your suggestion to export a table containing ‘samples’ and ‘counts’ and I’m getting ‘RuntimeException: Class file too large!’.

Here is what I did:

# aggregate by AF bins and consequence type
mt_grouped = (mt
              .group_rows_by(mt.af_bins, mt.csq_group)

# export table
tb = (mt_grouped

The grouped matrix has 10k samples, six consequence groups, and 10 AF bins, so I expect a file with 600,000 rows (that’s small).

Any idea? Do you have any other suggestion for getting a table with four columns (e.g. sample_id, af_bin, csq_group, and counts (entries))?



The code you posted above will export a table with 10,000 fields (one per sample), and this is where Hail is having trouble.

If you instead run

tb = (mt_grouped

I expect that should work and give you the 4-column file you want.
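
To make the difference in shape concrete, here is a small plain-Python sketch (not Hail code) of flattening a grouped matrix into a long table with one row per (group, sample) pair. The sample IDs, bin labels, group labels, and the helper to_long_rows are all invented for illustration. A wide layout would need one field per sample; the long layout always has four columns, whatever the number of samples.

```python
from itertools import product

# Hypothetical toy dimensions; the real data has 10k samples, 10 AF bins, 6 groups.
samples = ["s1", "s2", "s3"]
af_bins = ["af_bin_1", "af_bin_2"]
csq_groups = ["csq_a", "csq_b"]

# grouped[(af_bin, csq_group)] holds one count per sample, in sample order.
grouped = {key: [0] * len(samples) for key in product(af_bins, csq_groups)}
grouped[("af_bin_1", "csq_a")][1] = 5  # sample s2 has 5 variants in this group


def to_long_rows(grouped, samples):
    """Flatten the grouped matrix into (sample_id, af_bin, csq_group, count) rows."""
    rows = []
    for (af_bin, csq_group), counts in grouped.items():
        for sample_id, count in zip(samples, counts):
            rows.append((sample_id, af_bin, csq_group, count))
    return rows


long_rows = to_long_rows(grouped, samples)

# Same arithmetic at the scale described in the question above.
expected_rows_full_scale = 10_000 * 10 * 6
```

With the toy dimensions this gives 2 x 2 x 3 = 12 rows; at full scale the same arithmetic gives the 600,000 rows mentioned above.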


Hi Tim,

It works perfectly! :wink:

