Dear Hail team,
Given a genomic region, is it possible to output a genotype matrix (M individuals by N SNPs, with minor allele counts of 0, 1, or 2 as the values) with Hail 0.2?
Thank you very much!
Best regards,
Wei
Hi Wei,
Try the following:
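import hail as hl
mt = hl.import_vcf('src/test/resources/sample.vcf')
# keep only the variants inside the region of interest
mt = hl.filter_intervals(mt, [hl.parse_locus_interval('20:10620000-10650000')])
# replace each genotype call with its alternate-allele count (0, 1, or 2)
mt = mt.select_entries(GT = mt.GT.n_alt_alleles())
# one row per variant, one column per sample
mt.make_table(separator='').export('/tmp/foo.txt')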
You can change the interval(s) and files as necessary.
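One caveat on the snippet above: n_alt_alleles() counts alternate alleles, which matches the minor allele count only at sites where the alternate allele is the less common one. Below is a minimal plain-Python sketch (not Hail code) of flipping the counts at a single biallelic site. The function name minor_allele_counts and the input layout (a list of per-sample counts, None for a missing call) are assumptions for illustration.

```python
def minor_allele_counts(alt_counts):
    """Convert alternate-allele counts at one biallelic site to minor-allele counts.

    alt_counts: list of 0, 1, 2, or None (missing call), one value per sample.
    """
    called = [c for c in alt_counts if c is not None]
    if not called:
        # No calls at this site, so there is nothing to flip.
        return list(alt_counts)
    # Each called sample contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    if alt_freq > 0.5:
        # ALT is the major allele here, so count REF alleles instead.
        return [None if c is None else 2 - c for c in alt_counts]
    return list(alt_counts)


print(minor_allele_counts([2, 2, 1, None, 2]))
```

If the alternate allele is rare at every site in your region, the exported counts can be used as they are.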
Hi Tim,
Thank you so much! That is exactly what I need.
Best regards,
Wei
Hi @tpoterba,
I tried your suggestion to export a table containing ‘samples’ and ‘counts’, but I’m getting ‘RuntimeException: Class file too large!’.
Here is what I did:
# aggregate by AF bins and consequence type
mt_grouped = (mt
    .group_rows_by(mt.af_bins, mt.csq_group)
    .aggregate_entries(n_hets=hl.agg.count_where(mt.GT.is_het()))
    .result()
)
# export table (export() writes the file and returns None, so nothing is assigned)
(mt_grouped
    .make_table(separator='')
    .export('samples_counts.txt')
)
The grouped matrix has 10k samples, six consequence groups and 10 AF bins, so I expect a file with 600,000 rows (that’s small).
Any idea? Do you have any other suggestion to get a table with four columns (e.g. sample_id, af_bin, csq_group and counts (entries))?
Best,
E.
The code you posted above will export a table with 10,000 fields (one column per sample), and this is where Hail is having trouble.
If you instead run
# entries() gives one row per (row group, sample) pair
(mt_grouped
    .entries()
    .export('samples_counts.txt'))
I expect things should work, and give you the 4-column file you want.
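To see why the two exports differ so much in shape, here is a toy plain-Python sketch (not Hail code) of the two layouts. The sample IDs, bin labels, and counts are made up for illustration, and the column order of the real export may differ.

```python
from itertools import product

samples = ["s1", "s2", "s3"]        # hypothetical sample IDs
af_bins = ["<1%", "1-5%"]           # hypothetical AF bins
csq_groups = ["lof", "missense"]    # hypothetical consequence groups

# Toy counts keyed by (af_bin, csq_group, sample).
counts = {key: i for i, key in enumerate(product(af_bins, csq_groups, samples))}

# Wide layout (like make_table): one row per group, one column per sample.
wide_rows = [
    [af, csq] + [counts[(af, csq, s)] for s in samples]
    for af, csq in product(af_bins, csq_groups)
]

# Long layout (like entries()): one row per (group, sample) pair.
long_rows = [
    (s, af, csq, counts[(af, csq, s)])
    for af, csq in product(af_bins, csq_groups)
    for s in samples
]

print(len(wide_rows), len(wide_rows[0]))   # number of rows, number of columns
print(len(long_rows), len(long_rows[0]))
```

Scaled up to 10,000 samples and 60 groups, the wide layout has 60 rows with roughly 10,000 columns, while the long layout has 600,000 rows with 4 columns.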
Hi Tim,
It works perfectly!
Thanks