Hi! I’m new to Hail and I’m running it on a cluster as a Singularity image. The problem I’m encountering occurs when I try to import a VCF file.
I run the following commands using ipython
import hail as hl
hl.init(default_reference='GRCh38')
hl.import_vcf('chr22.recal.vcf.gz', force=True)
When I run this command, I get the following message:
<hail.matrixtable.MatrixTable at 0x7f28428a9208>
and I’m not sure where Hail saved the MatrixTable.
As a next step, I also ran the following command:
hl.import_vcf('chr22.recal.vcf.gz', force=True).write('ch22.mt', overwrite=False)
What I get is this:
2020-08-18 20:04:14 Hail: WARN: file 'file:chr22.recal.vcf.gz' is 122.9G
It will be loaded serially (on one core) due to usage of the 'force' argument.
If it is actually block-gzipped, either rename to .bgz or use the 'force_bgz'
argument.
[Stage 0:> (0 + 1) / 1]
And the file doesn’t get saved.
Do you have any insights as to how to get around this problem?
On your first question: nothing is being saved at all, because you haven’t yet asked Hail to save anything. hl.import_vcf returns a MatrixTable Python object, which is only a handle to the data; that is the object IPython printed for you. Nothing reaches disk until you call that object’s write method.
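As a rough sketch of the import-then-write pattern described above: the helper name vcf_to_matrix_table and its parameter names are made up for illustration, while hl.import_vcf, MatrixTable.write and hl.read_matrix_table are the Hail calls already mentioned in this thread. Hail is imported inside the function so that defining it does not require Hail to be installed.

```python
def vcf_to_matrix_table(vcf_path, mt_path, force=False, force_bgz=False, overwrite=False):
    """Import a VCF, write it to disk as a MatrixTable, and read it back.

    Hypothetical helper for illustration; not part of Hail itself.
    """
    import hail as hl  # imported lazily; assumes Hail is installed where this runs

    # import_vcf only builds a MatrixTable object; nothing is saved yet.
    mt = hl.import_vcf(vcf_path, force=force, force_bgz=force_bgz)

    # write() is the step that actually puts the data on disk.
    mt.write(mt_path, overwrite=overwrite)

    # Later work can start from the saved copy instead of re-importing the VCF.
    return hl.read_matrix_table(mt_path)
```

With the files from the question, this would be called as vcf_to_matrix_table('chr22.recal.vcf.gz', 'chr22.mt', force=True), after hl.init(default_reference='GRCh38') as before.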
On your second question: the job is not failing. The progress bar shows a single task, (0 + 1) / 1, because force=True makes Hail read the whole 122.9G file on one core, so the write will take a long time before anything appears. Whether you can speed it up depends on how the file is compressed. If it was compressed with bgzip, pass force_bgz=True instead of force=True so it can be read in parallel. If it wasn’t, I don’t know of any solution: it has to be read serially, which is slow. Once you have written it out as a MatrixTable, though, you can use that copy in the future by reading it with mt = hl.read_matrix_table("chr22.mt").
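If you aren’t sure how the file was compressed, one way to check is to look at its first bytes. The sketch below is not a Hail feature; looks_like_bgzf is a hypothetical helper that tests for the BGZF block header described in the SAM/BAM specification (gzip magic bytes, the extra-field flag, and a 'BC' subfield of length 2).

```python
import struct


def looks_like_bgzf(path):
    """Return True if the file starts with a BGZF (bgzip) block header."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            return False
        # Bytes 0-2: gzip magic (0x1f, 0x8b) and deflate method (8).
        # Byte 3: flags; bit 2 (FEXTRA) must be set for BGZF.
        if header[0] != 0x1F or header[1] != 0x8B or header[2] != 8:
            return False
        if not header[3] & 4:
            return False
        # Bytes 10-11: little-endian length of the extra field.
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = f.read(xlen)

    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if (si1, si2, slen) == (66, 67, 2):
            return True
        pos += 4 + slen
    return False
```

If looks_like_bgzf('chr22.recal.vcf.gz') returns True, force_bgz=True should be the right argument; if it returns False, the file is probably plain gzip and force=True is the fallback.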