Hi all,
I am trying to work with a VEP-annotated version of the Genebass variant-level summary statistics. When I do, Hail either stalls (the progress bar stops advancing) or fails with FatalError: IOException: No space left on device, even though I am not trying to write any file to storage, as you can see from the code below.
Does this sound like insufficient memory/RAM, or some other issue? If it is memory, what would be the optimal parameters for setting up a VM with hailctl for this case? My current configuration is the hailctl default (hailctl dataproc start cluster_name).
Here is example code that generates the error (it runs fine if you exclude the annotate_rows command):
import hail as hl
# Load the Genebass variant-level results and key the rows by markerID
genebass_variant = hl.read_matrix_table('path_to_genebass_variants')
genebass_variant = genebass_variant.key_rows_by(genebass_variant.markerID)
# Load the VEP annotations and key them by markerID
vep_ht = hl.read_table("path_to_genebass_vep_hailtable")
vep_ht = vep_ht.key_by("markerID")
# Keep only missense variants
genebass_variant = genebass_variant.filter_rows(genebass_variant.annotation == "missense")
# Join in the VEP annotations (the error only occurs when this step is included)
genebass_variant = genebass_variant.annotate_rows(vep = vep_ht[genebass_variant.markerID].vep)
# Keep only PCSK9 variants and show the first 10 entries
genebass_variant = genebass_variant.filter_rows(genebass_variant.gene == "PCSK9")
genebass_variant.entries().show(10)
Thank you! -Dan