Docker error when running VEP

I’m running the following code:

import hail as hl
hl.init()
all_genomes_mt = hl.read_matrix_table(f'gs://{genomes_mt_path}')
result = hl.vep(all_genomes_mt)

And getting the following error:

hail.utils.java.FatalError: HailException: VEP command '/vep --format vcf --json --everything --allele_number --no_stats --cache --offline --minimal --assembly GRCh38 --fasta /opt/vep/.vep/homo_sapiens/95_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --plugin LoF,loftee_path:/opt/vep/Plugins/,gerp_bigwig:/opt/vep/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/opt/vep/.vep/human_ancestor.fa.gz,conservation_file:/opt/vep/.vep/loftee.sql --dir_plugins /opt/vep/Plugins/ -o STDOUT' failed with non-zero exit status 125
  VEP Error output:
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

I used the following command to create the Dataproc cluster the above is running on:

hailctl dataproc start vep-cluster \
    --image-version=2.1.7-debian11 \
    --autoscaling-policy=autoscaling-policy \
    --master-machine-type=n1-highmem-8 \
    --worker-machine-type=n1-highmem-8 \
    --worker-boot-disk-size=1000 \
    --secondary-worker-type=non-preemptible \
    --preemptible-worker-boot-disk-size=1000 \
    --properties=dataproc:dataproc.logging.stackdriver.enable=true,dataproc:dataproc.monitoring.stackdriver.enable=true,spark:spark.sql.shuffle.partitions=24240,spark:spark.default.parallelism=24240 \
    --vep GRCh38

I’ve kept everything very close to default since this is the first time I’ve run VEP with Hail, but the default configuration doesn’t seem to work. Do I need to configure Docker a certain way? Or change something about the way I’m creating the cluster?

Thank you,
Daniel Cotter

Hey @dlcotter !

My apologies! This should be fixed in the latest Hail release, 0.2.123. Issue #12936 has more information about the root cause.
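
If it helps, here is a rough sketch for checking whether an installed Hail is at or past 0.2.123. It assumes the version string looks like `0.2.123-<commit hash>`, which is what `hl.version()` typically returns; the helper names `parse_hail_version` and `has_vep_docker_fix` are made up for this example and are not part of Hail.

```python
import re

# Release named in this thread as containing the fix.
FIXED_IN = (0, 2, 123)


def parse_hail_version(version_string):
    """Return the leading numeric components as a tuple of ints.

    Assumes a string such as '0.2.123-abcdef123456' (hypothetical hash).
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.groups())


def has_vep_docker_fix(version_string):
    """True if the version is at least the release named in this thread."""
    return parse_hail_version(version_string) >= FIXED_IN


# Sample strings only; on a real cluster, pass hl.version() instead.
for sample in ("0.2.122-abcdef123456", "0.2.123-abcdef123456"):
    print(sample, has_vep_docker_fix(sample))
```

On the cluster itself you would call `has_vep_docker_fix(hl.version())` after `import hail as hl`. If it comes back False, upgrading Hail locally and recreating the cluster with `hailctl dataproc start` should pick up the newer version.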
