I’m running the following code:
import hail as hl
hl.init()
# genomes_mt_path is defined earlier and holds the bucket path to the MatrixTable
all_genomes_mt = hl.read_matrix_table(f'gs://{genomes_mt_path}')
result = hl.vep(all_genomes_mt)
And I'm getting the following error:
hail.utils.java.FatalError: HailException: VEP command '/vep --format vcf --json --everything --allele_number --no_stats --cache --offline --minimal --assembly GRCh38 --fasta /opt/vep/.vep/homo_sapiens/95_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --plugin LoF,loftee_path:/opt/vep/Plugins/,gerp_bigwig:/opt/vep/.vep/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/opt/vep/.vep/human_ancestor.fa.gz,conservation_file:/opt/vep/.vep/loftee.sql --dir_plugins /opt/vep/Plugins/ -o STDOUT' failed with non-zero exit status 125
VEP Error output:
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
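From the error, it looks like hl.vep shells out to a Dockerized VEP on the workers and can't reach the Docker daemon there. I haven't dug into the nodes yet, but I assume I could confirm whether the daemon is up by SSHing into a worker with something like this (the instance name is my guess based on Dataproc's <cluster>-w-N naming, and --zone may be needed):

gcloud compute ssh vep-cluster-w-0 --command='sudo systemctl status docker'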
I used the following command to create the Dataproc cluster that the code above runs on:
hailctl dataproc start vep-cluster \
    --image-version=2.1.7-debian11 \
    --autoscaling-policy=autoscaling-policy \
    --master-machine-type=n1-highmem-8 \
    --worker-machine-type=n1-highmem-8 \
    --worker-boot-disk-size=1000 \
    --secondary-worker-type=non-preemptible \
    --preemptible-worker-boot-disk-size=1000 \
    --properties=dataproc:dataproc.logging.stackdriver.enable=true,dataproc:dataproc.monitoring.stackdriver.enable=true,spark:spark.sql.shuffle.partitions=24240,spark:spark.default.parallelism=24240 \
    --vep GRCh38
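In case it helps with diagnosing, I figure I can inspect what the cluster actually came up with via gcloud; something like the following should show which initialization actions ran (the region is just an example, and the --format projection is my guess at the right field):

gcloud dataproc clusters describe vep-cluster --region=us-central1 --format='yaml(config.initializationActions)'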
I’ve kept everything very close to the defaults, since this is the first time I’ve run VEP with Hail, but the default configuration doesn’t seem to work. Do I need to configure Docker in a particular way, or change something about how I’m creating the cluster?
Thank you,
Daniel Cotter