Run VEP from AWS S3

I am trying to access files on AWS S3 that are referenced in the main VEP config JSON file, using full paths like s3://$bucket_name/$path. I have verified that the files exist on S3 at those paths, but I get this error:

hail.utils.java.FatalError: IOException: error=2, No such file or directory

The error refers to this file, which does exist on S3 at the path given in the config file:

variant_effect_predictor.pl

I know that when I run locally, where all the code accesses Hadoop storage, the config needs paths on the local file system. But what counts as the local file system on AWS? Is it possible to run VEP from S3 storage?
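A likely reading of the error: the VEP command in the config is started as an ordinary operating-system process on each worker, and the OS only knows about local paths, so an s3:// "path" is simply a file that does not exist (errno 2, ENOENT). The Java message "error=2, No such file or directory" is what process creation reports in that case. The sketch below reproduces the same failure from Python; the bucket and path are hypothetical placeholders, and it assumes a Linux/macOS machine.

```python
import errno
import subprocess

def try_launch(command):
    """Try to start `command` as a local subprocess; return the errno on failure, else 0."""
    try:
        # The OS resolves command[0] as a local executable path; it has no notion of s3:// URIs.
        subprocess.run(command, stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return 0
    except FileNotFoundError as e:
        return e.errno

# Hypothetical bucket/path, standing in for the s3:// paths in the VEP config.
code = try_launch(["s3://my-bucket/vep/variant_effect_predictor.pl", "--help"])
print(code, errno.errorcode[code])  # → 2 ENOENT
```

So "local file system" here means each worker node's own disk: the VEP script, its cache and any plugins have to be copied there before Hail runs VEP.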

Regards.

I don’t think it’s easy to set up VEP to use files from S3. Our installation script for hailctl dataproc localizes the data (copies it onto each node's local disk) as one of its first steps.
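The same localization idea can be applied on AWS: copy everything the config points at from S3 onto each node's disk during bootstrap, and point the config at those local copies. Below is a minimal sketch of the config-rewriting half only. It assumes a Hail 0.2-style JSON config with `command` and `env` keys; the local root `/vep_data`, the bucket name and the helper `localize_config` are hypothetical. It only matches strings that start with `s3://` (not, for example, `--dir=s3://...`), and the actual download, e.g. with `aws s3 cp` in a bootstrap action, is left out.

```python
import json
import posixpath
from urllib.parse import urlparse

def localize_config(config, local_root="/vep_data"):
    """Return (rewritten_config, copies).

    Every string starting with s3:// is replaced by a path under `local_root`.
    `copies` lists the (s3_uri, local_path) pairs a bootstrap step would have to
    download onto every node before Hail runs VEP.
    """
    copies = {}

    def rewrite(value):
        if isinstance(value, str) and value.startswith("s3://"):
            parsed = urlparse(value)
            # Keep bucket and key in the local path so different buckets cannot collide.
            local = posixpath.join(local_root, parsed.netloc, parsed.path.lstrip("/"))
            copies[value] = local
            return local
        if isinstance(value, list):
            return [rewrite(v) for v in value]
        if isinstance(value, dict):
            return {k: rewrite(v) for k, v in value.items()}
        return value

    return rewrite(config), list(copies.items())

# Hypothetical config pointing at S3.
config = {
    "command": ["s3://my-bucket/vep/variant_effect_predictor.pl",
                "--format", "vcf", "--dir", "s3://my-bucket/vep/cache"],
    "env": {"PERL5LIB": "s3://my-bucket/vep/loftee"},
}
local_config, copies = localize_config(config)
print(json.dumps(local_config["command"]))
for src, dst in copies:
    print(f"{src} -> {dst}")
```

Writing the rewritten config back out with `json.dump` gives a file whose paths are all local, which is what the error above suggests VEP needs.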

NLSVTN, it’s worth considering packaging the files into a custom AMI for use with EMR. If you’re using GRCh37 or GRCh38, the AMIs at https://github.com/goldfinchbio/emr-hail may be of use. If not, you’ll find some useful bootstrapping code there to help with your own implementation.
