How to deploy the latest Hail 0.2 on AWS EMR?

I have already tried the two tutorials below:

I can deploy an older version of Hail 0.2 via quickstart-hail; however, when I tried to build the latest version, it failed.

Then I tried hail-on-AWS-spot-instances, but it also failed to build the latest Hail.

I have submitted tickets to both repos but have not received a response; perhaps they are no longer maintained.

Does anyone have an up-to-date tutorial for setting up the latest Hail 0.2 on AWS EMR? Thanks.

I don’t know of any such resources right now. What’s the error you’re seeing with the hms-dbmi build?

Here are the errors copied from their auto-generated cloudcreation_log.out:

cd build/deploy; python3 setup.py -q sdist bdist_wheel
Traceback (most recent call last):
File “/usr/lib64/python3.6/distutils/core.py”, line 148, in setup
dist.run_commands()
File “/usr/lib64/python3.6/distutils/dist.py”, line 955, in run_commands
self.run_command(cmd)
File “/usr/lib64/python3.6/distutils/dist.py”, line 974, in run_command
cmd_obj.run()
File “/usr/lib/python3.6/dist-packages/setuptools/command/bdist_egg.py”, line 152, in run
self.run_command(“egg_info”)
File “/usr/lib64/python3.6/distutils/cmd.py”, line 313, in run_command
self.distribution.run_command(command)
File “/usr/lib64/python3.6/distutils/dist.py”, line 974, in run_command
cmd_obj.run()
File “/usr/lib/python3.6/dist-packages/setuptools/command/egg_info.py”, line 280, in run
self.find_sources()
File “/usr/lib/python3.6/dist-packages/setuptools/command/egg_info.py”, line 295, in find_sources
mm.run()
File “/usr/lib/python3.6/dist-packages/setuptools/command/egg_info.py”, line 526, in run
self.add_defaults()
File “/usr/lib/python3.6/dist-packages/setuptools/command/egg_info.py”, line 562, in add_defaults
sdist.add_defaults(self)
File “/usr/lib/python3.6/dist-packages/setuptools/command/py36compat.py”, line 34, in add_defaults
self._add_defaults_python()
File “/usr/lib/python3.6/dist-packages/setuptools/command/sdist.py”, line 134, in _add_defaults_python
self.filelist.extend(build_py.get_source_files())
File “/usr/lib64/python3.6/distutils/command/build_py.py”, line 301, in get_source_files
return [module[-1] for module in self.find_all_modules()]
File “/usr/lib64/python3.6/distutils/command/build_py.py”, line 296, in find_all_modules
m = self.find_package_modules(package, package_dir)
File “/usr/lib64/python3.6/distutils/command/build_py.py”, line 218, in find_package_modules
self.check_package(package, package_dir)
File “/usr/lib/python3.6/dist-packages/setuptools/command/build_py.py”, line 163, in check_package
init_py = orig.build_py.check_package(self, package, package_dir)
File “/usr/lib64/python3.6/distutils/command/build_py.py”, line 191, in check_package
“package directory ‘%s’ does not exist” % package_dir)
distutils.errors.DistutilsFileError: package directory ‘find_namespace:’ does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 158, in save_modules
yield saved
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 199, in setup_context
yield
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 254, in run_setup
_execfile(setup_script, ns)
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 49, in _execfile
exec(code, globals, locals)
File “/tmp/easy_install-u9a2azbv/pytest-runner-5.3.0/setup.py”, line 21, in <module>
description=“Scalable library for exploring and analyzing genomic data.”,
File “/usr/lib64/python3.6/distutils/core.py”, line 163, in setup
raise SystemExit("error: " + str(msg))
SystemExit: error: package directory ‘find_namespace:’ does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/lib/python3.6/dist-packages/setuptools/command/easy_install.py”, line 1123, in run_setup
run_setup(setup_script, args)
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 257, in run_setup
raise
File “/usr/lib64/python3.6/contextlib.py”, line 99, in __exit__
self.gen.throw(type, value, traceback)
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 199, in setup_context
yield
File “/usr/lib64/python3.6/contextlib.py”, line 99, in __exit__
self.gen.throw(type, value, traceback)
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 170, in save_modules
saved_exc.resume()
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 145, in resume
six.reraise(type, exc, self._tb)
File “/usr/lib/python3.6/dist-packages/pkg_resources/_vendor/six.py”, line 685, in reraise
raise value.with_traceback(tb)
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 158, in save_modules
yield saved
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 199, in setup_context
yield
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 254, in run_setup
_execfile(setup_script, ns)
File “/usr/lib/python3.6/dist-packages/setuptools/sandbox.py”, line 49, in _execfile
exec(code, globals, locals)
File “/tmp/easy_install-u9a2azbv/pytest-runner-5.3.0/setup.py”, line 21, in <module>
description=“Scalable library for exploring and analyzing genomic data.”,
File “/usr/lib64/python3.6/distutils/core.py”, line 163, in setup
raise SystemExit("error: " + str(msg))
SystemExit: error: package directory ‘find_namespace:’ does not exist

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “setup.py”, line 49, in <module>
tests_require=[“pytest”]
File “/usr/lib64/python3.6/distutils/core.py”, line 108, in setup
_setup_distribution = dist = klass(attrs)
File “/usr/lib/python3.6/dist-packages/setuptools/dist.py”, line 325, in __init__
self.fetch_build_eggs(attrs[‘setup_requires’])
File “/usr/lib/python3.6/dist-packages/setuptools/dist.py”, line 446, in fetch_build_eggs
replace_conflicting=True,
File “/usr/lib/python3.6/dist-packages/pkg_resources/__init__.py”, line 855, in resolve
dist = best[req.key] = env.best_match(req, ws, installer)
File “/usr/lib/python3.6/dist-packages/pkg_resources/__init__.py”, line 1127, in best_match
return self.obtain(req, installer)
File “/usr/lib/python3.6/dist-packages/pkg_resources/__init__.py”, line 1139, in obtain
return installer(requirement)
File “/usr/lib/python3.6/dist-packages/setuptools/dist.py”, line 518, in fetch_build_egg
return cmd.easy_install(req)
File “/usr/lib/python3.6/dist-packages/setuptools/command/easy_install.py”, line 691, in easy_install
return self.install_item(spec, dist.location, tmpdir, deps)
File “/usr/lib/python3.6/dist-packages/setuptools/command/easy_install.py”, line 717, in install_item
dists = self.install_eggs(spec, download, tmpdir)
File “/usr/lib/python3.6/dist-packages/setuptools/command/easy_install.py”, line 898, in install_eggs
return self.build_and_install(setup_script, setup_base)
File “/usr/lib/python3.6/dist-packages/setuptools/command/easy_install.py”, line 1137, in build_and_install
self.run_setup(setup_script, setup_base, args)
File “/usr/lib/python3.6/dist-packages/setuptools/command/easy_install.py”, line 1125, in run_setup
raise DistutilsError(“Setup script exited with %s” % (v.args[0],))
distutils.errors.DistutilsError: Setup script exited with error: package directory ‘find_namespace:’ does not exist
make: *** [build/deploy/dist/hail-0.2.53-py3-none-any.whl] Error 1
ls: cannot access /opt/hail-on-AWS-spot-instances/src/hail/hail/build/deploy/dist: No such file or directory
ERROR: Invalid requirement: ‘/opt/hail-on-AWS-spot-instances/src/hail/hail/build/deploy/dist/’
Hint: It looks like a path. File ‘/opt/hail-on-AWS-spot-instances/src/hail/hail/build/deploy/dist/’ does not exist.
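
A note on reading the log above: the failing step is the build of pytest-runner 5.3.0 through easy_install, and the message "package directory 'find_namespace:' does not exist" is what an older setuptools tends to produce when it meets `packages = find_namespace:` in a setup.cfg and treats it as a directory name. The following is a minimal sketch for checking that hypothesis on the cluster, assuming that setup.cfg support for `find_namespace:` arrived in setuptools 40.1.0. The helper names `parse_version` and `supports_find_namespace` are made up for illustration and are not part of Hail or of either repo.

```python
import re

# Assumption: setup.cfg support for `packages = find_namespace:` was added in
# setuptools 40.1.0; older releases read the string as a directory name.
MIN_SETUPTOOLS = (40, 1, 0)


def parse_version(text):
    """Turn '40.1.0' or '36.2.7.post1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component such as 'post1'.
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_find_namespace(version_text):
    """Return True if this setuptools version should accept `find_namespace:`."""
    version = parse_version(version_text)
    # Pad short versions so that '41' compares as (41, 0, 0).
    padded = version + (0,) * (len(MIN_SETUPTOOLS) - len(version))
    return padded >= MIN_SETUPTOOLS
```

On the master node, calling `supports_find_namespace(setuptools.__version__)` after `import setuptools` under the same python3 that runs the build would show whether this applies. If it returns False, upgrading setuptools for that interpreter before running the Hail build is one thing to try, though it may not be the only problem with the build.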

See this repo. With EMR 6.x, it’s easier than ever to provision a Jupyter environment that can run Hail through Amazon EMR and a SageMaker notebook. The repo also automates AMI builds, including installing Hail and VEP.