In the `testHail` Gradle task, I get "ImportError: No module named pyspark.sql".

How do I fix this? The full log looks like this:

[ec2-user@ip-172-31-54-96 hail]$ PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.3-src.zip:$HAIL_HOME/python SPARK_CLASSPATH=$HAIL_HOME/build/libs/hail-all-spark.jar ./gradlew test shadowJar
:checkSettings
check: seed = 1, size = 1000, count = 10
:compileJava UP-TO-DATE
:nativeLib
(cd libsimdpp-2.0-rc2 && cmake .)
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/hail/src/main/c/libsimdpp-2.0-rc2
:compileScala UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:nativeLibTest
mkdir -p build
g++ -O3 -march=native -g -std=c++11 -Ilibsimdpp-2.0-rc2 -Wall -Werror  -DNUMBER_OF_GENOTYPES_PER_ROW=256 ibs.cpp test.cpp -o build/functional-tests
./build/functional-tests
66 test(s) succeeded.
:compileTestJava UP-TO-DATE
:compileTestScala UP-TO-DATE
:processTestResources UP-TO-DATE
:testClasses UP-TO-DATE
:shadowJar UP-TO-DATE
:testHail
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib64/python2.7/unittest/__main__.py", line 12, in <module>
    main(module=None)
  File "/usr/lib64/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/lib64/python2.7/unittest/main.py", line 149, in parseArgs
    self.createTests()
  File "/usr/lib64/python2.7/unittest/main.py", line 158, in createTests
    self.module)
  File "/usr/lib64/python2.7/unittest/loader.py", line 130, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib64/python2.7/unittest/loader.py", line 91, in loadTestsFromName
    module = __import__('.'.join(parts_copy))
  File "/tmp/hail/python/hail/__init__.py", line 1, in <module>
    from hail.context import HailContext
  File "/tmp/hail/python/hail/context.py", line 3, in <module>
    from pyspark.sql import SQLContext
ImportError: No module named pyspark.sql
:testHail FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':testHail'.
> Process 'command 'python'' finished with non-zero exit value 1

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 10.649 secs

This usually means that your SPARK_HOME environment variable is not set correctly. Ensure that:

  • the value of SPARK_HOME is the top-level directory of the Spark installation, for example /opt/spark-2.0.2-bin-hadoop2.7/
  • you defined SPARK_HOME with the export command, which makes it visible to the python executable (e.g. export SPARK_HOME=/foo/bar/spark), as in the sketch below
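
For reference, a minimal working setup might look like the following (the Spark path is only an example; substitute wherever your Spark distribution actually lives):

  export SPARK_HOME=/opt/spark-2.0.2-bin-hadoop2.7
  export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.3-src.zip:$HAIL_HOME/python"

  # sanity checks: the pyspark package should exist on disk, and the import
  # that fails in the traceback should now succeed without Gradle in the loop
  ls "$SPARK_HOME/python/pyspark"
  python -c 'from pyspark.sql import SQLContext'

If the python one-liner still raises the ImportError, the problem is the environment rather than the build. It is also worth checking that the py4j zip named in PYTHONPATH (py4j-0.10.3-src.zip in your log) matches the one actually shipped in $SPARK_HOME/python/lib.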