Ubuntu 18.04.1 LTS,
Hail version: 0.2.33-5d8cae649505. I use a Jupyter notebook with Python 3.7.
1. Import a VCF into a MatrixTable.
2. Reformat the column IDs, creating a new column field:
mt = mt.annotate_cols(sample_id = mt.s.split("_"))
3. Import a pandas DataFrame into a Table.
4. Annotate the columns of mt with ht:
mt = mt.annotate_cols(pheno = ht_pheno[mt.sample_id])
5. Run mt.count(); there is no error.
But after I additionally do
mt = mt.filter_cols(hl.is_defined(mt.pheno))
mt.count() results in an error:
Exception: Python in worker has different version 2.7 than that in driver 3.7, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
Neither PYSPARK_PYTHON nor PYSPARK_DRIVER_PYTHON appears in the list given by os.environ in my notebook.
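A quick way to check this from inside the notebook is sketched below. The helper name pyspark_python_settings is made up for illustration; it only reads os.environ and sys, and does not touch Hail or Spark. As far as I know, when PYSPARK_PYTHON is unset, the Spark 2.4 line that Hail 0.2.33 uses falls back to whatever "python" resolves to on the worker's PATH, which on a stock Ubuntu 18.04 is typically Python 2.7, so an unset variable is itself a useful finding.

```python
import os
import sys


def pyspark_python_settings(environ=None):
    """Collect the interpreter-related settings relevant to the error.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    environ = os.environ if environ is None else environ
    return {
        # Interpreter the Spark workers are told to launch (None if unset).
        "PYSPARK_PYTHON": environ.get("PYSPARK_PYTHON"),
        # Interpreter the pyspark launcher uses for the driver (None if unset).
        "PYSPARK_DRIVER_PYTHON": environ.get("PYSPARK_DRIVER_PYTHON"),
        # A pre-existing Spark installation, if one is configured.
        "SPARK_HOME": environ.get("SPARK_HOME"),
        # Interpreter actually running this notebook kernel (the driver).
        "driver_executable": sys.executable,
        "driver_version": "%d.%d" % sys.version_info[:2],
    }


for name, value in pyspark_python_settings().items():
    print(name, "=", value)
```

If driver_version shows 3.7 while PYSPARK_PYTHON is None, the workers are probably picking up a different default interpreter.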
Steps 1-5 run without errors with the same input on a different computer (with a slightly different configuration). Could you please help me diagnose the problem?
traceback.txt (27.3 KB)
This isn’t running local mode, it’s running in cluster mode, right?
The problem is that the python executable on the workers is a different version than the one on the driver. The mismatch itself wouldn’t matter to Hail (we don’t use pyspark features internally, just JVM ones), but pyspark treats it as fatal and can’t start with this config.
I’m not sure about the mode. I didn’t consciously choose it at any point. Is the mode somehow automatically determined during pip install hail? How can I inspect it?
The mode is determined by the Spark installation/config. If you pip install hail on a fresh virtual machine or computer, then that will pip install pyspark, and run in local mode by default. If you have a Spark installation on the computer already and there are environment variables like SPARK_HOME defined, then Spark will probably try to start in cluster mode.
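To inspect the mode directly, one option is to look at the master URL of the running Spark context. The sketch below is a minimal illustration: describe_spark_mode is a hypothetical helper, and the commented-out lines assume Hail has already been initialised in the notebook so that hl.spark_context() returns the active SparkContext.

```python
import os


def describe_spark_mode(master):
    """Classify a Spark master URL as 'local' or 'cluster'.

    Hypothetical helper: 'local', 'local[4]' and 'local[*]' count as
    local mode; anything else (spark://..., yarn, ...) as cluster mode.
    """
    if master == "local" or master.startswith("local["):
        return "local"
    return "cluster"


# A pre-existing Spark installation usually announces itself here.
print("SPARK_HOME =", os.environ.get("SPARK_HOME"))

# In the notebook, after Hail has started (assumes `import hail as hl`):
# master = hl.spark_context().master
# print(master, "->", describe_spark_mode(master))
```

A master such as local[*] would mean the driver and workers share one machine, in which case the 2.7 interpreter is coming from that machine's own PATH.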
I would think that it should be impossible to get this error while running in local mode, but you could try setting:
export PYSPARK_PYTHON="the python you're using to run hail"
export PYSPARK_DRIVER_PYTHON="the python you're using to run hail"
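In a Jupyter notebook the shell exports only help if they are set before the notebook server starts. An alternative sketch is to set the same variables from Python before Hail creates its Spark context. The helper name pin_pyspark_python is hypothetical, and using sys.executable assumes the workers can see the same interpreter path as the driver, which holds in local mode but not necessarily on a real cluster.

```python
import os
import sys


def pin_pyspark_python(environ=None, executable=None):
    """Point PySpark at one interpreter for both driver and workers.

    Hypothetical helper; defaults to the interpreter running this kernel.
    """
    environ = os.environ if environ is None else environ
    executable = sys.executable if executable is None else executable
    environ["PYSPARK_PYTHON"] = executable
    environ["PYSPARK_DRIVER_PYTHON"] = executable
    return environ


# Must run before the Spark context exists; restart the kernel first
# if Hail has already been initialised in this session.
pin_pyspark_python()

# import hail as hl
# hl.init()
```

On a cluster, pass the path of a Python 3.7 interpreter that is actually installed on every worker instead of relying on sys.executable.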