Python in worker has different version than that in driver. hail.is_defined

Hi!
I’m on Ubuntu 18.04.1 LTS, Hail version 0.2.33-5d8cae649505, running a Jupyter notebook with Python 3.7.

After I:

  1. import a VCF into a MatrixTable mt,
  2. create a new column field from the sample IDs, mt = mt.annotate_cols(sample_id = mt.s.split("_")[0]),
  3. import a pandas DataFrame into a Table ht_pheno,
  4. annotate the columns of mt with ht_pheno, mt = mt.annotate_cols(pheno = ht_pheno[mt.sample_id]),

and then call mt.count(), there is no error. But after I additionally do

  5. mt = mt.filter_cols(hl.is_defined(mt.pheno)),

mt.count() results in an error:
    Exception: Python in worker has different version 2.7 than that in driver 3.7, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
Neither PYSPARK_PYTHON nor PYSPARK_DRIVER_PYTHON appears in os.environ in my notebook.
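
In case it helps, here is a condensed sketch of the pipeline (the file paths, the phenotype DataFrame, and its key column are placeholders):

    import hail as hl
    import pandas as pd

    mt = hl.import_vcf("genotypes.vcf.bgz")                        # step 1
    mt = mt.annotate_cols(sample_id = mt.s.split("_")[0])          # step 2
    pheno_df = pd.read_csv("phenotypes.csv")                       # placeholder phenotype table with a sample_id column
    ht_pheno = hl.Table.from_pandas(pheno_df, key="sample_id")     # step 3
    mt = mt.annotate_cols(pheno = ht_pheno[mt.sample_id])          # step 4
    mt.count()                                                     # works
    mt = mt.filter_cols(hl.is_defined(mt.pheno))                   # step 5
    mt.count()                                                     # raises the PySpark version error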

Steps 1-5 run without errors with the same input on a different computer (with a slightly different configuration). Could you please help me diagnose the problem?
traceback.txt (27.3 KB)

This isn’t running in local mode, it’s running in cluster mode, right?

The problem is that the Python executable on the workers is a different version than the one on the driver. This wouldn’t actually kill Hail if the error weren’t fatal (we don’t use PySpark features internally, just JVM ones), but PySpark can’t start with this configuration.

I’m not sure about the mode. I didn’t consciously choose it at any point. Is the mode somehow automatically determined during pip install hail? How can I inspect it?

The mode is determined by the Spark installation/config. If you pip install hail on a fresh virtual machine or computer, that will also pip install pyspark, and Hail will run in local mode by default. If there is already a Spark installation on the computer and environment variables like SPARK_HOME are defined, then Spark will probably try to start in cluster mode.
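
One quick way to check which mode you ended up in (a minimal sketch, assuming Hail can be initialized in your notebook) is to look at the Spark master string:

    import os
    import hail as hl

    sc = hl.spark_context()    # the SparkContext Hail is using
    print(sc.master)           # "local[*]" / "local[N]" means local mode; "yarn" or "spark://..." means a cluster
    print(os.environ.get("SPARK_HOME"))   # a pre-existing Spark installation usually shows up here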

I would think that it should be impossible to get this error while running in local mode, but you could try setting:

export PYSPARK_PYTHON="the python you're using to run hail"
export PYSPARK_DRIVER_PYTHON="the python you're using to run hail"
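
Since you’re launching from a Jupyter notebook, you can also set these from Python before Hail starts Spark (a sketch; sys.executable just points both variables at the interpreter the notebook is running on):

    import os
    import sys

    # Make the driver and workers use the same interpreter; must run before Spark starts.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

    import hail as hl
    hl.init()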