Hailctl FatalError: SparkException

I started the hail cluster with the following command:

hailctl dataproc start art-cluster \
    --master-machine-type n1-standard-2 \
    --num-preemptible-workers 0 \
    --num-workers 2 \
    --worker-machine-type n1-standard-1 \
    --region us-east1 \
    --packages seaborn,matplotlib,imblearn,scikit-learn,plotly,graspy

Then I connected to the notebook:

hailctl dataproc connect art-cluster --zone=us-east1-b notebook

After running the following command:

pc_rel = hl.pc_relate(mt.GT,
                      min_individual_maf=0.05,
                      k=10,
                      statistics='kin',
                      min_kinship=0.25)

I got the error:

FatalError: SparkException: 
Bad data in pyspark.daemon's standard output. Invalid port number:
  1229870149 (0x494e5445)
Python command to execute the daemon was:
  /opt/conda/default/bin/python -m pyspark.daemon
Check that you don't have any unexpected modules or libraries in
your PYTHONPATH:...

Is it possible that the error is caused by the extra packages I installed?

I’ve never seen anything like this before. It’s entirely possible that one of those packages is causing trouble for PySpark.
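The bogus "port number" in the error is itself a clue: 0x494e5445 is the 32-bit integer Spark got by reading four bytes of ASCII text, "INTE", from pyspark.daemon's stdout instead of the port it expected (possibly the start of a longer message, e.g. an Intel MKL warning, though that's speculation on my part). You can check the decoding yourself:

# The four bytes of the "invalid port number" decode to ASCII text,
# i.e. something printed to the daemon's stdout where Spark expected a port.
print(bytes.fromhex("494e5445").decode("ascii"))  # prints: INTE
print(int("494e5445", 16))                        # prints: 1229870149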

I’d try again without those libraries to see if it works, then add them back one at a time until you find which one is problematic. You might need to install the offending package with a pinned version that doesn’t conflict with PySpark.
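One way to narrow it down without repeatedly restarting the cluster is to import each package with stdout captured and see whether any of them print at import time, since anything written to pyspark.daemon's stdout corrupts the port handshake. A minimal sketch, run in the cluster's Python (the list mirrors your --packages; note that scikit-learn imports as sklearn):

# Import each package with stdout captured, to spot any that print at
# import time (output on pyspark.daemon's stdout corrupts the handshake).
import contextlib
import importlib
import io

packages = ["seaborn", "matplotlib", "imblearn", "sklearn", "plotly", "graspy"]

for name in packages:
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            importlib.import_module(name)
    except ImportError as exc:
        print(f"{name}: not importable ({exc})")
        continue
    out = buf.getvalue()
    if out:
        print(f"{name}: wrote {out!r} to stdout at import time")
    else:
        print(f"{name}: clean")

One caveat: redirect_stdout only captures Python-level writes, so a native extension writing directly to file descriptor 1 (as some compiled libraries do) would slip past it; a "clean" result here doesn't fully rule a package out.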