I started the Hail cluster with the following command:
hailctl dataproc start art-cluster \
--master-machine-type n1-standard-2 \
--num-preemptible-workers 0 \
--num-workers 2 \
--worker-machine-type n1-standard-1 \
--region us-east1 \
--packages seaborn,matplotlib,imblearn,scikit-learn,plotly,graspy
Then I connected to the notebook:
hailctl dataproc connect art-cluster --zone=us-east1-b notebook
After running the following command:
pc_rel = hl.pc_relate(mt.GT,
                      min_individual_maf=0.05,
                      k=10,
                      statistics='kin',
                      min_kinship=0.25)
I got the error:
FatalError: SparkException:
Bad data in pyspark.daemon's standard output. Invalid port number:
1229870149 (0x494e5445)
Python command to execute the daemon was:
/opt/conda/default/bin/python -m pyspark.daemon
Check that you don't have any unexpected modules or libraries in
your PYTHONPATH:...
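If I decode that bogus port number as ASCII bytes (just a guess on my part, assuming big-endian byte order), it looks like text rather than a number, as if something wrote to the daemon's stdout before the port was printed:
# Decode the invalid "port number" reported by Spark as four ASCII bytes
# (assumption: big-endian byte order); 0x494e5445 comes out as the text "INTE".
print((1229870149).to_bytes(4, "big").decode("ascii"))  # prints: INTE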
Is it possible that the error is caused by the extra packages I preinstalled with --packages?
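In case it helps, here is a quick check I was planning to run in the notebook to see whether any of those packages prints something to stdout when it is imported (the module names below are my guess at how each --packages entry maps to an import):
import contextlib
import importlib
import io

# Import each extra package with stdout captured, and report any package
# that prints something during import.
for name in ["seaborn", "matplotlib", "imblearn", "sklearn", "plotly", "graspy"]:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        importlib.import_module(name)
    if buf.getvalue():
        print(name, "writes to stdout on import:", repr(buf.getvalue()))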