Sorry if this is not a Hail question per se, but I suppose you’d know the answer.
I am limited to 46 cores per VM, and at the moment I’m successfully running Hail 0.2 on one such VM.
I have to analyse UK Biobank data, so I think it would be a good idea to have more cores available for the computations. I can create additional VMs (also with 46 cores each), so my questions are:
Is it possible to expand Hail computations onto cores from other VMs using spark-submit?
If so, would I have to install Hail etc. on top of Spark on these external VMs?
Apart from the RAM per core, will I need to allocate a large amount of local disk space for caching or something like that?
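To make the question concrete, here is a rough sketch of the driver-side call I have in mind, assuming a Spark standalone master on this VM (default port 7077) with workers started on the other VMs; the address, memory value and path are placeholders rather than a tested configuration:

import hail as hl
from pyspark import SparkConf, SparkContext

# All values below are placeholders for illustration, not a tested configuration.
conf = (SparkConf()
        .setMaster('spark://10.4.1.208:7077')       # hypothetical standalone master on this VM
        .set('spark.executor.memory', '20g')        # memory per executor on the worker VMs
        .set('spark.local.dir', '/data/spark-tmp')) # local disk on each node used for shuffle spill

sc = SparkContext(conf=conf)
hl.init(sc=sc)  # hand the pre-built SparkContext to Hail

Presumably each worker VM would also need a matching Spark installation plus the Hail jar and Python zip available to its executors, and spark.local.dir on each worker is where shuffle data spills to disk, which is part of what I’m asking about.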
hl.init(master='10.4.1.208')
Using Spark’s default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "", line 1, in
File "", line 2, in init
File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 561, in wrapper
File "/usr/local/hail/build/distributions/hail-python.zip/hail/context.py", line 256, in init
File "", line 2, in init
File "/usr/local/hail/build/distributions/hail-python.zip/hail/typecheck/check.py", line 561, in wrapper
File "/usr/local/hail/build/distributions/hail-python.zip/hail/context.py", line 97, in init
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.HailContext.apply.
: org.apache.spark.SparkException: Could not parse Master URL: '10.4.1.208'
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2760)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
at is.hail.HailContext$.configureAndCreateSparkContext(HailContext.scala:112)
at is.hail.HailContext$.apply(HailContext.scala:237)
at is.hail.HailContext.apply(HailContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
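For reference, the parse error appears to be because Spark expects a master URL with a scheme, e.g. spark://host:port for a standalone master, local[*] for local mode, or yarn. A minimal sketch of the corrected call, assuming a standalone master listening on the default port 7077 (the port is an assumption, not something confirmed here):

import hail as hl

# The 'spark://' scheme and a port are required for a standalone master; 7077 is Spark's default.
hl.init(master='spark://10.4.1.208:7077')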
When you use plain Spark, can you use all 96 cores?
It’s also possible that your memory settings are such that Spark can’t allocate all the cores to preserve the memory:core ratio specified in the configuration.
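One quick way to see what the running context actually got is to query it from the driver once hl.init() has succeeded; a small sketch, assuming hl.spark_context() is available (it returns Hail's underlying SparkContext in 0.2):

import hail as hl

sc = hl.spark_context()                                    # the SparkContext Hail is using
print(sc.defaultParallelism)                               # total cores across all executors (standalone mode)
print(sc.getConf().get('spark.executor.memory', 'unset'))  # configured executor memory, if any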
Huh, this looks like a properly networked 2-node cluster.
It’s possible that the driver nodes aren’t contributing to the execution core pool. The executor size is also really big here; we’ve found things work well with ~4-8 core executors. Did you set up the cluster yourself, or is it a standing cluster set up by an IT team?
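If executor size is the issue, one way to aim for smaller executors is to cap cores per executor before the context is created; this reuses the same SparkConf pattern as the sketch above, with placeholder values (these are standard Spark properties, not Hail-specific settings):

import hail as hl
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster('spark://10.4.1.208:7077')  # placeholder standalone master URL
        .set('spark.executor.cores', '4')      # cap each executor at 4 cores
        .set('spark.executor.memory', '16g'))  # keep the memory-per-core ratio within the VM's limits

sc = SparkContext(conf=conf)
hl.init(sc=sc)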
I created the VMs myself, but I am limited to the VM “flavours” (number of cores, RAM, etc.) made available by the central IT team.