Hail backend unresolved address

Hi there,

I’ve just started using Hail in the cloud. I connect to a notebook and start an interactive Hail session. When I call hl.init(), then hl.stop(), and then try to call hl.init() again, I get the following error:

Py4JJavaError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply.
: java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:101)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:215)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:132)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:551)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1346)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:503)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:488)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:985)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:247)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:344)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

I’m using hail version 0.2.37-7952b436bd70.

If I stop the cluster and start it again I can re-init but this is really time-consuming.

When I try to look at the Env object:

from hail.utils.java import Env
print(Env.hc())

I get the same error:

Py4JJavaError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply.
: java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:101)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:215)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:132)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:551)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1346)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:503)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:488)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:985)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:247)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:344)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

Okay, I don’t know what’s going on. On the third try I could run the same code without getting the error.

Can you share the log file? There’s likely more detail there about an underlying issue. Stopping and starting Hail contexts is rather unorthodox. May I ask why you need to do this?

I was messing around with different Spark configurations, but I still got the error even when I wasn’t changing them.

Now time for an embarrassing question: where are the log files written to on the cloud? I assumed it would be the same bucket as my notebook, but I don’t see any there.

The path the logs are written to is printed when you call hl.init(). It’s not in a bucket; it’s on the cluster’s master machine.
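If it helps to know the log location up front rather than reading it off the init output, one option is to pass an explicit path through the `log` argument of hl.init(). The following is only a sketch: the helper names `make_log_path` and `init_with_known_log` and the `/tmp` directory are made up for illustration, and the path still refers to the master machine's local disk.

```python
import os
import time


def make_log_path(log_dir, prefix="hail"):
    """Build a timestamped log path, e.g. <log_dir>/hail-20200101-120000.log."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return os.path.join(log_dir, f"{prefix}-{stamp}.log")


def init_with_known_log(log_dir="/tmp"):
    """Start Hail with an explicit log file and return that file's path.

    Hypothetical helper: log_dir is an assumed location on the master machine.
    """
    import hail as hl  # imported here so make_log_path works without Hail

    log_path = make_log_path(log_dir)
    hl.init(log=log_path)  # `log` sets where Hail writes its log file
    return log_path
```

Calling `init_with_known_log()` in the notebook returns the path, so it can be kept in a variable and located later.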

The cluster on which this was happening was stopped, so I don’t think I can recover the logs. If it comes up again I’ll reopen this topic. Sorry about this.

No worries! I hope it doesn’t happen again, but if it does and you can grab the log files, I’m sure we can diagnose further.
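Since the log lives on the master machine and disappears with the cluster, it may be worth copying it to a bucket before shutting down. Below is a rough sketch using hl.hadoop_copy. It assumes a Hail session is active, the helper names are hypothetical, and both the local log path and the bucket destination are placeholders you would supply yourself.

```python
def to_file_uri(local_path):
    """Prefix an absolute local path with file:// so it is read from local disk."""
    if not local_path.startswith("/"):
        raise ValueError("expected an absolute path, got: " + local_path)
    return "file://" + local_path


def save_log_to_bucket(local_log_path, bucket_dest):
    """Copy the Hail log from the master machine to cloud storage.

    Hypothetical helper: local_log_path is the path printed by hl.init(),
    and bucket_dest is a destination URI in a bucket you can write to.
    """
    import hail as hl  # imported here so to_file_uri works without Hail

    hl.hadoop_copy(to_file_uri(local_log_path), bucket_dest)
```

Running this right after the error appears, and before stopping the cluster, would preserve the log so it can be shared.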