How to analyze Hail job logs using Spark UI after terminating a cluster

Hello,

I am running the Hail GVCF combiner from a Jupyter notebook on Dataproc.

While the GVCF combiner is running, I can monitor the job progress in the Spark UI.
I am wondering if I can see the job log in the same way after terminating the cluster on which Hail was run.
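For context, this is roughly how I start the cluster and open the notebook and Spark UI (just a sketch; the cluster name `my-cluster` is a placeholder):

```bash
# Start a Dataproc cluster with Hail installed (cluster name is a placeholder).
hailctl dataproc start my-cluster

# Open the Jupyter notebook where the combiner runs.
hailctl dataproc connect my-cluster notebook

# Open the Spark UI to monitor job progress while the cluster is up.
hailctl dataproc connect my-cluster spark-ui
```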

Thank you.


For jobs like this, I’d recommend using hailctl dataproc submit (which wraps gcloud dataproc submit), since Jupyter notebooks have problems with reconnecting to see execution status.
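For example, submitting a combiner script looks roughly like this (a sketch; the cluster name and script path are placeholders):

```bash
# Submit a Python script that runs the combiner as a Dataproc job
# (cluster name and script file are placeholders).
hailctl dataproc submit my-cluster combine_gvcfs.py
```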

Thank you for your reply. I will try `submit` instead of `notebook`. If I use `submit`, can I still see the job log in the Spark UI after terminating the cluster? What I want is to analyze the executors’ info after the run is done and the cluster is stopped. I would like to know whether that is possible.

No, that’s not possible, since the Spark UI is a web server run by the Spark driver machine. Once that machine is no longer running, the UI is dead.

Got it, thanks. Then how can I check the job results and task information after the machine is gone?

You cannot. At least the driver machine must stay alive. What do you hope to learn from the Spark task information?

Thank you for your answer. I am trying to do joint calling for 1000 GVCFs using the Hail GVCF combiner. If it runs successfully under our conditions, we may extend the number of GVCFs. My first step is to find the Hail function, Spark, and Dataproc parameters that optimize the runtime and cost of the GVCF combiner on a small GVCF set. I would like to keep track of all the job logs as those parameters change. While I analyze the results, I want to stop the cluster to save cost, because I only need to see the completed job results. I hope that answers your question. Any advice related to this work would be welcome. Thank you.

You can use `hailctl dataproc modify --num-preemptible-workers 0` to shrink the cluster. The minimal cluster is 2 non-preemptible (regular) workers and one leader node. That should cost very little per hour and give you plenty of time to analyze the logs.
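For example (cluster name is a placeholder):

```bash
# Remove all preemptible workers, shrinking the cluster to its minimum of
# one leader node and 2 regular workers (cluster name is a placeholder).
hailctl dataproc modify my-cluster --num-preemptible-workers 0
```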

I doubt you’ll find much useful information in the worker logs.

Maybe @tpoterba or @chrisvittal can provide some information on recommended worker configurations.

I think the right model here is to use autoscaling, so you can inspect this stuff while paying only for the driver machine (less than $1.00 / hr).

There are some instructions on autoscaling here: https://hail.is/docs/0.2/experimental/vcf_combiner.html

and here:
broad.io/hail-tips-and-tricks-1
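Roughly, attaching a Dataproc autoscaling policy looks like this (a sketch assuming you already have a policy YAML file; the policy ID, file name, cluster name, and region are placeholders; see the links above for the full walkthrough):

```bash
# Import an autoscaling policy defined in a local YAML file
# (policy ID, file name, and region are placeholders).
gcloud dataproc autoscaling-policies import combiner-policy \
    --source=combiner-policy.yaml --region=us-central1

# Attach the policy to the cluster so workers scale down when the cluster is idle.
gcloud dataproc clusters update my-cluster \
    --autoscaling-policy=combiner-policy --region=us-central1
```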

Thanks to both @danking and @tpoterba. I will definitely look into your references. I am sure they will be very helpful for my work.