Jobs running much longer than expected

Hi hail team!

I’ve noticed that a couple of jobs have been running much longer than expected and was wondering if you would be able to help me figure out why. For example, a job took 8 hours, 56 minutes to run yesterday on a dataset with 300k samples, but the previous run on ~450k samples took only 17 minutes. I no longer have the log for the job that took 17 minutes, but I will email the log from the job that ran in ~9 hours.

As always, I’d appreciate any insight – thanks for all of the help!

Hey Katherine,

Let’s see, so you’re saying you have a job that used to be much faster, and now it’s suddenly slower on 0.2.77. Can you try running with hl._set_flags(no_whole_stage_codegen='1') at the beginning of your script (if you do an hl.init(), it should be after that; otherwise it should be the first line of your script after the imports)? If this ends up making a difference, that gives us something to go on. If it doesn’t, it’s back to the drawing board.
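
In case it helps, here’s a minimal sketch of where that call would go, assuming you call hl.init() explicitly (if you don’t, it would just be the first line after the imports):

```python
import hail as hl

hl.init()  # if you call hl.init() explicitly, set the flag immediately after it
hl._set_flags(no_whole_stage_codegen='1')  # fall back to the pre-0.2.75 execution path

# ... the rest of the pipeline runs unchanged
```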


What’s the Python script? I have the log file.

Thanks for the suggestion – I’ll try setting that flag for the next step!

Here’s the code associated with the log: https://github.com/broadinstitute/ukbb_qc/blob/freeze_6/ukbb_qc/release/prepare_vcf_data_release.py#L660 (it calls this: https://github.com/broadinstitute/ukbb_qc/blob/freeze_6/ukbb_qc/assessment/sanity_checks.py#L483)

That project is private, so I can’t see it.

The flag is basically a way of saying “execute this code the old way, not the new and improved way”. The “new and improved” way started in 0.2.75 and likely still has some kinks to work out; it’s possible that you’ve uncovered a pathological case.
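
If you want a quick sanity check on which execution path you’re getting, hl.version() reports the Hail version the pipeline is running on; anything at or above 0.2.75 uses the new path unless the flag disables it (this is just a check, not part of the fix):

```python
import hail as hl

# Whole-stage codegen landed in 0.2.75, so versions >= 0.2.75 use the new
# execution path unless no_whole_stage_codegen is set.
print(hl.version())
```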

If you want to have someone add me to that project I can look at it.

Thank you! I don’t have admin access to that repo but have requested access for you. I’ll post again when I have an update.

@johnc1231 you should have access to the repo now – let me know if not!