Jobs running much longer than expected

Hi hail team!

I’ve noticed that a couple of jobs have been running much longer than expected and was wondering if you would be able to help me figure out why. For example, a job took 8 hours, 56 minutes to run yesterday on a dataset with 300k samples, but the previous run on ~450k samples took only 17 minutes. I no longer have the log for the job that took 17 minutes, but I will email the log from the job that ran in ~9 hours.

As always, I’d appreciate any insight – thanks for all of the help!

Hey Katherine,

Let’s see, so you’re saying you have a job that used to be much faster, and now it’s suddenly slower on 0.2.77. Can you try running with hl._set_flags(no_whole_stage_codegen='1') at the beginning of your script (if you do an hl.init(), it should be after that; otherwise it should be the first line of your script after the imports)? If this ends up making a difference, that gives us something to go on. If it doesn’t, it’s back to the drawing board.
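
In case it helps, here’s a minimal sketch of where that call would go, assuming you call hl.init() explicitly (if you don’t, it would just be the first line after the imports):

```python
import hail as hl

hl.init()  # if you call hl.init() explicitly, set the flag immediately after it
hl._set_flags(no_whole_stage_codegen='1')  # fall back to the pre-0.2.75 execution path

# ... the rest of the pipeline runs unchanged
```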


What’s the Python script? I have the log file.

Thanks for the suggestion – I’ll try setting that flag for the next step!

Here’s the code associated with the log: https://github.com/broadinstitute/ukbb_qc/blob/freeze_6/ukbb_qc/release/prepare_vcf_data_release.py#L660 (it calls this: https://github.com/broadinstitute/ukbb_qc/blob/freeze_6/ukbb_qc/assessment/sanity_checks.py#L483)

That project is private, so I can’t see it.

The flag is basically a way of saying “execute this code the old way, not the new and improved way”. The “new and improved” way started in 0.2.75 and likely still has some kinks to work out; it’s possible that you’ve uncovered a pathological case.
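
If you want a quick sanity check on which execution path you’re getting, hl.version() reports the Hail version the pipeline is running on; anything at or above 0.2.75 uses the new path unless the flag disables it (this is just a check, not part of the fix):

```python
import hail as hl

# Whole-stage codegen landed in 0.2.75, so versions >= 0.2.75 use the new
# execution path unless no_whole_stage_codegen is set.
print(hl.version())
```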

If you want to have someone add me to that project I can look at it.

Thank you! I don’t have admin access to that repo but have requested access for you. I’ll post again when I have an update.

@johnc1231 you should have access to the repo now – let me know if not!