I’ve noticed that a couple of jobs have been running much longer than expected and was wondering if you would be able to help me figure out why. For example, a job took 8 hours, 56 minutes to run yesterday on a dataset with 300k samples, but the previous run on ~450k samples took only 17 minutes. I no longer have the log for the job that took 17 minutes, but I will email the log from the job that ran in ~9 hours.
As always, I’d appreciate any insight – thanks for all of the help!
Let’s see, so you’re saying you have a job that used to be much faster, and now it’s suddenly slower on 0.2.77. Can you try running with hl._set_flags(no_whole_stage_codegen='1') at the beginning of your script? (If you do an hl.init(), it should come right after that; otherwise it should be the first line of your script after the imports.) If this ends up making a difference, that gives us something to go on. If it doesn’t, it’s back to the drawing board.
The flag is basically a way of saying “execute this code the old way, not the new and improved way”. The “new and improved” way started in 0.2.75 and likely still has some kinks to work out, so it’s possible that you’ve uncovered a pathological case.
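For concreteness, the top of the script would look roughly like this. This is just a sketch (it assumes the usual `import hail as hl` convention and a plain `hl.init()` with no arguments; your init call and pipeline will differ):

```python
import hail as hl

# Initialize Hail as usual, then disable whole-stage codegen.
# The flag tells Hail to fall back to the execution path used
# before 0.2.75, which is what we want to compare against.
hl.init()
hl._set_flags(no_whole_stage_codegen='1')

# ... rest of the pipeline as before ...
```

If the script doesn’t call `hl.init()` explicitly, just put the `hl._set_flags(...)` line immediately after the imports instead.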