PCA job aborted from SparkException

Yep, thank you! I am running some other analyses right now but I’ll post a log tomorrow if I still have issues upon rerunning.

@tpoterba it looks like upload_log was removed from the codebase

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-39-93dcd1342068> in <module>
----> 1 hl.upload_log(bucket+"/pca_run.log")

AttributeError: module 'hail' has no attribute 'upload_log'

I found this issue that was closed in April of this year: https://github.com/hail-is/hail/issues/7392

Notably, since switching to 0.2.49 I was able to run hl.ld_prune() successfully (runtime was approximately 4 hours), which brought my variant row count down from 580k to 340k with r2=0.2. Previously this function produced a SparkException, so we are getting somewhere!
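For reference, the pruning step followed the standard Hail pattern, roughly like this (variable names are illustrative, not my exact code):

```python
import hail as hl

# hl.ld_prune returns a Table of variants to keep; filter the MatrixTable to those rows
pruned = hl.ld_prune(mt.GT, r2=0.2)
mt = mt.filter_rows(hl.is_defined(pruned[mt.row_key]))
```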

Even after another overnight run, hl.hwe_normalized_pca() still looks like a stuck job.

Happy to generate a log file for this if there is a way.

Is the log written by the hl.init() statement the same log? It looks like I can access it from the terminal of the Terra virtual machine, and it is quite large now.

Oops, sorry, I got confused: hl.copy_log is what I meant to point you to. It's a convenience around another utility that copies the log from the local disk of the driver machine to a Google bucket / remote filesystem.

Yeah, the log on that machine is what we want; I do expect it to be somewhat large.
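A minimal sketch of that call (the destination path is just a placeholder):

```python
import hail as hl

# copy the current session's log from the driver's local disk to a bucket
hl.copy_log('gs://my-bucket/logs/pca_run.log')
```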

Should I post in here?

Probably won’t fit. Can you email it as an attachment or a Drive link to hail-team@broadinstitute.org?


For anyone else following, I wanted to post an update on this. I was finally able to run the PCA job after some help from Tim.

We found that my MatrixTable, assembled via import_plink(), was not partitioned very efficiently (you can check this via mt.n_partitions()).
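For example (paths are placeholders, not the ones I used):

```python
import hail as hl

# import the PLINK fileset and check how many partitions it ended up with
gen = hl.import_plink(bed='gs://bucket/data.bed',
                      bim='gs://bucket/data.bim',
                      fam='gs://bucket/data.fam')
print(gen.n_partitions())
```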

We were able to both repartition the data and save/checkpoint it as a MatrixTable for added efficiency with:

gen = gen.repartition(250).checkpoint('gs://some/path.mt', overwrite=True)

I did have some trouble with the shuffling step of the repartition due to using preemptible nodes, so Tim provided me with the following code for a “no shuffle” repartition:

```python
def no_shuffle_repartition(mt, path1, path2, n_parts):
    mt = mt.checkpoint(path1)
    return hl.read_matrix_table(path1, _intervals=mt._calculate_new_partitions(n_parts)).checkpoint(path2)
```
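With that in place, the PCA job finally went through. Usage looked roughly like this (paths, partition count, and k are illustrative, not the exact values I used):

```python
# repartition without a shuffle, then run PCA on the checkpointed result
gen = no_shuffle_repartition(gen, 'gs://bucket/tmp1.mt', 'gs://bucket/repart.mt', 250)
eigenvalues, scores, _ = hl.hwe_normalized_pca(gen.GT, k=10)
```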