Dataproc Workers Lost After Intensive Task

For single-dataset processing, yes. But joins are extremely sensitive to partitioning mismatches, and Spark doesn't make this problem especially easy to deal with: if the two sides aren't partitioned the same way on the join key, the join forces a full shuffle of at least one side.

Repartitioning both datasets by the join key, with the same number of partitions, before the join would probably be a good strategy, I think: matching keys end up co-located, so the join itself doesn't need to reshuffle either side.