Hello,
I have been running a Python Hail script on chromosome 20 that performs sample QC, variant QC, and some basic filtering steps on Google Cloud, using 64 workers. It exports tables and writes out MatrixTables at various checkpoints. The script completed in around 5 hours.
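For reference, the script is roughly of this shape (a minimal sketch only; the paths, thresholds, and exact QC steps are placeholders rather than the real pipeline):

```python
import hail as hl

hl.init()

# Hypothetical input path for the chromosome 20 MatrixTable.
mt = hl.read_matrix_table('gs://my-bucket/chr20.mt')

# Sample QC, then drop low-call-rate samples (threshold is illustrative).
mt = hl.sample_qc(mt)
mt = mt.filter_cols(mt.sample_qc.call_rate > 0.97)

# Variant QC, then drop low-call-rate variants (threshold is illustrative).
mt = hl.variant_qc(mt)
mt = mt.filter_rows(mt.variant_qc.call_rate > 0.99)

# Checkpoint: write the filtered MatrixTable out and read it back.
mt.write('gs://my-bucket/chr20.qc.mt', overwrite=True)
mt = hl.read_matrix_table('gs://my-bucket/chr20.qc.mt')

# Export a table of per-sample QC metrics.
mt.cols().export('gs://my-bucket/chr20.sample_qc.tsv.bgz')
```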
When I doubled the number of workers to 128, the process did not speed up; it still took 5 hours. Why do you think doubling the number of workers did not improve the running time on the same dataset?
What parameters would help speed up the process, especially as we are now scaling to chromosome 1 and eventually the whole genome?
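In case it is useful context, this is how I could report the parallelism the job actually has available from within the script (the path is a placeholder):

```python
import hail as hl

# Hypothetical input path for the chromosome 20 MatrixTable.
mt = hl.read_matrix_table('gs://my-bucket/chr20.mt')

# Number of partitions in the dataset vs. the cluster's default parallelism.
print('MatrixTable partitions:', mt.n_partitions())
print('Spark default parallelism:', hl.spark_context().defaultParallelism)
```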
This is my current configuration:
```
--image-version=1.4-debian9
--properties=spark:spark.driver.maxResultSize=0,spark:spark.task.maxFailures=20,spark:spark.kryoserializer.buffer.max=1g,spark:spark.driver.extraJavaOptions=-Xss4M,spark:spark.executor.extraJavaOptions=-Xss4M,hdfs:dfs.replication=1,dataproc:dataproc.logging.stackdriver.enable=false,dataproc:dataproc.monitoring.stackdriver.enable=false,spark:spark.driver.memory=41g
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.20/init_notebook.py
--metadata=^|||^WHEEL=gs://hail-common/hailctl/dataproc/0.2.20/hail-0.2.20-py3-none-any.whl|||PKGS=aiohttp|bokeh>1.1,<1.3|decorator<5|gcsfs==0.2.1|hurry.filesize==0.9|ipykernel<5|nest_asyncio|numpy<2|pandas>0.22,<0.24|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests>=2.21.0,<2.21.1|scipy>1.2,<1.4|tabulate==0.8.3|PyYAML
--master-machine-type=n1-highmem-8
--master-boot-disk-size=100GB
--num-master-local-ssds=0
--num-preemptible-workers=0
--num-worker-local-ssds=0
--num-workers=64
--preemptible-worker-boot-disk-size=40GB
--worker-boot-disk-size=40
--worker-machine-type=n1-standard-8
--zone=europe-west2-a
--initialization-action-timeout=20m
--labels=creator=pa10_sanger_ac_uk
```