Hi – I’m working in the AoU Google Cloud environment. I’m trying to subset about 500 samples from the WGS Hail VDS, and I’m not sure whether I’m completely off the mark here. I’ve successfully worked with the WGS Hail MT for the exome regions, but this is my first time using the WGS VDS.
First I ran this command to filter for my samples:
vds = hl.vds.filter_samples(vds, samples, keep=False, remove_dead_alleles=True)
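For context, the setup before that call looks roughly like this (the VDS path and the sample-list file are placeholders, not my exact values):

import hail as hl

hl.init(default_reference='GRCh38')

# Placeholder path – the actual AoU srWGS VDS path comes from the workbench docs
vds = hl.vds.read_vds('gs://<bucket>/path/to/srwgs.vds')

# Placeholder file with my ~500 sample IDs, one per line
with open('samples.txt') as f:
    samples = [line.strip() for line in f]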
Then I ran this command to get an updated variant/sample count:
vds.variant_data.count()
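As a cheaper sanity check that the sample filter itself worked, I was thinking of counting only the samples first, since that shouldn’t require scanning the genotype data:

# Count samples only – much cheaper than the full variant count
print(vds.variant_data.count_cols())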
However, when running the second command, my progress bar keeps freezing at random points. Today I’ve been stuck at “(15721 + 192) / 84648” for about the last 30 minutes.
My compute cluster is as follows: 3 workers, 32 CPUs, 28.8 GB RAM, plus 3 preemptible workers.
The cluster runs well and makes good progress until it doesn’t. Does anyone know whether I’m running into a resource issue? I’m wondering if I should increase the RAM. I’m a bit lost, since I haven’t worked with a file this large before.
Also, when I interrupt the command, the progress bar updates, so I assume the job keeps running in the background. Just now, for example, after I stopped the command, the progress had advanced to “(22075 + 191) / 84648”.
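In case the slowdown is from recomputation (or preempted workers losing work) rather than memory, one thing I’m considering is writing the filtered VDS to my workspace bucket first and counting the written copy. A rough sketch, assuming the standard WORKSPACE_BUCKET environment variable; the output subdirectory name is made up:

import os

out_path = os.environ['WORKSPACE_BUCKET'] + '/filtered_subset.vds'  # made-up name

# Persist the filtered VDS, then count the materialized copy
vds.write(out_path)
vds = hl.vds.read_vds(out_path)
print(vds.variant_data.count())

No idea if that’s the right approach here, so I’d welcome any pointers.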
Thanks!