Google megamem machine seems to be running out of storage after 8GB

Hello, I am trying to run a few very large computations on a megamem machine, but am running into some problems.

I tried creating a megamem instance and copying data over, but the copy stops after about 8 GB and then reports that there is no space left on the device.

The commands I am using are:

gcloud compute instances create lkp-megamem --machine-type m1-ultramem-160
gcloud compute ssh lkp-megamem
gsutil -m cp -r gs://ukb-gt/grm_575904_x_35946_meta0.mt .

It gets through some of the files, and then it just stops. This is what it prints:

Copying gs://ukb-gt/grm_575904_x_35946_meta0.mt/_SUCCESS...
Copying gs://ukb-gt/grm_575904_x_35946_meta0.mt/metadata.json...
Copying gs://ukb-gt/grm_575904_x_35946_meta0.mt/parts/part-00-2-0-0-46c273f4-5f83-9bec-bee5-77549b3c4025...
Copying gs://ukb-gt/grm_575904_x_35946_meta0.mt/parts/part-11-2-11-0-a7406ba1-2ac7-7199-b19c-c2df30ab6d9d...
Copying gs://ukb-gt/grm_575904_x_35946_meta0.mt/parts/part-06-2-6-0-349f5461-4a0d-a69c-daf0-0851503eb7f6...
.
.
.
Copying gs://ukb-gt/grm_575904_x_35946_meta0.mt/parts/part-80-2-80-0-87592dd2-99f2-d050-5b8b-21a700500631...
- [26/85 files][ 7.9 GiB/ 9.7 GiB] 81% Done 0.0 B/s

Here it stops copying. Instead it just waits for about 10 minutes before printing:

Caught ResumableDownloadException (Transfer failed after 23 retries. Final exception: b'[Errno 28] No space left on device') for download of ./grm_575904_x_35946_meta0.mt/parts/part-69-2-69-0-1a5c7cc9-fe4f-0adc-3656-0eb9b2166b6f.
[Errno 28] No space left on device
Caught ResumableDownloadException (Transfer failed after 23 retries. Final exception: b'[Errno 28] No space left on device') for download of ./grm_575904_x_35946_meta0.mt/parts/part-43-2-43-0-fe7bd347-555a-f703-ff71-a9cc1c347e1e.
ResumableDownloadException: Transfer failed after 23 retries. Final exception: b'[Errno 28] No space left on device'
...

If I do a du I get:

8249584 ./grm_575904_x_35946_meta0.mt/parts
8249596 ./grm_575904_x_35946_meta0.mt
8 ./.ssh
152 ./.gsutil/tracker-files
164 ./.gsutil
4 ./.gnupg/private-keys-v1.d
8 ./.gnupg
4 ./.config/gcloud/configurations
16 ./.config/gcloud
20 ./.config
8249816 .

Then I try to do something like mkdir old and I get:

mkdir: cannot create directory 'old': No space left on device

Obviously the megamem can’t be out of memory: it is supposed to have 3844 gigabytes of memory.

I also did a ls in the grm_575904_x_35946_meta0.mt/parts directory I am trying to copy and got files that look like:

part-00-2-0-0-46c273f4-5f83-9bec-bee5-77549b3c4025_.gstmp
part-01-2-1-0-9b59515f-e78e-a012-9f04-915fe3201228_.gstmp
part-02-2-2-0-b41a37e2-4b8c-4521-9c17-fe05e85b6375_.gstmp
part-03-2-3-0-58069acd-66d7-5f28-970d-74f3796e2ba4_.gstmp
part-04-2-4-0-2fcfb260-7440-ed14-679b-41a79d0164ae_.gstmp
part-05-2-5-0-d434495d-d451-d81a-3ff7-37556599e104_.gstmp
...

Do you know what could be causing this error? I updated gcloud, did many web searches, and looked at the documentation for all of these commands, but I couldn't find anything matching this problem.

The machine is running out of disk space, not memory.
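One way to confirm this from inside the VM is to look at free disk space rather than memory. The following is only a rough sketch, assuming a Linux image with the standard df and awk tools; the helper name avail_kib and the 10 GiB threshold are made up for illustration (the transfer above reports 9.7 GiB in total).

```shell
# Print the space still available (in KiB) on the filesystem that holds
# the given directory. `avail_kib` is a hypothetical helper name.
avail_kib() {
  # -P forces one line per filesystem, -k reports 1024-byte blocks;
  # column 4 of the second line is the "Available" figure.
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Assumed requirement: roughly 10 GiB for the download, expressed in KiB.
need_kib=$((10 * 1024 * 1024))
have_kib=$(avail_kib .)

if [ "$have_kib" -lt "$need_kib" ]; then
  echo "only ${have_kib} KiB free on disk; need about ${need_kib} KiB"
fi
```

Running free -h alongside this shows RAM, which is a separate resource: the 3844 GB figure for the machine type is memory, while [Errno 28] refers to the disk that the home directory lives on.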

To create a machine with more disk space, use the --boot-disk-size option or create additional disks. Disks can also be attached to an existing machine.
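As a rough sketch of both options, wrapped in two small shell functions so the arguments are easy to change. The function names, the disk name lkp-data, and the sizes are placeholders rather than recommendations, and depending on your gcloud configuration you may also need to pass --zone to each command.

```shell
# Option 1: create the VM with a larger boot disk.
# $1 = instance name, $2 = boot disk size (for example 200GB).
create_vm() {
  gcloud compute instances create "$1" \
    --machine-type m1-ultramem-160 \
    --boot-disk-size "$2"
}

# Option 2: create a separate data disk and attach it to an existing VM.
# $1 = instance name, $2 = new disk name, $3 = disk size.
add_data_disk() {
  gcloud compute disks create "$2" --size "$3" &&
    gcloud compute instances attach-disk "$1" --disk "$2"
}

# Example calls (names and sizes are placeholders):
#   create_vm lkp-megamem 200GB
#   add_data_disk lkp-megamem lkp-data 500GB
```

A newly attached disk still has to be formatted and mounted inside the VM before gsutil can write to it, and the copy should then target the mount point instead of the home directory.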
