Hello,
I am trying to save a small NumPy array to a Google Cloud bucket. I am working on Google Cloud in a Jupyter notebook. This is the command I am using:
np.save('gs://ukb-gt/eigenvals_downsampled.npy', eigenvals)
I don’t get an error, but no file appears. The same thing happens when I try to save a plot with plt.savefig().
Thanks!
Lillian
Python can’t talk to Google Cloud Storage using standard file-system calls. You can use a Hail utility to do both of these:
with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.npy', 'w') as f:
    np.save(f, eigenvals)
The same approach should work for the plot, I think.
Thanks! I think this is getting closer, but I am now getting this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-b1250ed90190> in <module>
1 with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.npy') as f:
----> 2 np.save(f, eigenvals)
<__array_function__ internals> in save(*args, **kwargs)
/opt/conda/miniconda3/lib/python3.6/site-packages/numpy/lib/npyio.py in save(file, arr, allow_pickle, fix_imports)
551 arr = np.asanyarray(arr)
552 format.write_array(fid, arr, allow_pickle=allow_pickle,
--> 553 pickle_kwargs=pickle_kwargs)
554 finally:
555 if own_fid:
/opt/conda/miniconda3/lib/python3.6/site-packages/numpy/lib/format.py in write_array(fp, array, version, allow_pickle, pickle_kwargs)
658 """
659 _check_version(version)
--> 660 _write_array_header(fp, header_data_from_array_1_0(array), version)
661
662 if array.itemsize == 0:
/opt/conda/miniconda3/lib/python3.6/site-packages/numpy/lib/format.py in _write_array_header(fp, d, version)
432 else:
433 header = _wrap_header(header, version)
--> 434 fp.write(header)
435
436 def write_array_header_1_0(fp, d):
TypeError: write() argument must be str, not bytes
Actually, numpy might not correctly support arbitrary file-like objects. Another option is to save to a local directory on the driver node and then copy the file over:
np.save('/tmp/some_file', ...)
hl.hadoop_copy('file:///tmp/some_file', 'gs://foo/bar')
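If it helps, here is a fuller (untested) sketch of that pattern, with example paths, applied to both the array and a plot:
import hail as hl
import numpy as np
import matplotlib.pyplot as plt

eigenvals = np.linspace(0.0, 1.0, 10)  # stand-in for the real array

# Save to the driver node's local disk first
np.save('/tmp/eigenvals_downsampled.npy', eigenvals)
# Then copy the local file into the bucket (note the file:// scheme for the local path)
hl.hadoop_copy('file:///tmp/eigenvals_downsampled.npy',
               'gs://ukb-gt/eigenvals_downsampled.npy')

# The same pattern works for the plot: write the image locally, then copy it up
plt.plot(eigenvals)
plt.savefig('/tmp/eigenvals_plot.png')
hl.hadoop_copy('file:///tmp/eigenvals_plot.png', 'gs://ukb-gt/eigenvals_plot.png')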
I think it is having trouble because np.save writes the array as a binary file. I used this code instead and it worked!
with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.txt', 'w') as f:
    #np.save(f, eigenvals)
    np.savetxt(f, eigenvals)
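In case it's useful, reading it back should just be the reverse (untested sketch, same path as above):
with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.txt', 'r') as f:
    eigenvals_loaded = np.loadtxt(f)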
You can use 'wb' instead of 'w' to open a file in binary-writing mode.
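For example (untested sketch; the plot filename is just an example):
with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.npy', 'wb') as f:
    np.save(f, eigenvals)  # binary handle, so the .npy format works

# plt.savefig also accepts a file-like object, so the same trick works for the plot
with hl.hadoop_open('gs://ukb-gt/eigenvals_plot.png', 'wb') as f:
    plt.savefig(f, format='png')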
Awesome, both worked, thanks so much!