Hello,
I am trying to save a small NumPy array to a Google Cloud bucket. I am working on Google Cloud in a Jupyter notebook. This is the command I am using:
np.save('gs://ukb-gt/eigenvals_downsampled.npy', eigenvals)
I don’t get an error, but no file appears. The same thing happens when I try to save a plot with plt.savefig().
Thanks!
Lillian
Python can’t talk to Google Cloud Storage using standard file-system calls. You can use a Hail utility to do both of these:
with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.npy', 'w') as f:
    np.save(f, eigenvals)
The same approach should work for the plot, I think.
Thanks! I think this is getting closer, but I am now getting this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-b1250ed90190> in <module>
1 with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.npy') as f:
----> 2 np.save(f, eigenvals)
<__array_function__ internals> in save(*args, **kwargs)
/opt/conda/miniconda3/lib/python3.6/site-packages/numpy/lib/npyio.py in save(file, arr, allow_pickle, fix_imports)
551 arr = np.asanyarray(arr)
552 format.write_array(fid, arr, allow_pickle=allow_pickle,
--> 553 pickle_kwargs=pickle_kwargs)
554 finally:
555 if own_fid:
/opt/conda/miniconda3/lib/python3.6/site-packages/numpy/lib/format.py in write_array(fp, array, version, allow_pickle, pickle_kwargs)
658 """
659 _check_version(version)
--> 660 _write_array_header(fp, header_data_from_array_1_0(array), version)
661
662 if array.itemsize == 0:
/opt/conda/miniconda3/lib/python3.6/site-packages/numpy/lib/format.py in _write_array_header(fp, d, version)
432 else:
433 header = _wrap_header(header, version)
--> 434 fp.write(header)
435
436 def write_array_header_1_0(fp, d):
TypeError: write() argument must be str, not bytes
Actually, numpy might not correctly support arbitrary file-like objects. Another option is to save to a local directory on the driver node and then copy the file over:
np.save('/tmp/some_file', ...)
hl.hadoop_copy('file:///tmp/some_file', 'gs://foo/bar')
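If it helps, here is a fuller (untested) sketch of that pattern, with example paths, applied to both the array and a plot:
import hail as hl
import numpy as np
import matplotlib.pyplot as plt

eigenvals = np.linspace(0.0, 1.0, 10)  # stand-in for the real array

# Save to the driver node's local disk first
np.save('/tmp/eigenvals_downsampled.npy', eigenvals)
# Then copy the local file into the bucket (note the file:// scheme for the local path)
hl.hadoop_copy('file:///tmp/eigenvals_downsampled.npy',
               'gs://ukb-gt/eigenvals_downsampled.npy')

# The same pattern works for the plot: write the image locally, then copy it up
plt.plot(eigenvals)
plt.savefig('/tmp/eigenvals_plot.png')
hl.hadoop_copy('file:///tmp/eigenvals_plot.png', 'gs://ukb-gt/eigenvals_plot.png')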
I think it is having trouble because np.save writes the array as a binary file. I used this code instead and it worked!
with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.txt', 'w') as f:
    #np.save(f, eigenvals)
    np.savetxt(f, eigenvals)
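In case it's useful, reading it back should just be the reverse (untested sketch, same path as above):
with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.txt', 'r') as f:
    eigenvals_loaded = np.loadtxt(f)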
You can use 'wb' instead of 'w' to open a file in binary-writing mode.
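For example (untested sketch; the plot filename is just an example):
with hl.hadoop_open('gs://ukb-gt/eigenvals_downsampled.npy', 'wb') as f:
    np.save(f, eigenvals)  # binary handle, so the .npy format works

# plt.savefig also accepts a file-like object, so the same trick works for the plot
with hl.hadoop_open('gs://ukb-gt/eigenvals_plot.png', 'wb') as f:
    plt.savefig(f, format='png')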
Awesome, both worked, thanks so much!