I am using AnVIL's Terra platform to run a Jupyter notebook that uses Hail for some genetic analyses. The problem is that I cannot find the MatrixTable directory on the notebook's persistent disk when I run "!ls -a" from inside the notebook. The code I am using is below:
# Import and initialize Hail
import hail as hl
hl.init()
from hail.plot import show
from pprint import pprint
hl.plot.output_notebook()
# Print contents of current directory
!ls -a
# Add path to a .gz zipped VCF file
direct_path_to_gz = ""  # path left out for privacy reasons
# Import the VCF and write it out as a MatrixTable (name left out for privacy reasons)
hl.import_vcf(direct_path_to_gz, reference_genome='GRCh37', n_partitions=512, force_bgz=True).write("name_of_matrix_table", overwrite=True)
# Re-print contents of current directory
!ls -a
When I run the second "!ls -a" command, I see the Hail log file but no MatrixTable directory. The weird thing is that hl.read_matrix_table works fine when I give it the same name that I passed to write. How is Hail finding a directory that does not appear to exist? I need the directory so I can "scp" it from the notebook's persistent disk into the workspace bucket.
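One check I could run to narrow this down is sketched below. It rests on an assumption I have not confirmed for Terra: that Hail resolves a path with no scheme against the cluster's default Hadoop filesystem (possibly HDFS) rather than the notebook's local disk, which would explain why "!ls -a" does not show it. The helper names path_scheme and report_location are made up for illustration; hl.hadoop_exists is Hail's own helper.

```python
import os
from urllib.parse import urlparse


def path_scheme(path: str) -> str:
    """Return the URI scheme of a path ('gs', 'file', 'hdfs', ...) or '' if none."""
    return urlparse(path).scheme


def report_location(path: str) -> None:
    """Print where Hail can and cannot see `path`.

    Assumption: a scheme-less path is resolved against the default Hadoop
    filesystem, which may not be the local disk that `!ls` inspects.
    """
    import hail as hl  # imported lazily; needs an initialized Hail session

    # An empty scheme means Hail decides which filesystem the path lives on.
    print("scheme:", path_scheme(path) or "(none, default filesystem)")
    # Does the path exist on whatever filesystem Hail resolves it to?
    print("visible to Hail as given:", hl.hadoop_exists(path))
    # Does the same name exist on the local disk, addressed explicitly?
    local_uri = "file://" + os.path.abspath(path)
    print("visible as", local_uri, ":", hl.hadoop_exists(local_uri))
```

Calling report_location("name_of_matrix_table") after the write would show whether the table is visible only through the scheme-less name. If so, writing to an explicit gs:// path in the workspace bucket might remove the need for the copy step, though I have not tested that.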
Thanks for your help!