Java error message when writing output in Hail

I am trying to run a VCF filtering script with Hail in Terra’s virtual environment. The configuration is attached.

The Python script that calls Hail runs without problems until the last step: when it writes to the output VCF location, Hail throws a Java error. Any suggestions would be greatly appreciated. The stdout log is below:

hail_1202_stdout.txt (6.4 KB)

Can you post the script?

Here is the script I’m trying to run:

How big is the VCF? This looks like it’s probably a memory error – you’re running on a pretty tiny VM. You could try increasing the size of the machine provisioned to 4 or 8 cores (which will increase the memory as well).
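
For reference, a minimal sketch for checking how many cores and how much memory the VM actually has before resizing it. It uses only the Python standard library and assumes a Linux machine (the `os.sysconf` names used here are not available on every platform); `vm_resources` is just an illustrative name.

```python
import os

def vm_resources():
    """Return (cpu_count, total_memory_gib) for the current Linux machine."""
    cores = os.cpu_count()
    # Physical memory = page size in bytes * number of physical pages.
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    total_gib = page_size * phys_pages / 1024 ** 3
    return cores, total_gib

cores, mem_gib = vm_resources()
print(f"{cores} cores, {mem_gib:.1f} GiB RAM")
```

Note that this reports the machine's totals, not the share of memory the Spark JVM was given.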

Okay, I will give that a shot. The VCF is only 5000 lines; I was just testing out the script.

@tpoterba Increasing the memory fixed the error when I write the output locally, but the output file does not exist afterwards. The log says it was written:

SparkUI available at http://saturn-282f6108-da43-4df8-95ea-903fb699127b-m.c.terra-2d3a4d00.internal:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.62-84fa81b9ea3d
LOGGING: writing to /home/jupyter/hail-20211203-1940-0.2.62-84fa81b9ea3d.log
2021-12-03 19:41:03 Hail: INFO: Reading table without type imputation
  Loading field 'f0' as type str (user-supplied)
  Loading field 'f1' as type int32 (user-supplied)
  Loading field 'f2' as type int32 (user-supplied)
2021-12-03 19:41:03 Hail: INFO: Reading table without type imputation
  Loading field 'SampleID' as type str (not specified)
  Loading field 'FamID' as type str (not specified)
  Loading field 'Role' as type str (not specified)
2021-12-03 19:41:23 Hail: INFO: Coerced sorted dataset
2021-12-03 19:41:26 Hail: WARN: entries(): Resulting entries table is sorted by '(row_key, col_key)'.
    To preserve row-major matrix table order, first unkey columns with 'key_cols_by()'
2021-12-03 19:41:26 Hail: INFO: Reading table without type imputation
  Loading field 'FamID' as type str (not specified)
  Loading field 'TrioID' as type str (not specified)
  Loading field 'SampleID' as type str (not specified)
  Loading field 'FatherID' as type str (not specified)
  Loading field 'MotherID' as type str (not specified)
2021-12-03 19:41:27 Hail: WARN: export_vcf: ignored the following fields:
    'pheno' (column)
    'num_alleles' (row)
    'a_index' (row)
    'was_split' (row)
2021-12-03 19:41:43 Hail: INFO: Coerced sorted dataset
2021-12-03 19:41:55 Hail: INFO: Coerced sorted dataset
2021-12-03 19:42:32 Hail: INFO: Coerced sorted dataset
2021-12-03 19:42:41 Hail: INFO: Coerced sorted dataset
2021-12-03 19:42:51 Hail: INFO: Coerced sorted dataset
2021-12-03 19:43:15 Hail: INFO: Coerced sorted dataset
2021-12-03 19:43:28 Hail: INFO: Coerced sorted dataset
2021-12-03 19:43:37 Hail: INFO: Coerced sorted dataset
2021-12-03 19:44:00 Hail: INFO: Coerced sorted dataset
2021-12-03 19:44:27 Hail: INFO: Ordering unsorted dataset with network shuffle
2021-12-03 19:44:27 Hail: INFO: Ordering unsorted dataset with network shuffle
2021-12-03 19:44:44 Hail: INFO: Ordering unsorted dataset with network shuffle
2021-12-03 19:45:55 Hail: INFO: merging 1 files totalling 45.8K...
2021-12-03 19:45:55 Hail: INFO: while writing:
    mssng_db6.chr12_filter_ME_v3_loosefilters.vcf.bgz
  merge time: 131.775ms
Processing:12

But the file doesn’t exist:

jupyter@saturn-282f6108-da43-4df8-95ea-903fb699127b-m:~$ ls
hail-20211203-1859-0.2.62-84fa81b9ea3d.log  notebooks
hail-20211203-1912-0.2.62-84fa81b9ea3d.log  stdout.txt
hail-20211203-1931-0.2.62-84fa81b9ea3d.log  wgs_pre_processing_vcf_v3_Loose_Filters.py
hail-20211203-1940-0.2.62-84fa81b9ea3d.log
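
One way to check whether the output landed somewhere else on the local disk is to search for it by name. This is a minimal sketch using only the standard library; `find_by_name` is a hypothetical helper, and the search root is an assumption. It only covers the local file system, so it would not find a file that was written to a distributed file system or a bucket.

```python
import os

def find_by_name(root, filename):
    """Return sorted paths under root whose last component equals filename."""
    matches = []
    # os.walk skips directories it cannot read instead of raising.
    for dirpath, dirnames, filenames in os.walk(root):
        # Match both plain files and directories with that name.
        if filename in filenames or filename in dirnames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

For the run above, the call would be something like `find_by_name("/home/jupyter", "mssng_db6.chr12_filter_ME_v3_loosefilters.vcf.bgz")`; an empty list means nothing with that name exists under that directory.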

I’m not sure what the default working directory is for Terra. You might try writing to an explicit full path rather than relative.
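
A minimal sketch of what an explicit full path could look like. The helper `explicit_uri` is hypothetical and not part of Hail, and the bucket name is made up. It leaves paths that already carry a scheme (such as `gs://`) untouched and turns anything else into an absolute `file://` URI. How Terra resolves scheme-less paths is an assumption I have not verified.

```python
import os

def explicit_uri(path):
    """Return path unchanged if it has a scheme, else an absolute file:// URI."""
    if "://" in path:
        # Already explicit, e.g. gs://bucket/object.
        return path
    # Resolve relative paths against the current working directory.
    return "file://" + os.path.abspath(path)

print(explicit_uri("gs://my-bucket/out.vcf.bgz"))  # hypothetical bucket
print(explicit_uri("/home/jupyter/out.vcf.bgz"))
```

The result would then be passed as the output argument, for example `hl.export_vcf(mt, explicit_uri("/home/jupyter/out.vcf.bgz"))`, where `mt` stands for the filtered MatrixTable from the script.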

When I tried explicit paths, either a local one (/home/jupyter/) or a URI (gs://google/bucket), I got the same Java error as above. Not sure what’s going on here.