sph17
December 2, 2021, 8:26pm
1
I am trying to run a VCF filtering script with Hail via Terra’s virtual environment. The configuration is attached.
The Python script that calls Hail runs without problems until the last step, where writing to the output VCF location makes Hail throw a Java error. Any suggestions would be greatly appreciated. The stdout log is below:
hail_1202_stdout.txt (6.4 KB)
sph17
December 3, 2021, 3:15pm
3
Here is the script I’m trying to run:
Shared with Dropbox
How big is the VCF? This looks like it’s probably a memory error – you’re running on a pretty tiny VM. You could try increasing the size of the machine provisioned to 4 or 8 cores (which will increase the memory as well).
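To confirm how much the VM actually has before and after resizing, a quick check can be run in the notebook or script. This is a minimal sketch, assuming a Linux VM; `vm_resources` is a hypothetical helper name, not part of Hail or Terra.

```python
import os


def vm_resources():
    """Return (cpu_count, total_memory_gib) for the current Linux machine."""
    cores = os.cpu_count()
    # Physical memory = page size * number of physical pages (Linux sysconf names).
    total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, total_bytes / 1024**3


cores, mem_gib = vm_resources()
print(f"{cores} cores, {mem_gib:.1f} GiB RAM")
```

If the numbers do not change after picking a larger machine, the new configuration has probably not been applied to the running environment yet.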
sph17
December 3, 2021, 6:31pm
5
Okay, I will give that a shot. The VCF is only 5000 lines, since I was just testing out the script.
sph17
December 3, 2021, 7:07pm
6
@tpoterba Increasing the memory fixed the error when I write the output locally, but the output file does not exist afterwards. Here is the log:
SparkUI available at http://saturn-282f6108-da43-4df8-95ea-903fb699127b-m.c.terra-2d3a4d00.internal:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.62-84fa81b9ea3d
LOGGING: writing to /home/jupyter/hail-20211203-1940-0.2.62-84fa81b9ea3d.log
2021-12-03 19:41:03 Hail: INFO: Reading table without type imputation
Loading field 'f0' as type str (user-supplied)
Loading field 'f1' as type int32 (user-supplied)
Loading field 'f2' as type int32 (user-supplied)
2021-12-03 19:41:03 Hail: INFO: Reading table without type imputation
Loading field 'SampleID' as type str (not specified)
Loading field 'FamID' as type str (not specified)
Loading field 'Role' as type str (not specified)
2021-12-03 19:41:23 Hail: INFO: Coerced sorted dataset
2021-12-03 19:41:26 Hail: WARN: entries(): Resulting entries table is sorted by '(row_key, col_key)'.
To preserve row-major matrix table order, first unkey columns with 'key_cols_by()'
2021-12-03 19:41:26 Hail: INFO: Reading table without type imputation
Loading field 'FamID' as type str (not specified)
Loading field 'TrioID' as type str (not specified)
Loading field 'SampleID' as type str (not specified)
Loading field 'FatherID' as type str (not specified)
Loading field 'MotherID' as type str (not specified)
2021-12-03 19:41:27 Hail: WARN: export_vcf: ignored the following fields:
'pheno' (column)
'num_alleles' (row)
'a_index' (row)
'was_split' (row)
2021-12-03 19:41:43 Hail: INFO: Coerced sorted dataset
2021-12-03 19:41:55 Hail: INFO: Coerced sorted dataset
2021-12-03 19:42:32 Hail: INFO: Coerced sorted dataset
2021-12-03 19:42:41 Hail: INFO: Coerced sorted dataset
2021-12-03 19:42:51 Hail: INFO: Coerced sorted dataset
2021-12-03 19:43:15 Hail: INFO: Coerced sorted dataset
2021-12-03 19:43:28 Hail: INFO: Coerced sorted dataset
2021-12-03 19:43:37 Hail: INFO: Coerced sorted dataset
2021-12-03 19:44:00 Hail: INFO: Coerced sorted dataset
2021-12-03 19:44:27 Hail: INFO: Ordering unsorted dataset with network shuffle
2021-12-03 19:44:27 Hail: INFO: Ordering unsorted dataset with network shuffle
2021-12-03 19:44:44 Hail: INFO: Ordering unsorted dataset with network shuffle
2021-12-03 19:45:55 Hail: INFO: merging 1 files totalling 45.8K...
2021-12-03 19:45:55 Hail: INFO: while writing:
mssng_db6.chr12_filter_ME_v3_loosefilters.vcf.bgz
merge time: 131.775ms
Processing:12
But the output file doesn’t exist in the home directory:
jupyter@saturn-282f6108-da43-4df8-95ea-903fb699127b-m:~$ ls
hail-20211203-1859-0.2.62-84fa81b9ea3d.log notebooks
hail-20211203-1912-0.2.62-84fa81b9ea3d.log stdout.txt
hail-20211203-1931-0.2.62-84fa81b9ea3d.log wgs_pre_processing_vcf_v3_Loose_Filters.py
hail-20211203-1940-0.2.62-84fa81b9ea3d.log
I’m not sure what the default working directory is for Terra. You might try writing to an explicit full path rather than relative.
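A minimal sketch of writing to an explicit path and then checking for the file through Hail itself. `explicit_uri` and `export_and_check` are hypothetical helper names; `hl.export_vcf` and `hl.hadoop_exists` are existing Hail functions. On a Spark cluster, a path without a scheme may be resolved against the cluster's default filesystem (for example HDFS) rather than the local disk, which is one possible reason a file reported as written does not show up in `ls`. Whether that applies to this Terra setup is an assumption.

```python
import os
from urllib.parse import urlparse


def explicit_uri(path, default_scheme="file"):
    """Return `path` with an explicit scheme.

    Paths that already carry a scheme (gs://, hdfs://, file://) are returned
    unchanged; schemeless paths are made absolute and prefixed with the scheme.
    """
    if urlparse(path).scheme:
        return path
    return f"{default_scheme}://{os.path.abspath(path)}"


def export_and_check(mt, path):
    """Hypothetical wrapper: export a MatrixTable as VCF, then ask Hail's
    filesystem layer whether the output exists at the explicit URI."""
    import hail as hl  # imported lazily so explicit_uri works without Hail

    out = explicit_uri(path)
    hl.export_vcf(mt, out)
    return out, hl.hadoop_exists(out)


print(explicit_uri("/home/jupyter/out.vcf.bgz"))
```

Called as `export_and_check(mt, "/home/jupyter/out.vcf.bgz")`, this writes to `file:///home/jupyter/out.vcf.bgz` and returns whether Hail can see the file there. The output filename is a placeholder.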
sph17
December 3, 2021, 8:31pm
8
When I tried explicit paths, either a local path under /home/jupyter/ or a URI like gs://google/bucket, I got the same Java error as above. I’m not sure what’s going on here.