Hi again,
Unfortunately, the conversion step crashed with a "No space left on device"
error after 7 hours of running. Here are the code and the error output.
#!/usr/bin/env python
import sys
import os.path
import hail as hl
import sklearn
import pickle
VCF="/data4/temp_mergeVCFs_AMT/merged.vcf.gz"
vcfbase = os.path.basename(VCF)
mtout = "../04-results/03-hailTable/" + vcfbase + ".mt"
hl.import_vcf(VCF, reference_genome='GRCh38', force_bgz=True, array_elements_required=False).write(mtout, overwrite=True)
Initializing Hail with default parameters...
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/bioinfoRD/ARCdata/Projects_AMT/conda_envs/hail/lib/python3.10/site-packages/pyspark/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2022-10-06 07:09:57 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.1.3
SparkUI available at http://ip-172-18-1-233.eu-west-1.compute.internal:4040
Welcome to
__ __ <>__
/ /_/ /__ __/ /
/ __ / _ `/ / /
/_/ /_/\_,_/_/_/ version 0.2.100-2ea2615a797a
LOGGING: writing to /bioinfoRD/ARCdata/Projects_AMT/2022-09-29_arcWGSgnomadPredict/03-runme/hail-20221006-0709-0.2.100-2ea2615a797a.log
2022-10-06 07:10:07 Hail: INFO: scanning VCF for sortedness...
2022-10-06 08:04:21 Hail: INFO: VCF is out of order...
Write the dataset to disk before running multiple queries to avoid multiple costly data shuffles.
2022-10-06 12:25:36 Hail: INFO: Ordering unsorted dataset with network shuffle
Traceback (most recent call last):
File "/bioinfoRD/ARCdata/Projects_AMT/2022-09-29_arcWGSgnomadPredict/03-runme/./01-vcf2hail.py", line 14, in <module>
hl.import_vcf(VCF, reference_genome='GRCh38', force_bgz=True, array_elements_required=False).write(mtout, overwrite=True)
File "<decorator-gen-1172>", line 2, in write
File "/bioinfoRD/ARCdata/Projects_AMT/conda_envs/hail/lib/python3.10/site-packages/hail/typecheck/check.py", line 577, in wrapper
return __original_func(*args_, **kwargs_)
File "/bioinfoRD/ARCdata/Projects_AMT/conda_envs/hail/lib/python3.10/site-packages/hail/matrixtable.py", line 2558, in write
Env.backend().execute(ir.MatrixWrite(self._mir, writer))
File "/bioinfoRD/ARCdata/Projects_AMT/conda_envs/hail/lib/python3.10/site-packages/hail/backend/py4j_backend.py", line 104, in execute
self._handle_fatal_error_from_backend(e, ir)
File "/bioinfoRD/ARCdata/Projects_AMT/conda_envs/hail/lib/python3.10/site-packages/hail/backend/backend.py", line 181, in _handle_fatal_error_from_backend
raise err
File "/bioinfoRD/ARCdata/Projects_AMT/conda_envs/hail/lib/python3.10/site-packages/hail/backend/py4j_backend.py", line 98, in execute
result_tuple = self._jbackend.executeEncode(jir, stream_codec, timed)
File "/bioinfoRD/ARCdata/Projects_AMT/conda_envs/hail/lib/python3.10/site-packages/py4j/java_gateway.py", line 1304, in __call__
return_value = get_return_value(
File "/bioinfoRD/ARCdata/Projects_AMT/conda_envs/hail/lib/python3.10/site-packages/hail/backend/py4j_backend.py", line 31, in deco
raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
hail.utils.java.FatalError: IOException: No space left on device
I have plenty of space on the output disk. Does the program use the /tmp/ directory by any chance?
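
In case it helps narrow this down, here is a minimal sketch (standard library only, not part of the script above) for checking free space on the filesystems that might be involved. The helper names `nearest_existing_dir` and `free_gib` are made up for illustration, and the list of candidate directories is a guess: it is an assumption that the system temp directory is where the shuffle data ends up.

```python
import os
import shutil
import tempfile


def nearest_existing_dir(path):
    """Walk up from path until a directory that exists is found."""
    path = os.path.abspath(path)
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds path."""
    usage = shutil.disk_usage(nearest_existing_dir(path))
    return usage.free / 1024**3


# Candidate locations; which of these Hail/Spark actually writes to is an assumption.
candidates = {
    "system temp dir": tempfile.gettempdir(),
    "output dir": "../04-results/03-hailTable/",
    "working dir": os.getcwd(),
}

for label, path in candidates.items():
    print(f"{label:16s} {path}: {free_gib(path):.1f} GiB free")
```

If the temp directory turns out to be the small one, `hl.init` documents `tmp_dir`, `local_tmpdir` and `spark_conf` parameters that look relevant, though I have not tested whether they change where the shuffle is written.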