Error reading filename with brackets (DNAnexus)

This is on DNAnexus, of course.

Running hl.import_vcf on paths containing square brackets, e.g. file:///mnt/project/Bulk/DRAGEN WGS/DRAGEN population level WGS variants, pVCF format [500k release]/chr1/ukb24310_c1_b92_v1.vcf.gz (why anyone would use this folder name in the first place is beyond me), throws HailException: arguments refer to no files.

It might just be my regex skills, but none of ["[500k release]", "[[]500k release[]]", "\[500k release\]", "\\[500k release\\]"] works as the folder name in the path. I have not tried to reproduce it outside their environment, but I suspect the issue is general rather than specific to DNAnexus.
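For what it's worth, here is how I understand the bracket problem, sketched with Python's fnmatch as a stand-in. I am assuming that Hail resolves input paths with some glob-style matching; fnmatch is not what Hail uses, so this only illustrates the general character-class behaviour. It does not explain why the escaped forms above fail in Hail.

```python
import fnmatch

folder = "pVCF format [500k release]"

# Unescaped, "[500k release]" is read as a character class that matches
# exactly ONE character, so the pattern does not match its own literal text.
literal_match = fnmatch.fnmatchcase(folder, folder)

# The same pattern does match a name where a single character from the
# class ("k" here) stands in for the whole bracketed part.
single_char_match = fnmatch.fnmatchcase("pVCF format k", folder)

# In fnmatch, wrapping each bracket in its own class makes it literal.
# Whether Hail's path expansion accepts this form is a separate question.
escaped = "pVCF format [[]500k release[]]"
escaped_match = fnmatch.fnmatchcase(folder, escaped)

print(literal_match, single_char_match, escaped_match)
```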

from pyspark.sql import SparkSession

import hail as hl

# Reuse the Spark session provided by the DNAnexus JupyterLab environment.
builder = (
    SparkSession
    .builder
    .enableHiveSupport()
)

spark = builder.getOrCreate()
hl.init(sc=spark.sparkContext)

# Path on the mounted project; note the square brackets in the folder name.
WGS_PATH = 'file:///mnt/project/Bulk/DRAGEN WGS/DRAGEN population level WGS variants, pVCF format [500k release]/chr1/ukb24310_c1_b92_v1.vcf.gz'

mt = hl.import_vcf(
    WGS_PATH,
    force_bgz=True,
    reference_genome="GRCh38",
    array_elements_required=False,
)

> 2024-01-18 10:21:07.456 Hail: WARN: 'file:///mnt/project/Bulk/DRAGEN WGS/DRAGEN population level WGS variants, pVCF format [500k release]/chr1/ukb24310_c1_b92_v1.vcf.gz' refers to no files

Here is the response from support:

Hello Jakob,

We have noticed that HAIL has problems when file paths contain special characters like “[]”. These characters were only introduced in the 500k release.

The workaround now would be to change the corresponding folder name, in the project, to remove these two characters, and try running the code again.

To move/rename a folder, please go to a local terminal and use dx mv command:

dx mv -h
usage: dx mv [-h] [--env-help] [-a] source [source ...] destination

Move or rename data objects and/or folders inside a single project.

Please do note that this moving command must be carried out before a jupyterlab session is opened.

Has anyone encountered something similar? Or, even better, found a fix that does not involve renaming the folder or duplicating a very large amount of data?
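One idea I have not been able to verify is to avoid escaping altogether and replace each bracket with the single-character wildcard "?". This assumes that Hail's path expansion treats "?" as "any one character", as most glob implementations do. The helper name wildcard_brackets is made up for illustration, and the check below only uses Python's fnmatch, not Hail itself.

```python
import fnmatch


def wildcard_brackets(path: str) -> str:
    """Replace square brackets with '?' so no character class is formed.

    Hypothetical helper; assumes the consumer treats '?' as a
    single-character wildcard.
    """
    return path.replace("[", "?").replace("]", "?")


WGS_PATH = (
    "file:///mnt/project/Bulk/DRAGEN WGS/"
    "DRAGEN population level WGS variants, pVCF format [500k release]/"
    "chr1/ukb24310_c1_b92_v1.vcf.gz"
)

pattern = wildcard_brackets(WGS_PATH)

# Sanity check with fnmatch: the wildcard pattern still matches the real path.
matches = fnmatch.fnmatchcase(WGS_PATH, pattern)
print(pattern)
print(matches)

# Untested against Hail on DNAnexus:
# mt = hl.import_vcf(pattern, force_bgz=True, reference_genome="GRCh38",
#                    array_elements_required=False)
```

The obvious downside is that "?" would also match any other folder whose name differs only in those two positions, which seems unlikely here but is worth keeping in mind.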

Best,
Jakob

EDIT: hail==0.2.116

hail-20240118-1018-0.2.116-cd64e0876c94.log (41.4 KB)