ABFS (abfs://) paths with ADLS Gen2 no longer work in Hail 0.2.137 due to filesystem validation – is it expected / supported?

Hi,

We recently upgraded our pipeline from an older Hail release and ran into an issue when working with Azure Data Lake Storage Gen2 using ABFS (abfs://) paths.

Our setup:

  • Hail: 0.2.137
  • Spark: 3.5.7
  • Scala: 2.12
  • Java: 11
  • Python: 3.10.19
  • Storage: ADLS Gen2
  • Jars:
    • hail-all-spark.jar taken from the installed Hail
    • hadoop-azure-3.3.4.jar
    • azure-storage-7.0.1.jar
    • postgresql-42.6.0.jar
  • Paths like: abfs://<container>@<account>.dfs.core.windows.net/…
  • Hadoop ABFS configured correctly (fs.azure.account.key.<account>.dfs.core.windows.net) – see the init sketch below
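
For reference, this is roughly how we pass that configuration at init (account name and key are placeholders):

import hail as hl

# The ABFS driver reads the account key from this Hadoop property; the
# hadoop-azure / azure-storage jars are put on the Spark classpath
# separately (e.g. via spark-submit --jars).
hl.init(
    spark_conf={
        'spark.hadoop.fs.azure.account.key.<account>.dfs.core.windows.net': '<account-key>',
    }
)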

This setup worked fine with older versions.

The problem:

When writing or reading Hail datasets (VDS / MatrixTable), we now get:

ValueError: no file system found for url abfs://…

This comes from hailtop.aiotools.router_fs during validate_file(), before Spark/Hadoop actually performs the write.
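
A minimal reproducer, using the init sketch above (the path is a placeholder, and balding_nichols_model just stands in for any MatrixTable):

# Any MatrixTable triggers it; this one needs no input data.
mt = hl.balding_nichols_model(n_populations=3, n_samples=10, n_variants=10)
mt.write('abfs://<container>@<account>.dfs.core.windows.net/tmp/repro.mt')
# ValueError: no file system found for url abfs://...
# (raised during validate_file, before any Spark job starts)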

We understand that:

  • Spark + Hadoop do support ABFS
  • hailtop.fs officially supports local, GCS, S3, and ABS, but not ABFS

In older Hail versions, this validation either didn’t exist or wasn’t enforced for dataset writes, so ABFS “just worked” via Spark. In 0.2.137, the validation appears to be unconditional, which effectively blocks ABFS-backed Hail datasets.

What we tried:

  • Switching to https://<account>.blob.core.windows.net/… - that fails because Hail datasets are directories, while HTTPS blob access treats each path as a single file.
  • Switching to abs:// - not viable for Spark/Hail dataset writes.
  • Downgrading Hail - not ideal.

Current workaround:

We can make things work by bypassing the Python-side validation after hl.init():

import hail as hl

# Grab the active backend and keep a handle on the original validator.
backend = hl.current_backend()
_orig_validate_file = backend.validate_file

def _validate_file(uri, *args, **kwargs):
    # Let ABFS URIs through; Spark/Hadoop validates them itself when the
    # actual read/write happens. Everything else goes to the original check.
    if str(uri).startswith(("abfs://", "abfss://")):
        return
    return _orig_validate_file(uri, *args, **kwargs)

# Monkeypatch the backend instance so subsequent reads/writes use the shim.
backend.validate_file = _validate_file

After this, Spark/Hadoop handles ABFS correctly and VDS/MT writes succeed.
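
For example, the reproducer from above now goes through end to end (same placeholder path):

mt = hl.balding_nichols_model(n_populations=3, n_samples=10, n_variants=10)
mt.write('abfs://<container>@<account>.dfs.core.windows.net/tmp/repro.mt', overwrite=True)
mt2 = hl.read_matrix_table('abfs://<container>@<account>.dfs.core.windows.net/tmp/repro.mt')
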
However, this feels like a hack, and we’re not very confident about its long-term safety or compatibility with future Hail releases.

So I just wanted to ask:

  • Is ABFS intentionally unsupported at the Hail Python FS layer?
  • Is bypassing validate_file() an acceptable workaround, or could it cause subtle issues?
  • Is there a recommended or supported way to use ADLS Gen2 with Hail going forward?
  • Are there any plans to add ABFS support to hailtop.fs, or allow skipping filesystem validation for Hadoop-backed schemes?

I also noticed the deprecation of the hl.hadoop_* functions in favor of hailtop.fs; we use hailtop.fs as well and expect the same issue there.
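
For example, I expect the equivalent hailtop.fs call to hit the same validation (we haven't tested every function, so this is an assumption on our side):

import hailtop.fs as hfs

# Goes through the same router filesystem, so presumably the same ValueError:
hfs.ls('abfs://<container>@<account>.dfs.core.windows.net/')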

Thanks a lot for your time, and for maintaining Hail!

All the best