Index_bgen zlib compression exception

I’m trying to index a BGEN file on the UK Biobank RAP.

import hail as hl

# hail_dir is defined earlier in the session (base directory for Hail outputs).
def get_bgen_path(chrom):
    return f'file:///mnt/project/Bulk/Exome sequences/Population level exome OQFE variants, BGEN format - interim 450k release/ukb23150_c{chrom}_b0_v1.bgen'

def get_idx2_path(chrom):
    return f'{hail_dir}/exome/wes_450k/bgen/chr{chrom}.idx2'

chrom = 21
hl.index_bgen(
    path=get_bgen_path(chrom),
    index_file_map={get_bgen_path(chrom): get_idx2_path(chrom)},
)

But I get this error: HailException: Hail only supports zlib compression.

Log: hail-20220505-0948-0.2.78-b17627756568.log (18.3 KB)

We don’t support zstd compression, which was added to the BGEN format after we wrote the BGEN importer. Which BGEN file is this? We can add this support, but we’ve got a lot going on right now.

How do I check the BGEN version? I can’t seem to get it with bgenix.

This isn’t a bottleneck for me, so no rush.
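On the question of checking the BGEN version: the layout version and compression type are stored in the flags field of the file's header block, so they can be read directly without bgenix. The following is a minimal sketch based on the published BGEN header layout, not part of Hail; the function name `read_bgen_header` is hypothetical, and it assumes the file is readable as a local path (for example, the /mnt/project/... path without the file:// prefix).

```python
import struct

# Meaning of the two lowest flag bits (CompressedSNPBlocks) in the BGEN header.
COMPRESSION = {0: "none", 1: "zlib", 2: "zstd"}


def read_bgen_header(path):
    """Read layout and compression info from a BGEN header (hypothetical helper)."""
    with open(path, "rb") as f:
        # Four little-endian uint32 values: offset to the first variant block,
        # header block length, number of variants, number of samples.
        offset, header_len, n_variants, n_samples = struct.unpack("<4I", f.read(16))
        magic = f.read(4)
        if magic not in (b"bgen", b"\x00\x00\x00\x00"):
            raise ValueError(f"{path} does not look like a BGEN file")
        # Skip the free data area; the flags are the last 4 bytes of the header.
        f.seek(header_len - 20, 1)
        (flags,) = struct.unpack("<I", f.read(4))
    return {
        "n_variants": n_variants,
        "n_samples": n_samples,
        "compression": COMPRESSION.get(flags & 0b11, "unknown"),
        "layout": (flags >> 2) & 0b1111,
        "has_sample_ids": bool(flags >> 31),
    }
```

A file that reports "zstd" here is one that triggers the exception above in Hail versions without zstd support.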

I have the same problem. It affects all of the UK Biobank genotype files (500K WES, as well as the older 450K WES).

Thanks for the ping, @orr. We’ve noted this as a key new feature for our users working with UKB. The team is very resource constrained these days, but we’ll do our best to get it done for y’all.

Are there any updates, or anywhere to track progress? I couldn’t find an issue on GitHub. Having to fall back to VCFs is pretty slow and costly with UKB, which uses zstd.

Hey @RossDeVito !

I empathize with the pain. This is the place to watch for updates. The more folks post or like these threads, the more it helps me argue for the value of this.

Unfortunately, it hasn’t been a priority for our funding labs. I’m not quite sure why, but I guess folks aren’t working with the latest round of UKB.

Feature in PR:

zstd support will land in 0.2.108.
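To confirm that an installed Hail is new enough, the version string can be compared against 0.2.108. This is a rough sketch; the helper name `hail_supports_zstd_bgen` is hypothetical, and it assumes version strings shaped like the one in the log file name above (for example 0.2.78-b17627756568).

```python
import re


def hail_supports_zstd_bgen(version, minimum=(0, 2, 108)):
    """Return True if a Hail version string is at least `minimum` (hypothetical helper)."""
    # Keep only the leading major.minor.patch part and ignore any build suffix.
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups()) >= minimum


print(hail_supports_zstd_bgen("0.2.78-b17627756568"))  # False
print(hail_supports_zstd_bgen("0.2.108"))              # True
```

In a live session, the string returned by `hl.version()` can be passed in directly.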
