Does Hail support the BCF format?


It’s not a big deal for us, but just so you know, there are some people in the world (mostly from other genome centers like UMich) who do output BCFs.

It’s rather hard to support well, since BCF is not as naively splittable as VCF (perhaps it would be with an index?). It’s probably not worth the development effort right now to build something that can load in parallel. It would be easier to build something that loads a BCF serially, but there’s less value add there, since using bcftools to generate a VCF is also a single serial pass over the file.
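For anyone who hits this in the meantime, here is a rough sketch of the bcftools route mentioned above: convert the BCF to a block-gzipped VCF in one serial pass, then let Hail import that in parallel. The helper names `bcf_to_vcf_cmd` and `import_bcf_via_vcf` are made up for illustration, and this assumes `bcftools` is on your PATH and Hail is installed.

```python
import subprocess


def bcf_to_vcf_cmd(bcf_path, vcf_path):
    """Build the bcftools command that rewrites a BCF as a bgzipped VCF.

    `-O z` asks bcftools for compressed VCF output, `-o` names the output file.
    """
    return ["bcftools", "view", "-O", "z", "-o", vcf_path, bcf_path]


def import_bcf_via_vcf(bcf_path, vcf_path):
    """Hypothetical helper: convert serially, then import the VCF with Hail."""
    # Serial step: bcftools reads the whole BCF once and writes the VCF.
    subprocess.run(bcf_to_vcf_cmd(bcf_path, vcf_path), check=True)

    # Imported lazily so the command builder above works without Hail installed.
    import hail as hl

    # force_bgz tells Hail to treat the .gz file as block-gzipped.
    return hl.import_vcf(vcf_path, force_bgz=True)
```

Usage would be something like `mt = import_bcf_via_vcf("calls.bcf", "calls.vcf.gz")`, with the paths being placeholders.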

We’ll add support when someone needs it. As far as I know it hasn’t come up before.

Technically, bgzipped files aren’t splittable either, but if you do enough downstream validation, splitting them can be reliable. We can do the same for BCF, or require (and produce) indices. Doesn’t seem hard.
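To make the "split plus validation" idea concrete, here is a minimal sketch of finding a BGZF block boundary from an arbitrary byte offset. This is not how Hail does it; it is only an illustration of the technique. It scans for the BGZF magic bytes, then rejects false positives by checking the `BC` extra subfield, inflating the block, and comparing the CRC32 and length in the trailer. It assumes the `BC` subfield comes first in the extra field, which the spec does not strictly guarantee. All function names here are made up, and `make_bgzf_block` exists only to produce test data.

```python
import struct
import zlib

BGZF_MAGIC = b"\x1f\x8b\x08\x04"  # gzip ID1, ID2, CM=deflate, FLG=FEXTRA
MIN_BLOCK = 18 + 8  # header with BC subfield, plus CRC32 and ISIZE trailer


def make_bgzf_block(payload):
    """Build one BGZF block around `payload` (only used to make test data)."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15 means raw deflate
    cdata = comp.compress(payload) + comp.flush()
    total = 18 + len(cdata) + 8
    # magic, MTIME=0, XFL=0, OS=255 (unknown), XLEN=6
    header = struct.pack("<4sIBBH", BGZF_MAGIC, 0, 0, 255, 6)
    # 'BC' subfield, SLEN=2, BSIZE = total block size minus 1
    extra = struct.pack("<2sHH", b"BC", 2, total - 1)
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + extra + cdata + trailer


def try_block(buf, pos):
    """Return the block size if a valid BGZF block starts at `pos`, else None."""
    if buf[pos:pos + 4] != BGZF_MAGIC or pos + 18 > len(buf):
        return None
    xlen = struct.unpack_from("<H", buf, pos + 10)[0]
    # Assumption: the BC subfield is the first extra subfield.
    si, slen, bsize = struct.unpack_from("<2sHH", buf, pos + 12)
    if si != b"BC" or slen != 2:
        return None
    total = bsize + 1
    end = pos + total
    data_start = pos + 12 + xlen
    if total < MIN_BLOCK or end > len(buf) or data_start > end - 8:
        return None
    crc, isize = struct.unpack_from("<II", buf, end - 8)
    try:
        data = zlib.decompress(buf[data_start:end - 8], -15)
    except zlib.error:
        return None  # magic bytes appeared by chance inside compressed data
    if zlib.crc32(data) != crc or len(data) != isize:
        return None
    return total


def first_block_at_or_after(buf, offset):
    """Scan forward from `offset` for the first position that validates."""
    pos = buf.find(BGZF_MAGIC, offset)
    while pos != -1:
        if try_block(buf, pos) is not None:
            return pos
        pos = buf.find(BGZF_MAGIC, pos + 1)
    return None
```

A splitter would call `first_block_at_or_after` at each nominal split offset and start reading from the returned position. Requiring an index instead would remove the need for scanning and validation altogether.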

Don’t add it on our behalf. This was mostly about deciding whether we’d accept a request to split a VCF (which we do w/ Hail now) that turned out to in fact be a BCF. So we just said no. :slight_smile: