How to split a huge VCF into chunks of 1000 variants

I want to generate a set of VCF chunks of 1000 variants each, so that I can process each chunk separately (vt, annotations, …).

Then I want to merge these processed chunks back into one final VCF.

Are you trying to use a tool other than Hail to process the intermediate VCFs? The normal Hail way of doing things is to turn the VCF into a MatrixTable, which is implicitly broken up into chunks that are processed in parallel.

Yes, I need to process the intermediate files with tools other than Hail (vt decompose, vt normalize, VEP, …).

I think something like `split` will work better than Hail in this case. Using Hail to parse a VCF and convert it to its efficient internal format, only to then write out a VCF again, doesn't take advantage of any of Hail's benefits.

As far as I know, we don’t have a way to produce VCF chunks of exactly 1000 variants.

Did you have trouble using `split` or other standard Unix tools on your dataset?
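
For what it's worth, here is a minimal sketch of the kind of thing I had in mind, assuming an uncompressed input and placeholder file names; it keeps the VCF header on every chunk so each piece is a valid VCF on its own:

```bash
# Separate the header from the variant records
# (for a bgzipped VCF, use "zcat input.vcf.gz" in place of "cat input.vcf").
cat input.vcf | grep '^#'  > header.txt
cat input.vcf | grep -v '^#' | split -a 4 -l 1000 - chunk_

# Re-attach the header to each chunk so every piece is a valid VCF.
for f in chunk_*; do
    cat header.txt "$f" > "$f.vcf" && rm "$f"
done

# ... run vt decompose / vt normalize / VEP / etc. on each chunk_*.vcf here ...

# Merge the processed chunks back into one VCF; the glob expands in sorted
# order, so the records keep their original order (assuming the processing
# steps did not reorder them).
cp header.txt merged.vcf
for f in chunk_*.vcf; do
    grep -v '^#' "$f" >> merged.vcf
done
```

If you have bcftools installed, running `bcftools concat` on the processed chunks is another option for the final merge.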