I want to generate a set of VCF chunks of 1,000 variants each, so I can process all these chunks separately (vt, annotations, …).
Then I want to merge these processed chunks into one final VCF.
Are you trying to use a tool other than Hail to process the intermediate VCFs? The normal Hail way of doing things is to turn a VCF into a MatrixTable, which is implicitly broken up into chunks that are processed in parallel.
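For reference, a rough sketch of that flow; `example.vcf` and `processed.vcf.bgz` are placeholder names:

```python
import hail as hl

hl.init()

# Importing a VCF yields a MatrixTable, which Hail partitions
# internally and processes in parallel.
mt = hl.import_vcf('example.vcf')

# ...per-variant QC/annotation would operate on the MatrixTable here...

# Export back to a single VCF when done.
hl.export_vcf(mt, 'processed.vcf.bgz')
```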
Yes, I need to process the intermediate files with tools other than Hail (vt decompose, normalize, vep, …).
I think something like split will be better than Hail in this case. Using Hail to parse a VCF, convert it to Hail’s efficient internal format, only to then reproduce a VCF again doesn’t take advantage of the benefits of Hail.
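One caveat with raw split is that it won’t copy the VCF header into each chunk. A minimal sketch of the same chunking idea in Python, assuming an uncompressed, sorted input named `example.vcf` (a placeholder) and chunks written under `chunks/`:

```python
import os

os.makedirs('chunks', exist_ok=True)

def write_chunk(idx, header, records):
    # chunk_0000.vcf, chunk_0001.vcf, ... sort back into the
    # original order as long as the input VCF was sorted.
    with open(f'chunks/chunk_{idx:04d}.vcf', 'w') as out:
        out.writelines(header)   # full header in every chunk
        out.writelines(records)  # up to 1,000 variant records

header, records, idx = [], [], 0
with open('example.vcf') as f:
    for line in f:
        if line.startswith('#'):
            header.append(line)
        else:
            records.append(line)
            if len(records) == 1000:
                write_chunk(idx, header, records)
                idx += 1
                records = []

if records:  # final partial chunk
    write_chunk(idx, header, records)
```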
As far as I know, we don’t have a way to produce VCF chunks of exactly 1000 variants.
Did you have trouble using split or other standard Unix tools on your dataset?
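For the merge step from the original post, a minimal sketch along the same lines, assuming the processed chunks share a header and follow the hypothetical naming above:

```python
import glob

# Sorted so the chunks come back in their original order.
chunks = sorted(glob.glob('chunks/chunk_*.vcf'))

with open('final.vcf', 'w') as out:
    # Keep the header from the first chunk only.
    with open(chunks[0]) as first:
        out.writelines(first)
    # Append only the record lines from the remaining chunks.
    for path in chunks[1:]:
        with open(path) as f:
            out.writelines(line for line in f if not line.startswith('#'))
```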