You’re generally going to have a better time if you combine everything into one dataset and compute statistics on subsets of that dataset, rather than iterating through many smaller datasets. We need more information about these input VCFs, though. Are these project VCFs (with FORMAT fields like GT, GQ, etc.) for a group of samples from sequencing data? Those cannot be losslessly combined, because a site might appear in one VCF but not another, and for samples in the VCF that lacks the site you can’t tell whether they are homozygous reference or simply weren’t called there.
If your VCFs are genotype data, it’s probably possible to combine them, since those files should all have the same set of variants.
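If the files do share the same variants, a rough sketch of the combine-then-subset approach could look like the following. This is only an illustration under some assumptions: `combine_vcfs`, `call_stats_by_group`, the `paths_by_group` dict, and the `group` column field are made-up names, the files are assumed to be block-gzipped (hence `force_bgz=True`), and each file is assumed to hold a distinct set of samples with identical entry schemas. `union_cols` joins on the row key, so any variant missing from one of the inputs is dropped from the result.

```python
from functools import reduce


def combine_vcfs(paths_by_group, reference_genome="default"):
    """Import one VCF per group and join them column-wise into one MatrixTable.

    `paths_by_group` is a hypothetical dict mapping a group label
    (e.g. a population name) to the path of that group's VCF.
    """
    if not paths_by_group:
        raise ValueError("paths_by_group must contain at least one VCF")

    # Imported here so the input check above runs without starting Hail.
    import hail as hl

    mts = []
    for group, path in paths_by_group.items():
        # Assumption: the .vcf.gz files are block-gzipped. For plain gzip,
        # force=True would be needed instead (and the import is much slower).
        mt = hl.import_vcf(path, force_bgz=True, reference_genome=reference_genome)
        # Record which input file each sample came from.
        mts.append(mt.annotate_cols(group=hl.str(group)))

    # union_cols appends the samples (columns) of the right dataset to the
    # left one, keeping only variants (row keys) present in both.
    return reduce(lambda left, right: left.union_cols(right), mts)


def call_stats_by_group(mt):
    """Annotate each variant with call statistics computed per group."""
    import hail as hl

    return mt.annotate_rows(
        stats_by_group=hl.agg.group_by(
            mt.group, hl.agg.call_stats(mt.GT, mt.alleles)
        )
    )
```

Usage would be something like `mt = call_stats_by_group(combine_vcfs({"groupA": "a.vcf.gz", "groupB": "b.vcf.gz"}))` with your own labels and paths. It is worth comparing `mt.count_rows()` before and after the union to confirm that no variants were dropped by the join.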
Yes, my VCF files contain FORMAT fields such as GT, GP, etc.
I also ran mt.describe() on one file from each ethnicity, and they all look the same, as shown below.
Does Hail have any function to combine all these files together? I see the VCF combiner, but I believe it is different from what I am looking for (the docs talk about gVCFs, which I think are different from my files, which have a vcf.gz extension).