Hi all,
I’m relatively new to Hail, but with the help of some people on here, I have figured out how to load a few thousand samples into a large VDS. The final, but crucial, step for our project is to create a single VCF file with an AF field calculated from how many times each mutation appears in our VDS.
I feel like there should be documentation for this, but I can’t find an example. Can you point me in the right direction?
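Note that AF is usually not just "how many samples list the mutation". It is defined as AC / AN, where AN counts the alleles in every called genotype, including hom-ref samples. In a VDS those hom-ref samples live in the reference blocks rather than the variant data, which is why densifying first matters. Hail's `hl.variant_qc` uses this AC / AN definition, with one AF value per allele and the reference first. A plain-Python sketch of the arithmetic, using a hypothetical `allele_frequencies` helper and genotypes written as tuples of allele indices:

```python
from typing import List, Optional, Sequence, Tuple

# One genotype call as a tuple of allele indices (0 = REF, 1+ = ALT); None = no call.
Call = Optional[Tuple[int, ...]]


def allele_frequencies(calls: Sequence[Call], n_alleles: int) -> Optional[List[float]]:
    """Return one frequency per allele (REF first) as AC / AN.

    AN counts every allele in every called genotype, so hom-ref samples add to
    the denominator even though they never carry the ALT allele. No-calls are
    skipped. Returns None when no sample is called (AN == 0).
    """
    ac = [0] * n_alleles  # allele counts, indexed by allele
    for call in calls:
        if call is None:
            continue
        for allele in call:
            ac[allele] += 1
    an = sum(ac)  # total number of called alleles
    if an == 0:
        return None
    return [count / an for count in ac]


# Three samples at a biallelic site: het, hom-ref, no call.
af = allele_frequencies([(0, 1), (0, 0), None], n_alleles=2)
print(af)      # [0.75, 0.25]
print(af[1:])  # [0.25]  <- the ALT-only values a VCF AF field carries
```

Only one sample carries the ALT allele here, yet AF is 1/4, because the hom-ref sample's two alleles are part of AN.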
Hi Richard,
Sorry this isn’t easier – putting together some materials but it will take a few days.
Is the VCF you want sites-only with just site / AF info, or do you want a (large) project VCF with genotype fields too?
Thanks @tpoterba. Just the AF field is needed to start.
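For reference, a sites-only VCF keeps only the eight fixed columns (CHROM through INFO) and has no FORMAT or sample columns. A per-ALT-allele AF is declared in the header with `Number=A`. The sketch below formats such records by hand only to show the target layout, since Hail's `export_vcf` writes this for you. The `sites_only_record` helper and the header Description text are illustrative assumptions.

```python
from typing import List, Sequence

# Minimal header for a sites-only VCF carrying a single INFO field.
HEADER: List[str] = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate allele frequency">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def sites_only_record(chrom: str, pos: int, ref: str, alts: Sequence[str],
                      alt_afs: Sequence[float]) -> str:
    """Format one sites-only VCF line: the 8 fixed columns, no FORMAT/sample columns."""
    if len(alts) != len(alt_afs):
        raise ValueError("Number=A requires exactly one AF value per ALT allele")
    info = "AF=" + ",".join(f"{af:g}" for af in alt_afs)
    # ID, QUAL and FILTER are written as "." (missing).
    return "\t".join([chrom, str(pos), ".", ref, ",".join(alts), ".", ".", info])


lines = HEADER + [sites_only_record("chr1", 12345, "A", ["G"], [0.25])]
print("\n".join(lines))
```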
Hi there,
I now have ~10K samples in my VDS. Have you had a chance to put together some instructions on how to go from a VDS to a VCF with an AF field?
I think this is an answer, but probably not the fastest possible implementation. @tpoterba might have a better approach:
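```python
import hail as hl

mt = hl.vds.to_dense_mt(vds)  # vds: your existing VariantDataset
mt = hl.variant_qc(mt)  # needs a GT entry field; AF[0] is the REF allele
ht = mt.rows()
ht = ht.select(info=hl.struct(AF=ht.variant_qc.AF[1:]))  # only `info` becomes INFO; keep ALT AFs
hl.export_vcf(ht, 'gs://path/to/vcf.bgz')
```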
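If you can live with a sharded VCF, this will execute faster with `export_vcf(..., parallel='header_per_shard')`, which writes one VCF file per partition (each with its own header) instead of concatenating them into a single file.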
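A sketch that packages the steps above and also declares AF as `Number=A` in the header through `export_vcf`'s `metadata` argument (the same nested layout that `hl.get_vcf_metadata` returns), with the `parallel` option passed through. It assumes an existing `vds` whose dense form has a `GT` entry field; if your VDS stores local-allele fields such as `LGT`/`LA` instead, those would need converting first. The helper name `export_af_vcf` and the Description text are hypothetical, so treat this as a starting point rather than a tested recipe.

```python
from typing import Optional

# INFO header metadata in the layout hl.export_vcf's `metadata` argument expects.
# Number=A: one value per ALT allele, matching variant_qc.AF[1:].
AF_METADATA = {
    "info": {
        "AF": {
            "Number": "A",
            "Type": "Float",
            "Description": "Alternate allele frequency (AC / AN) from hl.variant_qc",
        }
    }
}


def export_af_vcf(vds, output: str, parallel: Optional[str] = None) -> None:
    """Densify a VariantDataset and write a sites-only VCF with an AF INFO field.

    parallel=None concatenates everything into one file; 'header_per_shard'
    writes one VCF per partition, each with its own header.
    """
    import hail as hl  # imported here so AF_METADATA can be inspected without Hail

    mt = hl.vds.to_dense_mt(vds)
    mt = hl.variant_qc(mt)  # requires a GT entry field
    ht = mt.rows()
    # Keep only the keys (locus, alleles) plus an `info` struct, the part exported as INFO.
    ht = ht.select(info=hl.struct(AF=ht.variant_qc.AF[1:]))
    hl.export_vcf(ht, output, metadata=AF_METADATA, parallel=parallel)
```

Usage would look like `export_af_vcf(vds, 'gs://path/to/vcf.bgz')` for a single file, or adding `parallel='header_per_shard'` for the faster sharded output.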