Fix reference allele when exporting to VCF ("plink2 --ref-from-fa")

I am using the following code to generate single-sample VCF output files from a MatrixTable:

sample_list = mt_all.s.collect()
for sample in sample_list:
    mt_sample = mt_all.filter_cols(mt_all.s == sample)
    # Remove all rows without any called non-reference alleles for the current sample
    mt_sample = mt_sample.filter_rows(hl.agg.any(mt_sample.GT.is_non_ref()))
    # Export to individual VCF files
    hl.export_vcf(mt_sample, f'output/{sample}_grch38.vcf.bgz')

I have two questions:

  • I notice that the reference allele in the output VCF files does not always match the reference allele in the reference genome. plink2 addresses this with the following command:
    plink2 \
        --bfile ${name} \
        --export vcf \
        --out ${name} \
        --ref-from-fa \
        --fa ${g37_fasta}

When plink2 switches the REF/ALT alleles, it also fixes the GT fields accordingly. Does Hail have an equivalent of the "--ref-from-fa" functionality?

  • The loop to generate single-sample VCFs provided above is quite slow. Any recommendation to speed it up?
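
To make the first question concrete, here is a minimal pure-Python sketch of the behaviour I am after for a single biallelic SNP. It is not Hail or plink2 code: the function name `swap_ref_alt` and the `fasta_base` argument are made up for illustration, and it assumes genotypes are given as tuples of allele indices (0 = REF, 1 = ALT).

```python
def swap_ref_alt(ref, alt, gt, fasta_base):
    """Return (ref, alt, gt) so that REF matches the FASTA base.

    Hypothetical helper for a biallelic SNP. `gt` is a tuple of
    allele indices (0 = REF, 1 = ALT); None marks a missing allele.
    """
    if ref == fasta_base:
        # REF already agrees with the reference genome: nothing to do.
        return ref, alt, gt
    if alt != fasta_base:
        # Neither allele matches the genome, so a swap cannot fix it.
        raise ValueError("neither allele matches the reference base")
    # Swap the alleles and flip each index so the genotype still
    # refers to the same bases as before.
    flipped = tuple(None if a is None else 1 - a for a in gt)
    return alt, ref, flipped


# A het call 0/1 on A>G, where the genome actually has G at this site:
print(swap_ref_alt("A", "G", (0, 1), "G"))  # ('G', 'A', (1, 0))
```

So a site stored as REF=A, ALT=G with genotype 0/1 would be written as REF=G, ALT=A with genotype 1/0, and a 1/1 call would become 0/0.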

Thank you in advance,