Hello,
I am new to Hail and am exporting a Hail MatrixTable to VCF using the All of Us code snippets. The code runs fine, but the FORMAT columns are omitted even though the necessary entry fields are present in the MatrixTable. The documentation warns that this happens with Table objects, but I have confirmed that I am using a MatrixTable. I am reading the produced VCFs by concatenating the shards and viewing them with bcftools, rather than using hl.import_vcf, because I want to use bcftools for my downstream processing.
An excerpt of my code:
vcf_header = "FILEPATH/data/vcf_header.txt"
os.system("gsutil cat FILEPATH/data/vcf_header.txt")
##fileformat=VCFv4.2
##reference=gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta
##FILTER=<ID=ExcessHet,Description="Site has excess het value larger than the threshold">
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=NO_HQ_GENOTYPES,Description="Site has no high quality variant genotypes">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=FT,Number=1,Type=String,Description="Genotype Filter Field">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=homozygote_count,Number=R,Type=Integer,Description="Number of homozygotes per allele. One element per allele, including the reference.">
metadata = hl.get_vcf_metadata(vcf_header)
mt_vcf = mt_vcf.repartition(50, shuffle=True)
print(type(mt_vcf))
mt_vcf.describe()
<class 'hail.matrixtable.MatrixTable'>
Global fields:
    None
Column fields:
    's': str
Row fields:
    'locus': locus
    'alleles': array
    'filters': set
    'info': struct {
        AC: array,
        AF: array,
        AN: int32,
        homozygote_count: array
    }
Entry fields:
    'GQ': int32
    'GT': call
    'AD': array
    'RGQ': int32
    'FT': str
    'PS': int64
Column key: ['s']
Row key: ['locus', 'alleles']
out_vcf = f'{bucket}/data/fads_cluster.vcf.bgz'
hl.export_vcf(mt_vcf, out_vcf, parallel="header_per_shard", tabix=False, metadata=metadata)
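To rule out the concatenation step as the cause, the column header of a single exported shard can be checked directly, before bcftools is involved. This is only a minimal sketch, assuming one shard has been copied to local disk; the helper names `column_header` and `has_format_columns` and the path passed on the command line are hypothetical and not part of Hail or bcftools.

```python
import gzip
import sys
from typing import List, Optional


def column_header(vcf_path: str) -> Optional[List[str]]:
    """Return the tab-split #CHROM line of a plain or (b)gzipped VCF, or None if absent."""
    # Block-gzipped (.bgz) files are multi-member gzip streams, which gzip.open can read.
    opener = gzip.open if vcf_path.endswith((".gz", ".bgz")) else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return line.rstrip("\n").split("\t")
            if not line.startswith("#"):
                # Reached the first data record without seeing a column header.
                break
    return None


def has_format_columns(columns: List[str]) -> bool:
    """True if the header has a FORMAT column followed by at least one sample column."""
    # The eight fixed VCF columns come first; FORMAT is the ninth, samples follow.
    return len(columns) > 9 and columns[8] == "FORMAT"


if __name__ == "__main__":
    # Hypothetical usage: python check_shard.py <local copy of one exported shard>
    header = column_header(sys.argv[1])
    if header is None:
        print("no #CHROM line found")
    else:
        print("FORMAT and sample columns present:", has_format_columns(header))
        print("number of sample columns:", max(len(header) - 9, 0))
```

If a single shard already lacks the FORMAT and sample columns, the problem is in the export itself; if the shard has them, the concatenation step is the place to look.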