Are there additional resources to better understand how the sample_qc() function computes the resulting statistics? I have looked at https://hail.is/docs/stable/hail.VariantDataset.html#hail.VariantDataset.sample_qc but there is not a lot there.
I imported a single-chromosome VCF from the 1000 Genomes Project as a VDS and called sample_qc. The function ran and returned some statistics, but dpMean, dpStDev, gqMean, and gqStDev were None for all samples.
vds = (hc.import_vcf('gs://genomics-public-data/1000-genomes/vcf/ALL.chr16.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf')
       .sample_qc())
df = vds.samples_table().to_pandas()
Thanks for your help!
sample_qc and variant_qc just take the means of the GQ/DP values in the genotypes. However, they can’t take a mean if there is no GQ/DP field! Here’s the FORMAT field of that VCF:
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage from MaCH/Thunder">
This is a bad thing to do silently, though, and there’ll be a better error message in 0.2.
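If it helps, here is a rough sketch of how you could check a VCF header for GQ/DP before running QC, and of why a mean over all-missing values comes out as None. Both `format_ids` and `mean_or_none` are hypothetical helpers written for illustration. They are not part of Hail, and this is not Hail's actual implementation. The header list contains only the FORMAT line quoted above.

```python
import re

# Hypothetical helpers for illustration only; neither is part of Hail.
FORMAT_ID = re.compile(r'^##FORMAT=<ID=([^,>]+)')

def format_ids(header_lines):
    """Return the set of FORMAT field IDs declared in VCF header lines."""
    ids = set()
    for line in header_lines:
        match = FORMAT_ID.match(line)
        if match:
            ids.add(match.group(1))
    return ids

def mean_or_none(values):
    """Mean of the non-missing values, or None when every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / float(len(present)) if present else None

# Only the FORMAT line quoted in this thread.
header = [
    '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage from MaCH/Thunder">',
]

declared = format_ids(header)
missing = sorted({'GQ', 'DP'} - declared)
print('FORMAT fields missing from header:', missing)

# With no DP values at all, there is nothing to average.
print('mean of all-missing DP:', mean_or_none([None, None]))
```

If `missing` is non-empty, the corresponding mean and standard deviation columns have no input values to summarize.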
Thanks Tim! The confusion came from me not realizing that GQ/DP come from the source VCF file rather than being imputed from the imported variants (which, now that I think about it, doesn’t really make sense).
Appreciate your help!