Sample_qc help on depth mean and genotype quality

Are there additional resources to better understand how the sample_qc() function computes the resulting statistics? I have looked at https://hail.is/docs/stable/hail.VariantDataset.html#hail.VariantDataset.sample_qc but there is not a lot there.

I imported a single-chromosome VCF from the 1000 Genomes Project as a VDS and called sample_qc. The function ran and returned some statistics, but dpMean, dpStDev, gqMean, and gqStDev were None for all samples.

from hail import HailContext

hc = HailContext()
vds = (hc.import_vcf('gs://genomics-public-data/1000-genomes/vcf/ALL.chr16.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf')
       .split_multi()
       .sample_qc()
       .variant_qc())

df = vds.samples_table().to_pandas()
df.head()

Thanks for your help!

Hi Greg,
sample_qc and variant_qc just take the means of the GQ/DP values in the genotypes. However, they can’t take a mean if there is no GQ/DP field! Here are the FORMAT header lines of that VCF:

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage from MaCH/Thunder">
##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihoods">
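To illustrate why the statistic comes back as None, here is a minimal sketch in plain Python of a mean that skips missing values. It is not Hail's actual implementation, and `mean_or_none` is a hypothetical helper made up for this example.

```python
from typing import Iterable, Optional


def mean_or_none(values: Iterable[Optional[float]]) -> Optional[float]:
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to average, e.g. the VCF has no GQ or DP field at all.
        return None
    return sum(present) / len(present)


print(mean_or_none([30, None, 50]))  # one missing genotype is skipped
print(mean_or_none([None, None]))    # no values at all
```

With no GQ or DP in the file, every genotype looks like the second call, so each sample ends up with None.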

Returning None silently is a bad thing to do, though, and there’ll be a better error message in 0.2.
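Until then, one way to catch this up front is to look at the header before running QC. The sketch below is plain Python, not a Hail API, and `format_ids` is a hypothetical helper. It assumes you have the VCF header lines available as strings and reports which of GQ and DP are not declared.

```python
from typing import Iterable, Set


def format_ids(header_lines: Iterable[str]) -> Set[str]:
    """Collect the IDs declared in ##FORMAT header lines."""
    ids = set()
    prefix = "##FORMAT=<"
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        if line.startswith(prefix):
            # Scan the key=value fields until the ID key is found.
            for field in line[len(prefix):].split(","):
                key, _, value = field.partition("=")
                if key == "ID":
                    ids.add(value)
                    break
    return ids


# The three FORMAT lines from the VCF in question.
header = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage from MaCH/Thunder">',
    '##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihoods">',
]

missing = {"GQ", "DP"} - format_ids(header)
print(sorted(missing))  # ['DP', 'GQ']
```

If either field shows up as missing, the corresponding dp*/gq* statistics can be expected to be None.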

Thanks Tim! The confusion came from my not realizing that GQ/DP come from the source VCF file rather than being imputed from the imported variants (which, now that I think about it, doesn’t really make sense).

Appreciate your help!

Thanks,
Greg