Hi there!
I was wondering how Hail generates the sample call rate. Are there specific criteria for defining missing data for the SNPs?
Thank you!
Maria
Assuming you're using sample_qc, it's the number of non-missing genotypes (where hl.is_defined(mt.GT) is true) divided by the number of rows in the dataset (mt.count_rows()). A genotype counts as missing only if its value is actually missing. No special logic is used, nothing about allele frequencies, etc. If you load data from a file, the missing genotypes are exactly the ones explicitly stated as missing in the file.
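To make the arithmetic concrete, here is a minimal plain-Python sketch of that ratio. It is not Hail code and not how Hail implements it internally; the toy matrix, the sample names, and the helper sample_call_rates are all hypothetical, and None stands in for a genotype that is missing in the input file.

```python
from typing import Dict, List, Optional, Sequence

# Toy genotype matrix (hypothetical data): rows are variants, columns are samples.
# None stands in for a missing call, e.g. "./." in a VCF.
GENOTYPES: List[List[Optional[str]]] = [
    ["0/0", "0/1", None],
    ["0/1", None, None],
    ["1/1", "0/0", "0/1"],
    ["0/0", "0/1", None],
]
SAMPLES = ["s1", "s2", "s3"]


def sample_call_rates(
    genotypes: Sequence[Sequence[Optional[str]]],
    samples: Sequence[str],
) -> Dict[str, float]:
    """Per-sample call rate: non-missing genotypes divided by number of variants."""
    n_variants = len(genotypes)
    if n_variants == 0:
        raise ValueError("call rate is undefined for a dataset with no variants")
    rates = {}
    for j, sample in enumerate(samples):
        # Count the rows (variants) where this sample has a defined genotype.
        n_called = sum(1 for row in genotypes if row[j] is not None)
        rates[sample] = n_called / n_variants
    return rates


if __name__ == "__main__":
    for sample, rate in sample_call_rates(GENOTYPES, SAMPLES).items():
        print(sample, rate)
```

In Hail itself you would not write this loop: after mt = hl.sample_qc(mt), the per-sample value is available as mt.sample_qc.call_rate.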
Good to know! Thank you very much for your prompt response.