I am wondering how Hail treats partially missing genotypes (e.g. 0/.). I am loading VCF files that have been constructed so that they are haploid across much of the genome: roughly 50% or more of calls are partially or fully missing, varying by individual. I am hoping to run linear regression on them. For context, plink forces you to treat such calls as either homozygous or reference, and I would like them to remain partially missing so that the resulting N is correct. Clarity and/or advice appreciated!
Hail will treat a 0/. as a fully missing genotype in VCF import.
Hail does support haploid calls, which should be represented as
OK great to know. So if I swap out all the partially missing for the non-missing call (0/. -> 0 or 1/. -> 1) in the VCF to load in, then things will be happy? Ideally it would be great if phase were retained, but I understand this isn’t implemented yet, so this should work great for unphased efforts.
You can encode phase separately in the FORMAT field, using another integer for example.
Encoding phase in a haploid call seems like an ill-defined construct, and Hail can’t represent that.
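For anyone wanting to try the rewrite discussed above, here is a minimal sketch in plain Python (no Hail needed) of one way it might be done before import. It turns each half-call into a haploid call and records which side of the half-call carried the allele as a separate integer in FORMAT. The `HS` key and both function names are made up for illustration; `HS` is not a standard VCF or Hail field, and it would need its own `##FORMAT` header line. The sketch also assumes GT is the first FORMAT key and that the input is an uncompressed, tab-delimited data line.

```python
import re

# Hypothetical FORMAT key (an assumption for this sketch, not a VCF or Hail standard).
SIDE_KEY = "HS"


def split_half_call(gt):
    """Return (new_gt, side) for a GT string.

    side is "0" if the called allele was on the left of a half-call,
    "1" if it was on the right, and "." if gt is not a half-call.
    """
    alleles = re.split(r"[/|]", gt)
    if len(alleles) == 2:
        left, right = alleles
        if left != "." and right == ".":
            return left, "0"
        if left == "." and right != ".":
            return right, "1"
    return gt, "."


def convert_record(line):
    """Rewrite one VCF data line; assumes GT is the first FORMAT key."""
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    if keys[0] != "GT":
        return line.rstrip("\n")
    fields[8] = fields[8] + ":" + SIDE_KEY
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        # VCF allows trailing sample values to be dropped; pad so columns stay aligned.
        values += ["."] * (len(keys) - len(values))
        values[0], side = split_half_call(values[0])
        values.append(side)
        fields[i] = ":".join(values)
    return "\t".join(fields)
```

Header lines (those starting with `#`) should be passed through untouched, and fully missing calls such as `./.` are left as they are, so they still import as missing.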