I have some questions about what the `pc_project` function does.
It seems to expect both its first and second arguments, a matrix table `mt` that is being projected and a Hail table `pc_loadings_ht`, to be keyed by [‘locus’, ‘alleles’].
To what extent is `pc_project` flexible when it comes to projecting a call from `mt` that belongs to a variant `row_0`, in a situation where the key of `row_0` only partly matches a key from `pc_loadings_ht`?
For example, say I have a homozygous 0/0 call in `mt` at a variant whose key is [chr1:1000, [A]], and there is a variant in `pc_loadings_ht` whose key is [chr1:1000, [T, A]]. Is the call matched with this variant and interpreted as 1/1? What if the variant in `mt` were keyed by a) [chr1:1000, [T, A, *]], b) [chr1:1000, [A, T]], or c) [chr1:1000, [T, *]]?
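On the partial-key question: a Hail lookup such as `pc_loadings_ht[mt.row_key]` matches on the whole key, so I would expect none of the partial matches above to be found at all (rather than being reinterpreted as 1/1). The sketch below assumes `pc_project` attaches loadings with that kind of exact join (worth confirming in the source) and uses a plain Python dict as a stand-in for `pc_loadings_ht`; `loadings` and `lookup_loading` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# A variant key as (locus, alleles), with alleles as a tuple, mirroring [locus, alleles].
Key = Tuple[str, Tuple[str, ...]]

# Hypothetical loadings table: one biallelic variant at chr1:1000 (ref T, alt A).
loadings: Dict[Key, Tuple[float, ...]] = {
    ("chr1:1000", ("T", "A")): (0.12, -0.03),
}


def lookup_loading(key: Key) -> Optional[Tuple[float, ...]]:
    """Return the loadings for an exact key match, or None if there is none.

    Assumes an exact-equality join on the full (locus, alleles) key:
    sharing the locus, or some of the alleles, is not enough.
    """
    return loadings.get(key)


# The keys from the question, plus the exact match for comparison.
for alleles in [("A",), ("T", "A", "*"), ("A", "T"), ("T", "*"), ("T", "A")]:
    print(alleles, lookup_loading(("chr1:1000", alleles)))
```

Under this assumption, a variant whose key does not match exactly simply contributes no term to the projection.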
How is missingness of variants from `mt` interpreted? Are those variants treated as having been called 0/0?
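On missingness, here is a plain-Python sketch of what the projection computes for one sample, under two assumptions worth checking against the Hail source: loadings are attached by an exact key join (so variants absent from `mt` have no term), and a missing call is skipped by the aggregation. `project_sample` and its dict inputs are hypothetical stand-ins for the Hail tables; the per-variant normalization (n_alt - 2p) / sqrt(N · 2p(1 - p)) is the one that involves `(1 - mt._af)`. Under these assumptions, absent variants and missing calls are not treated as 0/0: they add nothing to the sum.

```python
import math
from typing import Dict, List, Optional


def project_sample(
    calls: Dict[str, Optional[float]],
    loadings: Dict[str, List[float]],
    af: Dict[str, float],
) -> List[float]:
    """Hypothetical sketch: PC scores for a single sample.

    calls:    variant id -> alt-allele count (0, 1 or 2), or None for a missing call.
              Variants not in `calls` are absent from the projected dataset.
    loadings: variant id -> PC loadings from the reference loadings table.
    af:       variant id -> alt allele frequency in the reference population.
    """
    n_variants = len(loadings)  # rows in the loadings table, not in `calls`
    n_pcs = len(next(iter(loadings.values())))
    scores = [0.0] * n_pcs
    for vid, pcs in loadings.items():
        n_alt = calls.get(vid)
        p = af.get(vid)
        # Absent variants, missing calls and monomorphic sites contribute nothing.
        if n_alt is None or p is None or not 0 < p < 1:
            continue
        gt_norm = (n_alt - 2 * p) / math.sqrt(n_variants * 2 * p * (1 - p))
        for k, w in enumerate(pcs):
            scores[k] += w * gt_norm
    return scores


loadings = {"v1": [0.5, 0.1], "v2": [-0.2, 0.3], "v3": [0.1, -0.4]}
af = {"v1": 0.5, "v2": 0.25, "v3": 0.1}
# v2 has a missing call and v3 is absent from the sample: only v1 contributes.
partial = project_sample({"v1": 2, "v2": None}, loadings, af)
print(partial)
```

Note that a genotype equal to the reference mean, 2p, also gives a zero term, so in this sketch skipping a variant is equivalent to mean-imputing it.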
I hope these questions at least make sense; if not, let me know.
Any explanations will be appreciated!
Thank you! If I may dwell on this topic:
I’m specifically interested in projecting my samples (along with samples from 1000 Genomes) on
The thing is, the above loadings are given for specific biallelic variants keyed by [locus, alleles] (which of course makes sense). So, if I understand correctly, it is entirely up to me to translate the genotypes in my matrix table into those same variants wherever my ‘alleles’ differ at a given position.
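That translation can be sketched as counting, for each call, how many copies of the loadings variant's alt allele it carries. This is a hypothetical helper (`alt_dosage` is not part of Hail), and comparing allele strings is only meaningful if both datasets use the same reference build and the same variant normalization; calls involving an allele that is neither the loadings ref nor alt (such as `*`) are left undefined rather than guessed.

```python
from typing import Optional, Sequence, Tuple


def alt_dosage(
    mt_alleles: Sequence[str],
    call: Tuple[int, int],
    loadings_alleles: Tuple[str, str],
) -> Optional[int]:
    """Count copies of the loadings variant's alt allele in a diploid call.

    mt_alleles:       alleles of the variant in the projected matrix (ref first).
    call:             the two allele indices of the genotype into mt_alleles.
    loadings_alleles: (ref, alt) of the biallelic variant in the loadings table.
    Returns None if a called allele is neither the loadings ref nor its alt.
    """
    ref, alt = loadings_alleles
    dosage = 0
    for idx in call:
        allele = mt_alleles[idx]
        if allele == alt:
            dosage += 1
        elif allele != ref:
            return None  # e.g. '*' or a third allele: no sensible dosage
    return dosage


# The 0/0 call at [chr1:1000, [A]] carries two copies of alt A for loadings [T, A].
print(alt_dosage(["A"], (0, 0), ("T", "A")))
```

The resulting dosage would then be written into a new matrix table keyed by the loadings variant's own [locus, alleles], so that the exact-key join finds it.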
a) Would it then be accurate to say that a missing genotype (in the matrix being projected) is equivalent to the average genotype at that position in the population used to construct the loadings?
b) What is
c) When you say, “The sum […] is divided by the square of the number of variants for each sample …”, do you mean the overall number of variants in the loadings table, regardless of missingness in my samples?
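For a) and c), it may help to write the projection out. This is my reading of the normalization that uses `(1 - mt._af)`, worth confirming against the source: with $w_{j,k}$ the loading of variant $j$ on PC $k$, $g_{s,j}$ the alt-allele count of sample $s$, $p_j$ the reference-population allele frequency, $V_s$ the variants that have a loading, $0 < p_j < 1$ and a non-missing call in sample $s$, and $N$ the number of rows in the loadings table:

```latex
\mathrm{score}_{s,k} \;=\; \sum_{j \in V_s} w_{j,k}\,
  \frac{g_{s,j} - 2p_j}{\sqrt{N \cdot 2p_j\,(1 - p_j)}}
```

Setting $g_{s,j} = 2p_j$ makes the $j$-th term vanish, so under this reading leaving a call out of $V_s$ is the same as imputing the reference mean genotype, and $N$ enters through a square root and does not depend on the sample's missingness.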
As a complete side note: I’m a bit baffled by `(1 - mt._af)` appearing in the normalization on line 61. Is there a short, hand-wavy explanation for it?
This is the Hardy-Weinberg variance term; we’re dividing by the square root of `2 * p * q` there, I believe.
cc @konradjk to confirm?
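The Hardy-Weinberg explanation can be checked directly: under HWE the alt-allele count (0, 1 or 2) has mean 2p and variance 2p(1 - p) = 2pq, so subtracting 2p and dividing by the square root of 2pq standardizes each genotype. This small sketch computes both moments exactly with `fractions.Fraction`; `hwe_genotype_moments` is a hypothetical helper name.

```python
from fractions import Fraction


def hwe_genotype_moments(p):
    """Mean and variance of the alt-allele count under Hardy-Weinberg equilibrium."""
    q = 1 - p
    # Genotype probabilities for 0, 1 and 2 copies of the alt allele.
    probs = {0: q * q, 1: 2 * p * q, 2: p * p}
    mean = sum(g * pr for g, pr in probs.items())
    var = sum((g - mean) ** 2 * pr for g, pr in probs.items())
    return mean, var


p = Fraction(3, 10)
mean, var = hwe_genotype_moments(p)
print(mean, var)  # 3/5 21/50
```

The extra factor of the variant count inside the square root in the Hail code is a separate, constant scaling applied to every variant.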