Hey Hail team,
I am running hl.hwe_normalized_pca, and it appears to be filtering samples. Is that expected behavior?
This is the code I am running:
logger.info("Running PCA for PC-Relate...")
pruned_qc_mt = remove_hard_filter_samples(
    data_source,
    freeze,
    hl.read_matrix_table(qc_mt_path(data_source, freeze, ld_pruned=True)),
    gt_field="GT",
).unfilter_entries()
eig, scores, _ = hl.hwe_normalized_pca(
    pruned_qc_mt.GT, k=10, compute_loadings=False
)
scores.write(
    relatedness_pca_scores_ht_path(data_source, freeze), args.overwrite
)
The input QC MT has 302329 samples, and the scores HT has 301762. Only 486 samples should be removed by remove_hard_filter_samples, which would leave 301843. Why are the other 81 samples missing from the scores HT?
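To pin down which samples are being dropped, something like the following could help. It is only a sketch in plain Python with toy IDs: `missing_samples` is a hypothetical helper, not part of Hail, and the real ID lists would have to be collected from the MT and the scores HT (for example with `qc_mt.s.collect()` and `scores.s.collect()`, assuming the sample key is `s`).

```python
def missing_samples(input_ids, hard_filtered_ids, scored_ids):
    """Return IDs expected in the scores table but absent from it.

    Hypothetical helper for illustration; not a Hail function.
    """
    # Samples that should survive hard filtering.
    expected = set(input_ids) - set(hard_filtered_ids)
    # Of those, the ones that never made it into the PCA scores.
    return sorted(expected - set(scored_ids))


# Toy IDs standing in for the real ones (assumed to be collected from the
# input MT, the hard-filter list, and the scores HT respectively).
input_ids = ["s1", "s2", "s3", "s4", "s5"]
hard_filtered_ids = ["s2"]
scored_ids = ["s1", "s4", "s5"]
print(missing_samples(input_ids, hard_filtered_ids, scored_ids))

# Count check using the numbers from this post:
# input samples - hard-filtered samples - samples in the scores HT.
unexplained = 302329 - 486 - 301762
print(unexplained)
```

With the list of missing IDs in hand, it should be easier to check whether those samples share something in common, such as an unexpected overlap with another filter.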