Hey Hail team,
I am running hl.hwe_normalized_pca, and it appears to be filtering samples. Is that expected behavior?
This is the code I am running:
    logger.info("Running PCA for PC-Relate...")
    pruned_qc_mt = remove_hard_filter_samples(
        data_source,
        freeze,
        hl.read_matrix_table(qc_mt_path(data_source, freeze, ld_pruned=True)),
        gt_field="GT",
    ).unfilter_entries()
    eig, scores, _ = hl.hwe_normalized_pca(
        pruned_qc_mt.GT, k=10, compute_loadings=False
    )
    scores.write(
        relatedness_pca_scores_ht_path(data_source, freeze), args.overwrite
    )
The input QC MT has 302329 samples, and the scores HT has 301762, a difference of 567. Only 486 samples should be removed by remove_hard_filter_samples, so why are another 81 samples missing from the scores HT?
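For reference, here is a minimal sketch of the arithmetic behind the 81, along with one way the unexplained samples could be listed by set difference. The helper names and the toy sample IDs are hypothetical and not part of the pipeline above. In Hail the two ID lists could presumably be collected from the MT columns and from the scores HT, assuming both are keyed by a sample ID field such as `s`.

```python
def unexplained_missing(n_input, n_scores, n_hard_filtered):
    """Samples absent from the output that the hard filter does not account for."""
    return (n_input - n_scores) - n_hard_filtered


def missing_ids(expected_ids, scored_ids):
    """IDs expected in the PCA scores but not present, sorted for stable output."""
    return sorted(set(expected_ids) - set(scored_ids))


# Counts quoted in the post: input QC MT, scores HT, hard-filtered samples.
print(unexplained_missing(302329, 301762, 486))  # 81

# Toy IDs (hypothetical) standing in for the real sample lists.
print(missing_ids(["s1", "s2", "s3", "s4"], ["s1", "s3"]))  # ['s2', 's4']
```

Listing the actual missing IDs this way would show whether they share anything in common, for example no remaining non-missing genotypes after LD pruning.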