PCA filtering samples?

Hey Hail team,

I am running hl.hwe_normalized_pca, and it appears to be filtering samples. Is that expected behavior?

This is the code I am running:

logger.info("Running PCA for PC-Relate...")
pruned_qc_mt = remove_hard_filter_samples(
    data_source,
    freeze,
    hl.read_matrix_table(qc_mt_path(data_source, freeze, ld_pruned=True)),
    gt_field="GT",
).unfilter_entries()
eig, scores, _ = hl.hwe_normalized_pca(
    pruned_qc_mt.GT, k=10, compute_loadings=False
)
scores.write(
    relatedness_pca_scores_ht_path(data_source, freeze), args.overwrite
)

The input QC MT has 302329 samples, and the scores HT has 301762. There are only 486 samples that should be removed with remove_hard_filter_samples. Why are 81 samples missing from the scores HT?

It shouldn’t. Can you run count_cols() just after remove_hard_filter_samples?
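If the counts alone don't make it obvious where the samples go, comparing the sample IDs stage by stage can narrow it down. The following is only a rough sketch using toy IDs: trace_sample_loss and the stage names are hypothetical helpers, not part of Hail. With real data, each ID list could probably be pulled with something like pruned_qc_mt.s.collect() or scores.s.collect(), assuming the column key field is named s.

```python
from typing import Dict, Iterable, List, Set, Tuple


def trace_sample_loss(
    stages: List[Tuple[str, Iterable[str]]]
) -> Dict[str, Set[str]]:
    """For each stage after the first, return the sample IDs that were
    present in the previous stage but are absent from this one."""
    lost: Dict[str, Set[str]] = {}
    prev_ids = set(stages[0][1])
    for name, ids in stages[1:]:
        current_ids = set(ids)
        lost[name] = prev_ids - current_ids
        prev_ids = current_ids
    return lost


# Toy data standing in for real sample IDs (hypothetical). In Hail these
# lists might come from mt.s.collect() / scores.s.collect(), assuming the
# column key is `s`.
stages = [
    ("input_qc_mt", ["s1", "s2", "s3", "s4", "s5"]),
    ("after_hard_filters", ["s1", "s2", "s3", "s4"]),
    ("pca_scores", ["s1", "s2", "s3", "s4"]),
]

for stage, missing in trace_sample_loss(stages).items():
    print(f"{stage}: lost {len(missing)} sample(s): {sorted(missing)}")
```

A stage that reports an empty set is not the one dropping samples, so if the PCA stage comes back empty the loss is happening earlier in the pipeline.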

Thanks Konrad! Figured it out: the samples are getting erroneously filtered pretty far upstream.