OK, you are right. I don’t actually need the key orders to be exactly the same. I was overthinking this.
Here’s my code for subsampling the genotypes and phenotypes. I used filter_cols for the genotype filtering.
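import hail as hl

# bgen_fname and variants are defined earlier; remaining import_bgen arguments (e.g. entry_fields) omitted here
ds = hl.import_bgen(bgen_fname,
                    variants=variants)
# table of the 50,000 sample IDs to keep, keyed by sample ID
samples = hl.import_table("autosomes.50K.sample", delimiter=r'\s+').key_by('ID_1')
# down-sample the genotypes to those 50,000 samples
bgen = ds.filter_cols(hl.is_defined(samples[ds.s]))
# keep only the phenotype rows whose key matches a retained sample
pheno = hl.import_table('pheno.tsv.bgz').key_by('s')
pheno_filt = pheno.semi_join(samples)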
I think this will work for my purpose, which is to run a GWAS on the down-sampled dataset.
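To convince myself that the order really doesn't matter, here is a toy sketch in plain Python (not Hail; the sample IDs and phenotype values are made up for illustration). It mimics the two filters above, both of which match on sample ID, and then pairs genotypes with phenotypes by key rather than by position.

```python
# Toy stand-ins for the real tables; every name and value here is hypothetical.
keep_ids = {"s3", "s1"}  # the down-sampled sample IDs (a set, so it has no order)

genotype_cols = ["s1", "s2", "s3", "s4"]  # column order in the genotype data
pheno_rows = [("s4", 1.2), ("s3", 0.7), ("s2", 2.1), ("s1", 1.9)]  # a different order

# Rough analogue of filter_cols(hl.is_defined(samples[ds.s])):
# keep the columns whose sample ID is in the keep set.
geno_filt = [s for s in genotype_cols if s in keep_ids]

# Rough analogue of pheno.semi_join(samples):
# keep the rows whose key is in the keep set.
pheno_filt = {s: y for s, y in pheno_rows if s in keep_ids}

# Pair each retained genotype column with its phenotype by sample ID, not by position.
paired = [(s, pheno_filt[s]) for s in geno_filt]
print(paired)  # [('s1', 1.9), ('s3', 0.7)]
```

Both filtered sets contain the same sample IDs even though the inputs were ordered differently, so a key-based lookup lines them up correctly.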