How to extract a subset of SNPs for a specific list of samples and export in PLINK format?


#1

Hi all,
I would like to extract genotype information, for example
the region chr1:1-10000 for a list of samples [NA12878, NA12757, NA12323 …],
and export it to PLINK or VCF format.

Could anyone help?


#2

Hail 0.2 code:

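import hail as hl

# mt is assumed to be a MatrixTable that is already loaded (e.g. via hl.read_matrix_table)
region_to_keep = [hl.parse_locus_interval('chr1:1-10000', reference_genome='GRCh38')]
samples_to_keep = {'NA12878', 'NA12891'}
# keep only the rows (variants) that fall inside the interval
mt = hl.filter_intervals(mt, region_to_keep)
# keep only the columns (samples) whose ID is in the set
mt = mt.filter_cols(hl.literal(samples_to_keep).contains(mt.s))
hl.export_vcf(mt, '...')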
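
Since the question also asks about PLINK output, here is a rough sketch of the same filtering followed by `hl.export_plink` instead of `hl.export_vcf`. The helper names `parse_sample_ids` and `subset_and_export`, the `out_prefix` argument, and the file names in the usage note are made up for illustration. It assumes the sample ID is stored in the column field `s` (as it is after `hl.import_vcf`), that the data is on GRCh38, and that dropping multiallelic sites is acceptable, since PLINK's binary format only represents biallelic variants.

```python
def parse_sample_ids(lines):
    """Turn an iterable of text lines (one sample ID per line) into a set.

    Hypothetical helper: blank lines and surrounding whitespace are ignored.
    """
    return {line.strip() for line in lines if line.strip()}


def subset_and_export(mt, interval, sample_ids, out_prefix):
    """Filter a MatrixTable to one interval and a set of samples, then write
    PLINK files (out_prefix.bed / .bim / .fam).

    Hypothetical helper; assumes a non-empty set of IDs, GRCh38, and sample
    IDs in mt.s.
    """
    # Imported here so parse_sample_ids stays usable without Hail installed.
    import hail as hl

    intervals = [hl.parse_locus_interval(interval, reference_genome="GRCh38")]
    # Keep only the variants inside the interval.
    mt = hl.filter_intervals(mt, intervals)
    # Keep only the requested samples.
    mt = mt.filter_cols(hl.literal(sample_ids).contains(mt.s))
    # Keep only biallelic sites (one ref allele plus one alt allele).
    mt = mt.filter_rows(hl.len(mt.alleles) == 2)
    # Write the PLINK file set, using the sample ID as the individual ID.
    hl.export_plink(mt, out_prefix, ind_id=mt.s)
    return mt
```

Usage would look something like `subset_and_export(mt, 'chr1:1-10000', parse_sample_ids(open('samples.txt')), 'subset')`, where `samples.txt` and `subset` are placeholder names. If you need to keep multiallelic sites, splitting them first (for example with `hl.split_multi_hts`) is an alternative to filtering them out.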