I am importing VCF files which contain standard chromosome names but also some ambiguous ones.
Is there a way to preserve all lines which are flagged as invalid loci because their contig is not supported by the default reference genome?
For example, if the VCF file has a chromosome name such as chr1_r3r3r, how can I still save these lines somehow? Could I, for example, save the lines with invalid loci in a second MatrixTable?
We can’t put them in a second matrix table. The fact that they were invalid means we can’t make objects for them. If you do skip_invalid_loci=False, you should get an error message about which one was invalid.
However, you could do the following to pull out the invalid lines:
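import hail as hl
# reference_genome=None imports locus as a struct (contig, position), not as a locus type
mt = hl.import_vcf(input_vcf, contig_recoding=recode, force_bgz=True,
                   reference_genome=None)
# keep only the rows whose locus is NOT valid for the chosen reference genome
mt = mt.filter_rows(~hl.is_valid_locus(mt.locus.contig, mt.locus.position, reference_genome='your rg here'))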
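If the goal is only to keep the raw VCF lines with non-standard contigs for inspection, the same split can also be sketched in plain Python, outside Hail. This is a minimal sketch, not a Hail API: the function name split_vcf_lines and the valid_contigs set are hypothetical, the input is assumed to be uncompressed tab-separated VCF text, and only the contig name is checked (not whether the position fits within the contig).

```python
def split_vcf_lines(lines, valid_contigs):
    """Partition VCF text lines into (header, valid, invalid) by contig name.

    Hypothetical helper: only the CHROM column is checked against
    valid_contigs; positions are not validated.
    """
    header, valid, invalid = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        if line.startswith("#"):
            header.append(line)  # meta-information and column header lines
            continue
        contig = line.split("\t", 1)[0]  # CHROM is the first tab-separated field
        (valid if contig in valid_contigs else invalid).append(line)
    return header, valid, invalid


# Assumed set of "standard" contig names; adjust to your reference genome.
valid_contigs = {f"chr{c}" for c in list(range(1, 23)) + ["X", "Y", "M"]}

example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr1_gl000192_random\t200\t.\tC\tT\t.\t.\t.",
]
header, valid, invalid = split_vcf_lines(example, valid_contigs)
print(invalid)
```

Writing header + invalid to a separate file would give a small VCF containing only the lines with non-standard contigs.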
Yes, chr1_gl000192_random is not standard and I cannot define reference_genome, but I want to preserve exactly chr1_gl000192_random in the VCF output file so that I am able to see these mistakes.
Yeah, that’s what I thought. You didn’t specify reference_genome=None when you imported the VCF. I can tell because the type of locus is locus, instead of struct.