After importing a PLINK dataset with skip_invalid_loci=True, I have no variants

Hi, I was importing .bed, .bim and .fam files in PLINK format using Hail, and I noticed a problem with the creation of the MatrixTable. Even though Hail reported:

2020-08-21 08:47:17 Hail: INFO: Found 1141 samples in fam file.
2020-08-21 08:47:17 Hail: INFO: Found 730059 variants in bim file.

after running import_plink:
mt = hl.import_plink(bed='gs://d/file.bed',
                     bim='gs://d/file.bim',
                     fam='gs://d/file.fam',
                     reference_genome='GRCh38', skip_invalid_loci=True)

when I run mt.count_cols() and mt.count_rows(), I get 1141 and 0, respectively.
Does anyone know how I can solve this? I’d like mt.count_rows() to return the variants from the bim file rather than 0.
Many thanks

It sounds like all the loci in your dataset are invalid and, as a result, were skipped. Are you sure your dataset is aligned to GRCh38? You might try GRCh37.

You might also look at the documentation for import_plink's contig_recoding argument. By default, for GRCh38, Hail assumes that PLINK encodes, for example, chromosome 1 as “1” and recodes it to “chr1”. You can inspect how your chromosomes are represented by importing without a reference genome:

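import hail as hl

mt = hl.import_plink(..., reference_genome=None)
mt.show()
# Collect the distinct contig names exactly as they appear in the .bim file
print(mt.aggregate_rows(hl.agg.collect_as_set(mt.locus.contig)))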
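The same check can also be done without Hail by reading the first column of a local copy of the .bim file. The plain-Python sketch below assumes the standard whitespace-separated .bim layout (chromosome code first, base-pair position fourth). The helper names are hypothetical, and the recoding dict is modelled on Hail's documented GRCh38 default for import_plink, so verify it against the documentation for your Hail version. Any contig code it reports would need its own contig_recoding entry or is not a valid GRCh38 contig at all.

```python
import io
from collections import Counter
from typing import TextIO

# Assumed PLINK-code -> GRCh38 contig mapping, modelled on Hail's documented
# default for import_plink with reference_genome='GRCh38' (verify for your version).
GRCH38_RECODING = {str(i): f"chr{i}" for i in range(1, 23)}
GRCH38_RECODING.update({"23": "chrX", "24": "chrY", "25": "chrX", "26": "chrM"})


def contig_counts(bim: TextIO) -> Counter:
    """Count variants per raw contig code (first column of a .bim file)."""
    counts = Counter()
    for line in bim:
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return counts


def unmapped_contigs(counts: Counter, recoding: dict) -> dict:
    """Contig codes (with their variant counts) that the recoding does not cover."""
    return {code: n for code, n in counts.items() if code not in recoding}


# In-memory stand-in for a .bim file; use open("file.bim") for a local copy.
example_bim = io.StringIO(
    "1\trs1\t0\t12345\tA\tG\n"
    "1\trs2\t0\t22345\tC\tT\n"
    "0\trs3\t0\t0\tA\tC\n"
)
counts = contig_counts(example_bim)
print(unmapped_contigs(counts, GRCH38_RECODING))  # → {'0': 1}
```

Passing open("file.bim") in place of the StringIO stand-in runs the same check on a real file and shows how many variants sit on each unmapped code.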

Thanks! Yes, my data are aligned to GRCh38. The problem was related to the skip_invalid_loci option.
I didn’t understand why, but because my data contained some SNPs with chromosome or position equal to 0, all the SNPs with correct locus information were treated as invalid as well. I solved it by removing the bad SNPs from the data before importing it.
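Note that a .bim file cannot be edited on its own: the .bed file stores one genotype block per .bim line, in the same order, so both must be filtered together (the .fam file is unchanged). PLINK itself is the usual tool for this. As a minimal illustration, the plain-Python sketch below (all function names are hypothetical) drops variants whose chromosome or position is 0, assuming a SNP-major PLINK 1 .bed file, in which each variant occupies ceil(n_samples / 4) bytes after a 3-byte header.

```python
import io
from typing import BinaryIO, TextIO

# PLINK 1 .bed magic number; the third byte 0x01 means SNP-major (variant-major) mode.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def count_samples(fam: TextIO) -> int:
    """Number of samples = number of non-empty lines in the .fam file."""
    return sum(1 for line in fam if line.strip())


def is_bad_locus(bim_line: str) -> bool:
    """Flag a .bim record whose chromosome (column 1) or position (column 4) is 0."""
    fields = bim_line.split()
    return fields[0] == "0" or fields[3] == "0"


def filter_plink(bed_in: BinaryIO, bim_in: TextIO, n_samples: int,
                 bed_out: BinaryIO, bim_out: TextIO) -> int:
    """Copy only variants with a usable locus; return the number dropped."""
    if bed_in.read(3) != BED_MAGIC:
        raise ValueError("expected a SNP-major PLINK 1 .bed file")
    bed_out.write(BED_MAGIC)
    block_size = (n_samples + 3) // 4  # 2 bits per sample, padded to whole bytes
    dropped = 0
    for line in bim_in:
        if not line.strip():
            continue  # blank lines are not variants and have no .bed block
        block = bed_in.read(block_size)
        if len(block) != block_size:
            raise ValueError(".bed file is shorter than the .bim file implies")
        if is_bad_locus(line):
            dropped += 1  # skip both the .bim line and its genotype block
            continue
        bim_out.write(line)
        bed_out.write(block)
    return dropped


# In-memory example: 5 samples -> 2 bytes per variant; the middle variant is on chromosome 0.
bim = io.StringIO("1\trs1\t0\t100\tA\tG\n0\trs2\t0\t0\tA\tC\n1\trs3\t0\t300\tC\tT\n")
bed = io.BytesIO(BED_MAGIC + b"\x01\x02" + b"\x03\x04" + b"\x05\x06")
bed_out, bim_out = io.BytesIO(), io.StringIO()
print(filter_plink(bed, bim, 5, bed_out, bim_out))  # → 1
```

For real files, open the inputs with open("file.bed", "rb") and open("file.bim"), take n_samples from count_samples(open("file.fam")), write to new .bed/.bim files, and keep the original .fam alongside them.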

Ah, there’s definitely a bug here: skipInvalidLoci isn’t using the recoded contig if you’re passing contig_recoding.
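To make the reported ordering problem concrete, here is a hypothetical, heavily simplified model of locus validation during import. It is not Hail's code (the function names, the contig set, and the recoding are illustrative assumptions); it only shows why checking the raw PLINK contig ("1") against GRCh38 contig names ("chr1") rejects every locus, whereas recoding first keeps the valid ones and still drops a chromosome-0 or position-0 variant.

```python
# Hypothetical, simplified model of locus validation during import.
# This is NOT Hail's implementation; it only illustrates the ordering issue.

GRCH38_CONTIGS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}

# Assumed recoding from PLINK contig codes to GRCh38 contig names.
RECODING = {str(i): f"chr{i}" for i in range(1, 23)}
RECODING.update({"23": "chrX", "24": "chrY", "25": "chrX", "26": "chrM"})


def is_valid(contig: str, position: int) -> bool:
    """A locus is valid if its contig is in the reference and position >= 1."""
    # Real reference genomes also enforce an upper bound (the contig length),
    # which is omitted here for brevity.
    return contig in GRCH38_CONTIGS and position >= 1


def keep_buggy(contig: str, position: int) -> bool:
    # Bug: validity is checked on the raw PLINK contig ("1"), not the recoded one.
    return is_valid(contig, position)


def keep_fixed(contig: str, position: int) -> bool:
    # Fix: apply the recoding first, then validate.
    return is_valid(RECODING.get(contig, contig), position)


loci = [("1", 12345), ("2", 67890), ("0", 0), ("1", 0)]
print([locus for locus in loci if keep_buggy(*locus)])  # → []
print([locus for locus in loci if keep_fixed(*locus)])  # → [('1', 12345), ('2', 67890)]
```

If the filtered data were then imported without skip_invalid_loci, a check like the buggy one would never run, which would be consistent with the workaround described above.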