1000 Genomes NYGC 30x dataset - additional entry fields?

I’m hoping to use the 1000 Genomes high-coverage dataset for analysis. I accessed the MT from the hail-datasets-us bucket, but I see that the autosomes matrix table only has ‘GT’ as an entry field (although, going by its schema, the chrX MT has more of the entry fields filled out).

Just wanted to check whether a version of this dataset already exists in matrix table form with the additional standard entry fields (AD, DP, GQ, GT, PGT, PID, PL)? I’d like to run hl.de_novo on this MT but need that information to do so. I can recreate the MT from the 1000 Genomes VCFs, but wanted to check if I could save a step before I did.

As an aside, when trying to load these datasets with the load_dataset() function, I ran into this - not sure if it’s my issue or a broken link somewhere?

mt = hl.experimental.load_dataset(name='1000_Genomes_HighCov_autosomes',
                                  version='NYGC_30x',
                                  reference_genome='GRCh38',
                                  region='us',
                                  cloud='gcp')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-27-7b79b5cad50e> in <module>
----> 1 mt = hl.experimental.load_dataset(name='1000_Genomes_HighCov_autosomes',
      2                                    version='NYGC_30x',
      3                                  reference_genome='GRCh38',
      4                                    region='us',
      5                                   cloud='gcp')

/opt/conda/miniconda3/lib/python3.8/site-packages/hail/experimental/datasets.py in load_dataset(name, version, reference_genome, region, cloud)
     68     names = set([dataset for dataset in datasets])
     69     if name not in names:
---> 70         raise ValueError(f'{name} is not a dataset available in the'
     71                          f' repository.')
     72 

ValueError: 1000_Genomes_HighCov_autosomes is not a dataset available in the repository.

Thanks so much!

Ah, hmm. We seem to have imported the phased VCFs for the autosomes. I’ll see about getting the unphased dataset with all the other sequencing quality fields loaded as well.
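In the meantime, a quick way to tell whether a given matrix table is ready for hl.de_novo is to compare its entry fields against what the method needs. Here is a minimal sketch, assuming the required set is GT, AD, DP, GQ and PL (that is my reading of the hl.de_novo docs, so please check them for your version). The helper name missing_de_novo_fields is made up for illustration and is not part of Hail:

```python
# Entry fields that hl.de_novo is ASSUMED to need; verify against the docs
# for the Hail version you are running.
ASSUMED_DE_NOVO_FIELDS = {"GT", "AD", "DP", "GQ", "PL"}


def missing_de_novo_fields(entry_fields):
    """Return the assumed-required entry fields absent from `entry_fields`, sorted."""
    return sorted(ASSUMED_DE_NOVO_FIELDS - set(entry_fields))


# The autosomes MT discussed in this thread only carries GT.
print(missing_de_novo_fields(["GT"]))

# The full set of entry fields asked about in the question.
print(missing_de_novo_fields(["AD", "DP", "GQ", "GT", "PGT", "PID", "PL"]))
```

With a real matrix table you would pass in its entry field names (something like list(mt.entry) should give them). If the result is non-empty, rebuilding the table from the unphased VCFs with hl.import_vcf is the fallback.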

What version of Hail are you using? That command works for me in version 0.2.68. We distribute the list of datasets in the Hail PyPI package, so new datasets are not available to old installations.
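If it helps, here is a small sketch for comparing an installed version string against 0.2.68. Note that 0.2.68 is only the version the command worked on for me, not necessarily the first release that lists this dataset, and the helper names parse_version and is_at_least are hypothetical:

```python
# 0.2.68 is the version reported as working in this thread; the true minimum
# release that ships this dataset entry is not known here.
KNOWN_WORKING = "0.2.68"


def parse_version(version):
    """Turn '0.2.68' or '0.2.68-abcdef' into a tuple of ints such as (0, 2, 68)."""
    release = version.split("-")[0]  # drop any trailing build or commit suffix
    return tuple(int(part) for part in release.split("."))


def is_at_least(installed, known_good=KNOWN_WORKING):
    """True if `installed` is the same as or newer than `known_good`."""
    return parse_version(installed) >= parse_version(known_good)


print(is_at_least("0.2.66"))  # False
print(is_at_least("0.2.68"))  # True
```

The installed string can be taken from hl.version(), which as far as I know returns the release number followed by a hyphen and a commit hash. Upgrading the PyPI package (pip install -U hail) refreshes the bundled dataset list.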

Ah OK, I’m on 0.2.66, so that’s on me. Just thought I’d check. Thanks for the quick response and for putting together all of this Hail-ready data!