I'm hoping to use the 1000 Genomes high-coverage dataset for analysis. I accessed the MT from the hail-datasets-us bucket, but the autosomes matrix table only has GT as an entry field (although the chrX MT appears to have more entry fields populated, judging by its schema).
Is there a version of this dataset already available in matrix table form with the additional standard entry fields (AD, DP, GQ, GT, PGT, PID, PL)? I'd like to run hl.de_novo on this MT but need that information to do so. I can recreate the MT from the 1000 Genomes VCFs, but wanted to check whether I could save a step first.
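For context, this is roughly the check I have in mind for seeing which fields are missing. It's only a minimal sketch: the required-field tuple reflects my reading of the hl.de_novo docs (GT, AD, DP, GQ, PL), and missing_entry_fields is a hypothetical helper of my own, not part of Hail.

```python
# Entry fields that hl.de_novo expects, per my reading of the Hail docs (assumption).
REQUIRED_FOR_DE_NOVO = ("GT", "AD", "DP", "GQ", "PL")


def missing_entry_fields(entry_fields, required=REQUIRED_FOR_DE_NOVO):
    """Return the required field names absent from entry_fields, in `required` order."""
    present = set(entry_fields)
    return [name for name in required if name not in present]


# With a Hail MatrixTable, the entry field names could be passed in like this
# (not run here, since it needs Hail and access to the dataset):
#     missing_entry_fields(mt.entry.dtype.fields)

# The autosomes MT only has GT, so every other required field is reported.
print(missing_entry_fields(["GT"]))
```

For the autosomes MT this lists AD, DP, GQ and PL as missing, which is why I can't run hl.de_novo on it as it stands.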
As an aside, when I tried to load these datasets with the load_dataset() function, I ran into the error below. I'm not sure whether it's an issue on my end or a broken link somewhere.
mt = hl.experimental.load_dataset(name='1000_Genomes_HighCov_autosomes',
                                  version='NYGC_30x', reference_genome='GRCh38', region='us', cloud='gcp')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-7b79b5cad50e> in <module>
----> 1 mt = hl.experimental.load_dataset(name='1000_Genomes_HighCov_autosomes',
2 version='NYGC_30x',
3 reference_genome='GRCh38',
4 region='us',
5 cloud='gcp')
/opt/conda/miniconda3/lib/python3.8/site-packages/hail/experimental/datasets.py in load_dataset(name, version, reference_genome, region, cloud)
68 names = set([dataset for dataset in datasets])
69 if name not in names:
---> 70 raise ValueError(f'{name} is not a dataset available in the'
71 f' repository.')
72
ValueError: 1000_Genomes_HighCov_autosomes is not a dataset available in the repository.
Thanks so much!