I am trying to learn how to use Hail devel by studying the 1000 Genomes data, but when I import a VCF, the “Entry fields” section of the resulting MatrixTable is empty. I have read the documentation for the import_vcf function and can’t figure out what I am doing wrong.
I am running Hail on GCP Dataproc, and I used the very helpful cloudtools functions to start and connect to my cluster.
import hail as hl
hl.init() # version devel-8650fd3cdd20
mt = hl.import_vcf('gs://genomics-public-data/1000-genomes-phase-3/vcf-20150220/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf')
mt.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
----------------------------------------
Row fields:
    'locus': locus<GRCh37>
    'alleles': array<str>
    'rsid': str
    'qual': float64
    'filters': set<str>
    'info': struct {
        CIEND: array<int32>,
        CIPOS: array<int32>,
        CS: str,
        END: int32,
        IMPRECISE: bool,
        MC: array<str>,
        MEINFO: array<str>,
        MEND: int32,
        MLEN: int32,
        MSTART: int32,
        SVLEN: array<int32>,
        SVTYPE: str,
        TSD: str,
        AC: array<int32>,
        AF: array<float64>,
        NS: int32,
        AN: int32,
        EAS_AF: array<float64>,
        EUR_AF: array<float64>,
        AFR_AF: array<float64>,
        AMR_AF: array<float64>,
        SAS_AF: array<float64>,
        DP: int32,
        AA: str,
        VT: array<str>,
        EX_TARGET: bool,
        MULTI_ALLELIC: bool
    }
----------------------------------------
Entry fields:
    None
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
Partition key: ['locus']
----------------------------------------