Hi Dan,
Cheers for getting back to me so quickly, and apologies for the late reply. I understand that exporting a Hail table back to a VCF is not ideal and not really the intended use of Hail. I just want my annotated VCF file in a format that is easy to filter, and I’m struggling to do this in Hail because of the format of the annotation columns. Unfortunately, because of the UKB MTA, I can’t simply download the WES VCF files to my local server and run VEP directly (which would be ideal). To answer your questions:
1. My main issues are that some important annotations don’t work with the Hail annotation database, and that filtering the resulting Hail table is quite difficult given the multiple expressions in the header. For example:
locus
alleles
rsid
qual
filters
AF
AQ
AC
AN
vep
part_idx
block_idx
raw_score
PHRED_score
clinvar_gene_summary
clinvar_variant_summary
dbNSFP_genes
dbNSFP_variants
dbSNP_rsid
ReverseComplementedAlleles
SwappedAlleles
original_locus
freq
age_hist_het
age_hist_hom
popmax
faf
lcr
decoy
segdup
nonpar
variant_type
allele_type
n_alt_alleles
was_mixed
has_star
qd
pab_max
info_MQRankSum
info_SOR
info_InbreedingCoeff
info_ReadPosRankSum
info_FS
info_QD
info_MQ
info_DP
transmitted_singleton
fail_hard_filters
info_POSITIVE_TRAIN_SITE
info_NEGATIVE_TRAIN_SITE
omni
mills
n_nonref
tp
rf_train
rf_label
rf_probability
singleton
was_split
score
rank
singleton_rank
biallelic_rank
adj_biallelic_singleton_rank
adj_rank
adj_biallelic_rank
adj_singleton_rank
biallelic_singleton_rank
filters
bin_edges
bin_freq
n_smaller
n_larger
bin_edges
bin_freq
n_smaller
n_larger
bin_edges
bin_freq
n_smaller
n_larger
bin_edges
bin_freq
n_smaller
n_larger
bin_edges
bin_freq
n_smaller
n_larger
qual
assembly_name
allele_string
ancestral
colocated_variants
context
end
id
input
intergenic_consequences
most_severe_consequence
motif_feature_consequences
regulatory_feature_consequences
seq_region_name
start
strand
transcript_consequences
variant_class
BaseQRankSum
ClippingRankSum
DB
DP
DS
END
FS
HaplotypeScore
InbreedingCoeff
MQ
MQRankSum
NEGATIVE_TRAIN_SITE
POSITIVE_TRAIN_SITE
QD
ReadPosRankSum
SOR
VQSLOD
culprit
rsid
original_alleles
locus<GRCh38> array<str> str float64 set<str> array<float64> array<int32> array<int32> int32 array<str> int32 int32 float64 float64 dict<str, struct{GeneID: int32, Total_submissions: int32, Total_alleles: int32, Submissions_reporting_this_gene: int32, Alleles_reported_Pathogenic_Likely_pathogenic: int32, Gene_MIM_number: int32, Number_uncertain: int32, Number_with_conflicts: int32}> array<struct{Type: str, Name: str, GeneID: int32, GeneSymbol: str, HGNC_ID: str, ClinicalSignificance: str, ClinSigSimple: int32, LastEvaluated: str, `RS# (dbSNP)`: int32, `nsv/esv (dbVar)`: str, RCVaccession: str, PhenotypeIDS: str, PhenotypeList: str, Origin: str, OriginSimple: str, Assembly: str, ChromosomeAccession: str, ReferenceAllele: str, AlternateAllele: str, Cytogenetic: str, ReviewStatus: str, NumberSubmitters: int32, Guidelines: str, TestedInGTR: str, OtherIDs: str, SubmitterCategories: int32, VariationID: int32, AlleleID: int32}> dict<str, struct{Ensembl_gene: str, chr: str, Gene_old_names: str, Gene_other_names: str, `Uniprot_acc(HGNC/Uniprot)`: str, `Uniprot_id(HGNC/Uniprot)`: str, Entrez_gene_id: int32, CCDS_id: str, Refseq_id: str, ucsc_id: str, MIM_id: str, OMIM_id: int32, Gene_full_name: str, `Pathway(Uniprot)`: str, `Pathway(BioCarta)_short`: str, `Pathway(BioCarta)_full`: str, `Pathway(ConsensusPathDB)`: str, `Pathway(KEGG)_id`: str, `Pathway(KEGG)_full`: str, Function_description: str, Disease_description: str, MIM_phenotype_id: str, MIM_disease: str, Orphanet_disorder_id: str, Orphanet_disorder: str, Orphanet_association_type: str, `Trait_association(GWAS)`: str, GO_biological_process: str, GO_cellular_component: str, GO_molecular_function: str, `Tissue_specificity(Uniprot)`: str, `Expression(egenetics)`: str, `Expression(GNF/Atlas)`: str, `Interactions(IntAct)`: str, `Interactions(BioGRID)`: str, `Interactions(ConsensusPathDB)`: str, `P(HI)`: float64, HIPred_score: float64, HIPred: str, GHIS: float64, `P(rec)`: float64, Known_rec_info: str, RVIS_EVS: float64, 
RVIS_percentile_EVS: float64, `LoF-FDR_ExAC`: float64, RVIS_ExAC: float64, RVIS_percentile_ExAC: float64, ExAC_pLI: float64, ExAC_pRec: float64, ExAC_pNull: float64, ExAC_nonTCGA_pLI: float64, ExAC_nonTCGA_pRec: float64, ExAC_nonTCGA_pNull: float64, ExAC_nonpsych_pLI: float64, ExAC_nonpsych_pRec: float64, ExAC_nonpsych_pNull: float64, gnomAD_pLI: str, gnomAD_pRec: str, gnomAD_pNull: str, `ExAC_del.score`: float64, `ExAC_dup.score`: float64, `ExAC_cnv.score`: float64, ExAC_cnv_flag: str, GDI: float64, `GDI-Phred`: float64, `Gene damage prediction (all disease-causing genes)`: str, `Gene damage prediction (all Mendelian disease-causing genes)`: str, `Gene damage prediction (Mendelian AD disease-causing genes)`: str, `Gene damage prediction (Mendelian AR disease-causing genes)`: str, `Gene damage prediction (all PID disease-causing genes)`: str, `Gene damage prediction (PID AD disease-causing genes)`: str, `Gene damage prediction (PID AR disease-causing genes)`: str, `Gene damage prediction (all cancer disease-causing genes)`: str, `Gene damage prediction (cancer recessive disease-causing genes)`: str, `Gene damage prediction (cancer dominant disease-causing genes)`: str, LoFtool_score: float64, `SORVA_LOF_MAF0.005_HetOrHom`: float64, `SORVA_LOF_MAF0.005_HomOrCompoundHet`: float64, `SORVA_LOF_MAF0.001_HetOrHom`: float64, `SORVA_LOF_MAF0.001_HomOrCompoundHet`: float64, `SORVA_LOForMissense_MAF0.005_HetOrHom`: float64, `SORVA_LOForMissense_MAF0.005_HomOrCompoundHet`: float64, `SORVA_LOForMissense_MAF0.001_HetOrHom`: float64, `SORVA_LOForMissense_MAF0.001_HomOrCompoundHet`: float64, Essential_gene: str, Essential_gene_CRISPR: str, Essential_gene_CRISPR2: str, `Essential_gene_gene-trap`: str, Gene_indispensability_score: float64, Gene_indispensability_pred: str, MGI_mouse_gene: str, MGI_mouse_phenotype: str, ZFIN_zebrafish_gene: str, ZFIN_zebrafish_structure: str, ZFIN_zebrafish_phenotype_quality: str, ZFIN_zebrafish_phenotype_tag: str}> array<struct{`pos(1-based)`: 
int32, ref: str, alt: str, aaref: str, aaalt: str, rs_dbSNP151: str, hg19_chr: str, `hg19_pos(1-based)`: int32, hg18_chr: str, `hg18_pos(1-based)`: int32, aapos: str, genename: str, Ensembl_geneid: str, Ensembl_transcriptid: str, Ensembl_proteinid: str, Uniprot_acc: str, Uniprot_entry: str, HGVSc_ANNOVAR: str, HGVSp_ANNOVAR: str, HGVSc_snpEff: str, HGVSp_snpEff: str, HGVSc_VEP: str, HGVSp_VEP: str, APPRIS: str, GENCODE_basic: str, TSL: str, VEP_canonical: str, cds_strand: str, refcodon: str, codonpos: str, codon_degeneracy: str, Ancestral_allele: str, AltaiNeandertal: str, Denisova: str, VindijiaNeandertal: str, SIFT_score: str, SIFT_converted_rankscore: float64, SIFT_pred: str, SIFT4G_score: str, SIFT4G_converted_rankscore: float64, SIFT4G_pred: str, Polyphen2_HDIV_score: str, Polyphen2_HDIV_rankscore: float64, Polyphen2_HDIV_pred: str, Polyphen2_HVAR_score: str, Polyphen2_HVAR_rankscore: float64, Polyphen2_HVAR_pred: str, LRT_score: float64, LRT_converted_rankscore: float64, LRT_pred: str, LRT_Omega: float64, MutationTaster_score: str, MutationTaster_converted_rankscore: float64, MutationTaster_pred: str, MutationTaster_model: str, MutationTaster_AAE: str, MutationAssessor_score: str, MutationAssessor_rankscore: float64, MutationAssessor_pred: str, FATHMM_score: str, FATHMM_converted_rankscore: float64, FATHMM_pred: str, PROVEAN_score: str, PROVEAN_converted_rankscore: float64, PROVEAN_pred: str, VEST4_score: str, VEST4_rankscore: float64, MetaSVM_score: float64, MetaSVM_rankscore: float64, MetaSVM_pred: str, MetaLR_score: float64, MetaLR_rankscore: float64, MetaLR_pred: str, Reliability_index: int32, `M-CAP_score`: float64, `M-CAP_rankscore`: float64, `M-CAP_pred`: str, REVEL_score: float64, REVEL_rankscore: float64, MutPred_score: str, MutPred_rankscore: float64, MutPred_protID: str, MutPred_AAchange: str, MutPred_Top5features: str, MVP_score: str, MVP_rankscore: float64, MPC_score: str, MPC_rankscore: float64, PrimateAI_score: float64, PrimateAI_rankscore: 
float64, PrimateAI_pred: str, DEOGEN2_score: str, DEOGEN2_rankscore: float64, DEOGEN2_pred: str, Aloft_Fraction_transcripts_affected: str, Aloft_prob_Tolerant: str, Aloft_prob_Recessive: str, Aloft_prob_Dominant: str, Aloft_pred: str, Aloft_Confidence: str, CADD_raw: float64, CADD_raw_rankscore: float64, CADD_phred: float64, DANN_score: float64, DANN_rankscore: float64, `fathmm-MKL_coding_score`: float64, `fathmm-MKL_coding_rankscore`: float64, `fathmm-MKL_coding_pred`: str, `fathmm-MKL_coding_group`: str, `fathmm-XF_coding_score`: float64, `fathmm-XF_coding_rankscore`: float64, `fathmm-XF_coding_pred`: str, `Eigen-raw_coding`: float64, `Eigen-raw_coding_rankscore`: float64, `Eigen-pred_coding`: float64, `Eigen-PC-raw_coding`: float64, `Eigen-PC-raw_coding_rankscore`: float64, `Eigen-PC-phred_coding`: float64, GenoCanyon_score: float64, GenoCanyon_rankscore: float64, integrated_fitCons_score: float64, integrated_fitCons_rankscore: float64, integrated_confidence_value: int32, GM12878_fitCons_score: float64, GM12878_fitCons_rankscore: float64, GM12878_confidence_value: int32, `H1-hESC_fitCons_score`: float64, `H1-hESC_fitCons_rankscore`: float64, `H1-hESC_confidence_value`: int32, HUVEC_fitCons_score: float64, HUVEC_fitCons_rankscore: float64, HUVEC_confidence_value: int32, LINSIGHT: float64, LINSIGHT_rankscore: float64, `GERP++_NR`: float64, `GERP++_RS`: float64, `GERP++_RS_rankscore`: float64, phyloP100way_vertebrate: float64, phyloP100way_vertebrate_rankscore: float64, phyloP30way_mammalian: float64, phyloP30way_mammalian_rankscore: float64, phyloP17way_primate: float64, phyloP17way_primate_rankscore: float64, phastCons100way_vertebrate: float64, phastCons100way_vertebrate_rankscore: float64, phastCons30way_mammalian: float64, phastCons30way_mammalian_rankscore: float64, phastCons17way_primate: float64, phastCons17way_primate_rankscore: float64, SiPhy_29way_pi: str, SiPhy_29way_logOdds: float64, SiPhy_29way_logOdds_rankscore: float64, bStatistic: int32, 
bStatistic_rankscore: float64, `1000Gp3_AC`: int32, `1000Gp3_AF`: float64, `1000Gp3_AFR_AC`: int32, `1000Gp3_AFR_AF`: float64, `1000Gp3_EUR_AC`: int32, `1000Gp3_EUR_AF`: float64, `1000Gp3_AMR_AC`: int32, `1000Gp3_AMR_AF`: float64, `1000Gp3_EAS_AC`: int32, `1000Gp3_EAS_AF`: float64, `1000Gp3_SAS_AC`: int32, `1000Gp3_SAS_AF`: float64, TWINSUK_AC: int32, TWINSUK_AF: float64, ALSPAC_AC: int32, ALSPAC_AF: float64, UK10K_AC: int32, UK10K_AF: float64, ESP6500_AA_AC: int32, ESP6500_AA_AF: float64, ESP6500_EA_AC: int32, ESP6500_EA_AF: float64, ExAC_AC: int32, ExAC_AF: float64, ExAC_Adj_AC: int32, ExAC_Adj_AF: float64, ExAC_AFR_AC: int32, ExAC_AFR_AF: float64, ExAC_AMR_AC: int32, ExAC_AMR_AF: float64, ExAC_EAS_AC: int32, ExAC_EAS_AF: float64, ExAC_FIN_AC: int32, ExAC_FIN_AF: float64, ExAC_NFE_AC: int32, ExAC_NFE_AF: float64, ExAC_SAS_AC: int32, ExAC_SAS_AF: float64, ExAC_nonTCGA_AC: int32, ExAC_nonTCGA_AF: float64, ExAC_nonTCGA_Adj_AC: int32, ExAC_nonTCGA_Adj_AF: float64, ExAC_nonTCGA_AFR_AC: int32, ExAC_nonTCGA_AFR_AF: float64, ExAC_nonTCGA_AMR_AC: int32, ExAC_nonTCGA_AMR_AF: float64, ExAC_nonTCGA_EAS_AC: int32, ExAC_nonTCGA_EAS_AF: float64, ExAC_nonTCGA_FIN_AC: int32, ExAC_nonTCGA_FIN_AF: float64, ExAC_nonTCGA_NFE_AC: int32, ExAC_nonTCGA_NFE_AF: float64, ExAC_nonTCGA_SAS_AC: int32, ExAC_nonTCGA_SAS_AF: float64, ExAC_nonpsych_AC: int32, ExAC_nonpsych_AF: float64, ExAC_nonpsych_Adj_AC: int32, ExAC_nonpsych_Adj_AF: float64, ExAC_nonpsych_AFR_AC: int32, ExAC_nonpsych_AFR_AF: float64, ExAC_nonpsych_AMR_AC: int32, ExAC_nonpsych_AMR_AF: float64, ExAC_nonpsych_EAS_AC: int32, ExAC_nonpsych_EAS_AF: float64, ExAC_nonpsych_FIN_AC: int32, ExAC_nonpsych_FIN_AF: float64, ExAC_nonpsych_NFE_AC: int32, ExAC_nonpsych_NFE_AF: float64, ExAC_nonpsych_SAS_AC: int32, ExAC_nonpsych_SAS_AF: float64, gnomAD_exomes_flag: str, gnomAD_exomes_AC: int32, gnomAD_exomes_AN: int32, gnomAD_exomes_AF: float64, gnomAD_exomes_nhomalt: int32, gnomAD_exomes_AFR_AC: int32, gnomAD_exomes_AFR_AN: int32, 
gnomAD_exomes_AFR_AF: float64, gnomAD_exomes_AFR_nhomalt: int32, gnomAD_exomes_AMR_AC: int32, gnomAD_exomes_AMR_AN: int32, gnomAD_exomes_AMR_AF: float64, gnomAD_exomes_AMR_nhomalt: int32, gnomAD_exomes_ASJ_AC: int32, gnomAD_exomes_ASJ_AN: int32, gnomAD_exomes_ASJ_AF: float64, gnomAD_exomes_ASJ_nhomalt: int32, gnomAD_exomes_EAS_AC: int32, gnomAD_exomes_EAS_AN: int32, gnomAD_exomes_EAS_AF: float64, gnomAD_exomes_EAS_nhomalt: int32, gnomAD_exomes_FIN_AC: int32, gnomAD_exomes_FIN_AN: int32, gnomAD_exomes_FIN_AF: float64, gnomAD_exomes_FIN_nhomalt: int32, gnomAD_exomes_NFE_AC: int32, gnomAD_exomes_NFE_AN: int32, gnomAD_exomes_NFE_AF: float64, gnomAD_exomes_NFE_nhomalt: int32, gnomAD_exomes_SAS_AC: int32, gnomAD_exomes_SAS_AN: int32, gnomAD_exomes_SAS_AF: float64, gnomAD_exomes_SAS_nhomalt: int32, gnomAD_exomes_POPMAX_AC: int32, gnomAD_exomes_POPMAX_AN: int32, gnomAD_exomes_POPMAX_AF: float64, gnomAD_exomes_POPMAX_nhomalt: int32, gnomAD_exomes_controls_AC: int32, gnomAD_exomes_controls_AN: int32, gnomAD_exomes_controls_AF: float64, gnomAD_exomes_controls_nhomalt: int32, gnomAD_exomes_controls_AFR_AC: int32, gnomAD_exomes_controls_AFR_AN: int32, gnomAD_exomes_controls_AFR_AF: float64, gnomAD_exomes_controls_AFR_nhomalt: int32, gnomAD_exomes_controls_AMR_AC: int32, gnomAD_exomes_controls_AMR_AN: int32, gnomAD_exomes_controls_AMR_AF: float64, gnomAD_exomes_controls_AMR_nhomalt: int32, gnomAD_exomes_controls_ASJ_AC: int32, gnomAD_exomes_controls_ASJ_AN: int32, gnomAD_exomes_controls_ASJ_AF: float64, gnomAD_exomes_controls_ASJ_nhomalt: int32, gnomAD_exomes_controls_EAS_AC: int32, gnomAD_exomes_controls_EAS_AN: int32, gnomAD_exomes_controls_EAS_AF: float64, gnomAD_exomes_controls_EAS_nhomalt: int32, gnomAD_exomes_controls_FIN_AC: int32, gnomAD_exomes_controls_FIN_AN: int32, gnomAD_exomes_controls_FIN_AF: float64, gnomAD_exomes_controls_FIN_nhomalt: int32, gnomAD_exomes_controls_NFE_AC: int32, gnomAD_exomes_controls_NFE_AN: int32, gnomAD_exomes_controls_NFE_AF: float64, 
gnomAD_exomes_controls_NFE_nhomalt: int32, gnomAD_exomes_controls_SAS_AC: int32, gnomAD_exomes_controls_SAS_AN: int32, gnomAD_exomes_controls_SAS_AF: float64, gnomAD_exomes_controls_SAS_nhomalt: int32, gnomAD_exomes_controls_POPMAX_AC: int32, gnomAD_exomes_controls_POPMAX_AN: int32, gnomAD_exomes_controls_POPMAX_AF: float64, gnomAD_exomes_controls_POPMAX_nhomalt: int32, gnomAD_genomes_flag: str, gnomAD_genomes_AC: int32, gnomAD_genomes_AN: int32, gnomAD_genomes_AF: float64, gnomAD_genomes_nhomalt: int32, gnomAD_genomes_AFR_AC: int32, gnomAD_genomes_AFR_AN: int32, gnomAD_genomes_AFR_AF: float64, gnomAD_genomes_AFR_nhomalt: int32, gnomAD_genomes_AMR_AC: int32, gnomAD_genomes_AMR_AN: int32, gnomAD_genomes_AMR_AF: float64, gnomAD_genomes_AMR_nhomalt: int32, gnomAD_genomes_ASJ_AC: int32, gnomAD_genomes_ASJ_AN: int32, gnomAD_genomes_ASJ_AF: float64, gnomAD_genomes_ASJ_nhomalt: int32, gnomAD_genomes_EAS_AC: int32, gnomAD_genomes_EAS_AN: int32, gnomAD_genomes_EAS_AF: float64, gnomAD_genomes_EAS_nhomalt: int32, gnomAD_genomes_FIN_AC: int32, gnomAD_genomes_FIN_AN: int32, gnomAD_genomes_FIN_AF: float64, gnomAD_genomes_FIN_nhomalt: int32, gnomAD_genomes_NFE_AC: int32, gnomAD_genomes_NFE_AN: int32, gnomAD_genomes_NFE_AF: float64, gnomAD_genomes_NFE_nhomalt: int32, gnomAD_genomes_POPMAX_AC: int32, gnomAD_genomes_POPMAX_AN: int32, gnomAD_genomes_POPMAX_AF: float64, gnomAD_genomes_POPMAX_nhomalt: int32, gnomAD_genomes_controls_AC: int32, gnomAD_genomes_controls_AN: int32, gnomAD_genomes_controls_AF: float64, gnomAD_genomes_controls_nhomalt: int32, gnomAD_genomes_controls_AFR_AC: int32, gnomAD_genomes_controls_AFR_AN: int32, gnomAD_genomes_controls_AFR_AF: float64, gnomAD_genomes_controls_AFR_nhomalt: int32, gnomAD_genomes_controls_AMR_AC: int32, gnomAD_genomes_controls_AMR_AN: int32, gnomAD_genomes_controls_AMR_AF: float64, gnomAD_genomes_controls_AMR_nhomalt: int32, gnomAD_genomes_controls_ASJ_AC: int32, gnomAD_genomes_controls_ASJ_AN: int32, gnomAD_genomes_controls_ASJ_AF: 
float64, gnomAD_genomes_controls_ASJ_nhomalt: int32, gnomAD_genomes_controls_EAS_AC: int32, gnomAD_genomes_controls_EAS_AN: int32, gnomAD_genomes_controls_EAS_AF: float64, gnomAD_genomes_controls_EAS_nhomalt: int32, gnomAD_genomes_controls_FIN_AC: int32, gnomAD_genomes_controls_FIN_AN: int32, gnomAD_genomes_controls_FIN_AF: float64, gnomAD_genomes_controls_FIN_nhomalt: int32, gnomAD_genomes_controls_NFE_AC: int32, gnomAD_genomes_controls_NFE_AN: int32, gnomAD_genomes_controls_NFE_AF: float64, gnomAD_genomes_controls_NFE_nhomalt: int32, gnomAD_genomes_controls_POPMAX_AC: int32, gnomAD_genomes_controls_POPMAX_AN: int32, gnomAD_genomes_controls_POPMAX_AF: float64, gnomAD_genomes_controls_POPMAX_nhomalt: int32, clinvar_id: int32, clinvar_clnsig: str, clinvar_trait: str, clinvar_review: str, clinvar_hgvs: str, clinvar_var_source: str, clinvar_MedGen_id: str, clinvar_OMIM_id: str, clinvar_Orphanet_id: str, Interpro_domain: str, GTEx_V7_gene: str, GTEx_V7_tissue: str, Geuvadis_eQTL_target_gene: str, chr: str}> array<struct{rsid: str}> bool bool locus<GRCh37> array<struct{AC: int32, AF: float64, AN: int32, homozygote_count: int32}> array<struct{bin_edges: array<float64>, bin_freq: array<int64>, n_smaller: int64, n_larger: int64}> array<struct{bin_edges: array<float64>, bin_freq: array<int64>, n_smaller: int64, n_larger: int64}> array<struct{AC: int32, AF: float64, AN: int32, homozygote_count: int32, pop: str}> array<struct{meta: dict<str, str>, faf95: float64, faf99: float64}> bool bool bool bool str str int32 bool bool float64 float64 float64 float64 float64 float64 float64 float64 float64 int32 bool bool bool bool bool bool int32 bool bool str float64 bool bool float64 int64 int64 int64 int64 int64 int64 int64 int64 set<str> array<float64> array<int64> int64 int64 array<float64> array<int64> int64 int64 array<float64> array<int64> int64 int64 array<float64> array<int64> int64 int64 array<float64> array<int64> int64 int64 float64 str str str array<struct{aa_allele: str, 
aa_maf: float64, afr_allele: str, afr_maf: float64, allele_string: str, amr_allele: str, amr_maf: float64, clin_sig: array<str>, end: int32, eas_allele: str, eas_maf: float64, ea_allele: str, ea_maf: float64, eur_allele: str, eur_maf: float64, exac_adj_allele: str, exac_adj_maf: float64, exac_allele: str, exac_afr_allele: str, exac_afr_maf: float64, exac_amr_allele: str, exac_amr_maf: float64, exac_eas_allele: str, exac_eas_maf: float64, exac_fin_allele: str, exac_fin_maf: float64, exac_maf: float64, exac_nfe_allele: str, exac_nfe_maf: float64, exac_oth_allele: str, exac_oth_maf: float64, exac_sas_allele: str, exac_sas_maf: float64, id: str, minor_allele: str, minor_allele_freq: float64, phenotype_or_disease: int32, pubmed: array<int32>, sas_allele: str, sas_maf: float64, somatic: int32, start: int32, strand: int32}> str int32 str str array<struct{allele_num: int32, consequence_terms: array<str>, impact: str, minimised: int32, variant_allele: str}> str array<struct{allele_num: int32, consequence_terms: array<str>, high_inf_pos: str, impact: str, minimised: int32, motif_feature_id: str, motif_name: str, motif_pos: int32, motif_score_change: float64, strand: int32, variant_allele: str}> array<struct{allele_num: int32, biotype: str, consequence_terms: array<str>, impact: str, minimised: int32, regulatory_feature_id: str, variant_allele: str}> str int32 int32 array<struct{allele_num: int32, amino_acids: str, biotype: str, canonical: int32, ccds: str, cdna_start: int32, cdna_end: int32, cds_end: int32, cds_start: int32, codons: str, consequence_terms: array<str>, distance: int32, domains: array<struct{db: str, name: str}>, exon: str, gene_id: str, gene_pheno: int32, gene_symbol: str, gene_symbol_source: str, hgnc_id: str, hgvsc: str, hgvsp: str, hgvs_offset: int32, impact: str, intron: str, lof: str, lof_flags: str, lof_filter: str, lof_info: str, minimised: int32, polyphen_prediction: str, polyphen_score: float64, protein_end: int32, protein_start: int32, protein_id: 
str, sift_prediction: str, sift_score: float64, strand: int32, swissprot: str, transcript_id: str, trembl: str, uniparc: str, variant_allele: str}> str float64 float64 bool int32 bool int32 float64 float64 float64 float64 float64 bool bool float64 float64 float64 float64 str str array<str>
chr1:69026 ["T","G"] "chr1_69026_T_G" 3.80e+01 NA [1.00e-06] [38] [1] 226780 ["G|downstream_gene_variant|MODIFIER|OR4G11P|ENSG00000240361|Transcript|ENST00000642116|processed_transcript|||||||||||1|4910|1||SNV|1|HGNC|HGNC:31276|YES|||||||||||||||||||||||","G|intron_variant|MODIFIER|OR4F5|ENSG00000186092|Transcript|ENST00000641515|protein_coding||2/2|||||||||1||1||SNV|1|HGNC|HGNC:14825|YES||ENSP00000493376|||||||||||||||||||||"] 0 0 1.00e+00 1.17e+01 {"OR4F5":(79501,6,6,NA,NA,NA,NA,NA)} [("copy number gain","NCBI36/hg18 1p36.33(chr1:4737-338603)x3",-1,NA,NA,"Uncertain significance",0,NA,-1,"nsv2768335","RCV000453780","na","See cases","not provided","not provided","NCBI36","NC_000001.9","na","na","1p36.33","no assertion criteria provided",1,"","N","dbVar:nssv13638640,dbVar:nsv2768335",2,393629,380521),("copy number loss","NCBI36/hg18 1p36.33-36.23(chr1:4737-7424898)x1",-1,NA,NA,"Pathogenic",1,NA,-1,"nsv2771294","RCV000450793","na","See cases","not provided","not provided","NCBI36","NC_000001.9","na","na","1p36.33-36.23","no assertion criteria provided",1,"","N","dbVar:nssv13638713,dbVar:nsv2771294",2,398920,385893),("copy number loss","NCBI36/hg18 1p36.33-36.23(chr1:4737-8734404)x1",-1,NA,NA,"Pathogenic",1,NA,-1,"nsv2771750","RCV000451919","na","See cases","not provided","not provided","NCBI36","NC_000001.9","na","na","1p36.33-36.23","no assertion criteria provided",1,"","N","dbVar:nssv13655807,dbVar:nsv2771750",2,399104,386077)] {"OR4F5":("ENSG00000186092","1",NA,NA,"Q8NH21","OR4F5_HUMAN",79501,"CCDS30547","NM_001005484","uc001aal.1",NA,618355,"olfactory receptor family 4 subfamily F member 5",NA,NA,NA,"Olfactory transduction - Homo sapiens (human);Olfactory receptor activity;Signaling by GPCR;Signal Transduction;Olfactory Signaling Pathway;G alpha (s) signalling events;GPCR downstream signalling","hsa04740","Olfactory transduction","FUNCTION: Odorant receptor. 
{ECO:0000305}.; ",NA,NA,NA,NA,NA,NA,NA,"G protein-coupled receptor signaling pathway;detection of chemical stimulus involved in sensory perception of smell","plasma membrane;integral component of membrane","G protein-coupled receptor activity;olfactory receptor activity",NA,NA,NA,NA,NA,NA,6.09e-02,1.87e-01,"N",NA,7.16e-02,NA,NA,NA,9.91e-01,NA,NA,1.76e-01,6.44e-01,1.80e-01,5.50e-01,3.97e-01,5.31e-02,1.80e-01,6.23e-01,1.97e-01,"3.0650e-02","6.1116e-01","3.5819e-01",NA,NA,NA,NA,1.13e+02,2.32e+00,"Medium","Medium","Medium","Medium","Medium","Medium","Medium","Medium","Medium","Medium",NA,3.99e-04,0.00e+00,3.99e-04,0.00e+00,2.28e-02,4.39e-03,8.39e-03,2.00e-03,NA,NA,NA,"N",1.14e-01,"N",NA,NA,NA,NA,NA,NA)} NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
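To illustrate the kind of filtering I am after, here is a minimal plain-Python sketch (not Hail code) that splits the pipe-delimited strings in the `vep` column of the row above and keeps only the entries matching a criterion. The field names in `CSQ_FIELDS` are my assumption about the order of the first eight fields, and `parse_csq` and `filter_csq` are hypothetical helper names, not part of any library.

```python
# Assumed names for the first eight pipe-delimited VEP fields; the real order
# is defined by the CSQ header of the annotated file and should be checked there.
CSQ_FIELDS = ["Allele", "Consequence", "IMPACT", "SYMBOL", "Gene",
              "Feature_type", "Feature", "BIOTYPE"]


def parse_csq(entry):
    """Split one pipe-delimited VEP string into a dict of its leading fields."""
    values = entry.split("|")
    # zip stops at the shorter sequence, so trailing fields are ignored here
    return dict(zip(CSQ_FIELDS, values))


def filter_csq(entries, **criteria):
    """Keep the parsed entries whose fields equal every given criterion."""
    parsed = [parse_csq(e) for e in entries]
    return [p for p in parsed
            if all(p.get(key) == value for key, value in criteria.items())]


# The two entries of the `vep` column from the example row,
# truncated to their first eight fields
vep = [
    "G|downstream_gene_variant|MODIFIER|OR4G11P|ENSG00000240361|Transcript|ENST00000642116|processed_transcript",
    "G|intron_variant|MODIFIER|OR4F5|ENSG00000186092|Transcript|ENST00000641515|protein_coding",
]

coding = filter_csq(vep, BIOTYPE="protein_coding")
print([c["SYMBOL"] for c in coding])  # prints ['OR4F5']
```

Something equivalent that works directly on the Hail table, without exporting first, is what I am looking for.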
When I try to pull out individual rows when filtering I end up ... I found a previous tutorial from the Hail website on annotating and filtering:
https://hail.is/docs/0.2/tutorials/05-filter-annotate.html
Do you have any more relevant or recommended documentation?