Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.57-582b2e31b8bd
LOGGING: writing to /mnt/var/lib/hadoop/steps/s-3SGPA7YDZW8YH/hail-20210622-1900-0.2.57-582b2e31b8bd.log
JObject(List((name,JString(VEP)), (config,JString(s3://s3_bucket/vep99/vep99-loftee-grch38-aws.json)), (csq,JBool(false)), (blockSize,JInt(1000))))
[Stage 0:> (0 + 32) / 500]
[Stage 0:> (3 + 32) / 500]
[Stage 0:==> (23 + 33) / 500]
[Stage 0:======> (55 + 32) / 500]
[Stage 0:=========> (83 + 32) / 500]
[Stage 0:============> (116 + 32) / 500]
[Stage 0:=================> (161 + 36) / 500]
[Stage 0:=======================> (215 + 32) / 500]
[Stage 0:============================> (262 + 32) / 500]
[Stage 0:==================================> (321 + 32) / 500]
[Stage 0:==========================================> (390 + 32) / 500]
[Stage 0:==================================================> (466 + 32) / 500]
2021-06-22 19:01:13 Hail: INFO: Coerced sorted dataset
2021-06-22 19:01:14 Hail: INFO: Coerced sorted dataset
[Stage 2:=======================> (215 + 32) / 500]
[Stage 2:===============================> (294 + 33) / 500]
[Stage 2:=========================================> (380 + 32) / 500]
[Stage 2:==================================================> (463 + 32) / 500]
2021-06-22 19:01:15 Hail: INFO: Coerced sorted dataset
INFO:root:==> Done with VEP
2021-06-22 19:01:20 Hail: INFO: Reading table without type imputation
  Loading field '#CHROM' as type str (not specified)
  Loading field 'POSITION' as type str (not specified)
  Loading field 'HGMD_ID' as type str (not specified)
WARNING:luigi_pipeline.lib.model.base_mt_schema:MT using schema class already has vep annotation.
INFO:root:------------ vep_root.transcript_consequences ------------
[Stage 4:==========================> (241 + 32) / 500]
[Stage 4:===================================> (328 + 32) / 500]
[Stage 4:===========================================> (404 + 32) / 500]
2021-06-22 19:01:26 Hail: INFO: Coerced sorted dataset
2021-06-22 19:01:26 Hail: INFO: Coerced sorted dataset
[Stage 6:=============================> (276 + 32) / 500]
[Stage 6:=======================================> (364 + 32) / 500]
[Stage 6:==================================================> (471 + 29) / 500]
2021-06-22 19:01:27 Hail: INFO: Coerced sorted dataset
[Stage 8:> (0 + 1) / 1]
[Stage 9:> (0 + 1) / 1]
[Stage 10:=============================> (1 + 1) / 2]
+----------------+------------+
| locus          | alleles    |
+----------------+------------+
| locus          | array      |
+----------------+------------+
| chr4:113358472 | ["T","C"]  |
| chr8:144415811 | ["A","G"]  |
+----------------+------------+
+------------------------------------------------------------------------------+
| |
+------------------------------------------------------------------------------+
| array |
+-----------------------------------------------------+
| set                                                 |
+-----------------------------------------------------+
| {"downstream_gene_variant","upstream_gene_variant"} |
+-----------------------------------------------------+
INFO:root:------------ omit_consequence_terms ------------
INFO:root:------------ result ------------
[Stage 13:================> (154 + 32) / 500]
[Stage 13:==========================> (246 + 32) / 500]
[Stage 13:====================================> (347 + 32) / 500]
[Stage 13:==============================================> (439 + 32) / 500]
2021-06-22 19:01:43 Hail: INFO: Coerced sorted dataset
2021-06-22 19:01:43 Hail: INFO: Coerced sorted dataset
[Stage 15:=============================> (278 + 32) / 500]
[Stage 15:========================================> (381 + 32) / 500]
[Stage 15:==================================================> (476 + 24) / 500]
[Stage 15:=====================================================>(499 + 1) / 500]
2021-06-22 19:01:45 Hail: INFO: Coerced sorted dataset
+----------------+------------+
| locus          | alleles    |
+----------------+------------+
| locus          | array      |
+----------------+------------+
| chr4:113358472 | ["T","C"]  |
| chr8:144415811 | ["A","G"]  |
+----------------+------------+
+------------------------------------------------------------------------------+
| |
+------------------------------------------------------------------------------+
| array already has filters annotation.
INFO:luigi_pipeline.lib.model.base_mt_schema:Overwriting matrix table annotation filters
WARNING:luigi_pipeline.lib.model.base_mt_schema:MT using schema class already has rsid annotation.
INFO:luigi_pipeline.lib.model.base_mt_schema:Overwriting matrix table annotation rsid
WARNING:luigi_pipeline.lib.model.base_mt_schema:MT using schema class already has vep annotation.
INFO:luigi_pipeline.lib.model.base_mt_schema:Overwriting matrix table annotation vep
----------------------------------------
Global fields:
    'gencodeVersion': str
    'sourceFilePath': str
    'genomeVersion': str
    'sampleType': str
    'hail_version': str
----------------------------------------
Column fields:
    's': str
----------------------------------------
Row fields:
    'locus': locus
    'alleles': array
    'aIndex': int32
    'AC': int32
    'AF': float64
    'alt': str
    'AN': int32
    'bgi': struct { AC: int32, AF: float64, AN: int32 }
    'cadd': struct { PHRED: float32 }
    'clinvar': struct { allele_id: int32, clinical_significance: str, gold_stars: int32 }
    'codingGeneIds': set
    'contig': str
    'dbnsfp': struct { SIFT_pred: str, Polyphen2_HVAR_pred: str, MutationTaster_pred: str, FATHMM_pred: str, MetaSVM_pred: str, REVEL_score: str, GERP_RS: str, phastCons100way_vertebrate: str }
    'docId': str
    'domains': set
    'eigen': struct { Eigen_phred: float64 }
    'end': int32
    'exac': struct { AF_POPMAX: float64, AF: float64, AC_Adj: int32, AC_Het: int32, AC_Hom: int32, AC_Hemi: int32, AN_Adj: int32 }
    'filters': set
    'g1k': struct { AC: int32, AF: float64, AN: int32, POPMAX_AF: float64 }
    'geneIds': set
    'geno2mp': struct { HPO_Count: int32 }
    'genotypes': array
    'gnomad_exome_coverage': float64
    'gnomad_exomes': struct { AF: float64, AN: int32, AC: int32, FAF_AF: float64, AF_POPMAX_OR_GLOBAL: float64, Hom: int32, Hemi: int32 }
    'gnomad_genome_coverage': float64
    'gnomad_genomes': struct { AF: float64, AN: int32, AC: int32, FAF_AF: float64, AF_POPMAX_OR_GLOBAL: float64, Hom: int32, Hemi: int32 }
    'hgmd': struct { accession: str, class: str }
    'hgmd_like': array
    'hgsc_wgs': struct { AC: int32, AF: float64, AN: int32 }
    'mainTranscript': struct { biotype: str, canonical: int32, category: str, cdna_start: int32, cdna_end: int32, codons: str, gene_id: str, gene_symbol: str, hgvs: str, hgvsc: str, major_consequence: str, major_consequence_rank: int32, transcript_id: str, amino_acids: str, domains: str, hgvsp: str, lof: str, lof_flags: str, lof_filter: str, lof_info: str, polyphen_prediction: str, protein_id: str, sift_prediction: str }
    'mpc': struct { MPC: str }
    'nisc': struct { AC: int32, AF: float64, AN: int32 }
    'originalAltAlleles': array
    'pos': int32
    'primate_ai': struct { score: float64 }
    'ref': str
    'rsid': str
    'samples_ab': struct { 0_to_5: set, 5_to_10: set, 10_to_15: set, 15_to_20: set, 20_to_25: set, 25_to_30: set, 30_to_35: set, 35_to_40: set, 40_to_45: set }
    'samples_gq': struct { 0_to_5: set, 5_to_10: set, 10_to_15: set, 15_to_20: set, 20_to_25: set, 25_to_30: set, 30_to_35: set, 35_to_40: set, 40_to_45: set, 45_to_50: set, 50_to_55: set, 55_to_60: set, 60_to_65: set, 65_to_70: set, 70_to_75: set, 75_to_80: set, 80_to_85: set, 85_to_90: set, 90_to_95: set }
    'samples_no_call': set
    'samples_num_alt': struct { 1: set, 2: set }
    'sortedTranscriptConsequences': array, domains: array, major_consequence: str, category: str, hgvs: str, major_consequence_rank: int32, transcript_rank: int32 }>
    'splice_ai': struct { delta_score: float64 }
    'start': int32
    'topmed': struct { AC: int32, AF: float64, AN: int32, Hom: int32, Het: int32 }
    'transcriptConsequenceTerms': set
    'transcriptIds': set
    'utrVariantAnnotation': array
    'variantId': str
    'vep': struct { assembly_name: str, allele_string: str, ancestral: str, colocated_variants: array, end: int32, eas_allele: str, eas_maf: float64, ea_allele: str, ea_maf: float64, eur_allele: str, eur_maf: float64, exac_adj_allele: str, exac_adj_maf: float64, exac_allele: str, exac_afr_allele: str, exac_afr_maf: float64, exac_amr_allele: str, exac_amr_maf: float64, exac_eas_allele: str, exac_eas_maf: float64, exac_fin_allele: str, exac_fin_maf: float64, exac_maf: float64, exac_nfe_allele: str, exac_nfe_maf: float64, exac_oth_allele: str, exac_oth_maf: float64, exac_sas_allele: str, exac_sas_maf: float64, id: str, minor_allele: str, minor_allele_freq: float64, phenotype_or_disease: int32, pubmed: array, sas_allele: str, sas_maf: float64, somatic: int32, start: int32, strand: int32 }>, context: str, end: int32, id: str, input: str, intergenic_consequences: array, impact: str, minimised: int32, variant_allele: str }>, most_severe_consequence: str, motif_feature_consequences: array, high_inf_pos: str, impact: str, minimised: int32, motif_feature_id: str, motif_name: str, motif_pos: int32, motif_score_change: float64, strand: int32, variant_allele: str }>, regulatory_feature_consequences: array, impact: str, minimised: int32, regulatory_feature_id: str, variant_allele: str }>, seq_region_name: str, start: int32, strand: int32, transcript_consequences: array, distance: int32, domains: array, exon: str, gene_id: str, gene_pheno: int32, gene_symbol: str, gene_symbol_source: str, hgnc_id: str, hgvsc: str, hgvsp: str, hgvs_offset: int32, impact: str, intron: str, lof: str, lof_flags: str, lof_filter: str, lof_info: str, minimised: int32, polyphen_prediction: str, polyphen_score: float64, protein_end: int32, protein_start: int32, protein_id: str, sift_prediction: str, sift_score: float64, strand: int32, swissprot: str, transcript_id: str, trembl: str, tsl: int32, uniparc: str, variant_allele: str }>, variant_class: str }
    'xpos': int64
    'xstart': int64
    'xstop': int64
----------------------------------------
Entry fields:
    'AD': array
    'DP': int32
    'GQ': int32
    'GT': call
    'MIN_DP': int32
    'PGT': call
    'PID': str
    'PL': array
    'PP': array
    'PS': int32
    'RGQ': int32
    'SB': array
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
----------------------------------------
[Stage 20:===============================> (295 + 32) / 500]
[Stage 20:========================================> (384 + 33) / 500]
[Stage 20:=====================================================>(495 + 5) / 500]
2021-06-22 19:01:54 Hail: INFO: Coerced sorted dataset
2021-06-22 19:01:55 Hail: INFO: Coerced sorted dataset
[Stage 22:======================> (214 + 33) / 500]
[Stage 22:==================================> (321 + 32) / 500]
[Stage 22:============================================> (417 + 32) / 500]
2021-06-22 19:01:56 Hail: INFO: Coerced sorted dataset
[Stage 24:> (0 + 1) / 1]
2021-06-22 19:04:19 Hail: INFO: Ordering unsorted dataset with network shuffle
[Stage 25:> (0 + 1) / 1]
[Stage 26:> (0 + 0) / 2]
[Stage 26:> (0 + 2) / 2]
[Stage 26:=============================> (1 + 1) / 2]
2021-06-22 19:06:51 Hail: INFO: wrote matrix table with 2 rows and 47 columns in 2 partitions to s3://s3_bucket/mt-hail-luigi/test/batch109_subset.mt
INFO: [pid 27699] Worker Worker(salt=708895954, workers=1, host=ip-172-21-93-0, username=hadoop, pid=27699) done SeqrVCFToMTTask(source_paths=["s3://s3_bucket/vcf/batch109_subset.vcf"], dest_path=s3://s3_bucket/mt-hail-luigi/test/batch109_subset.mt, genome_version=38, array_elements_required=False, vep_runner=VEP, reference_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/all_reference_data/combined_reference_data_grch38.ht, clinvar_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/CLINVAR/clinvar.GRCh38.ht, hgmd_like_csv_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGMD_LIKE/GRCh38_HGMD_2020_03_v2.csv, hgmd_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGMD/hgmd_hg38.ht, cidr_ht_path=None, nisc_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/NISC.ht, bgi_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/BGI.ht, hgsc_wes_ht_path=None, hgsc_wgs_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGSC_WGS.ht, sample_type=WES, validate=False, dataset_type=VARIANTS, remap_path=, subset_path=)
INFO:luigi-interface:[pid 27699] Worker Worker(salt=708895954, workers=1, host=ip-172-21-93-0, username=hadoop, pid=27699) done SeqrVCFToMTTask(source_paths=["s3://s3_bucket/vcf/batch109_subset.vcf"], dest_path=s3://s3_bucket/mt-hail-luigi/test/batch109_subset.mt, genome_version=38, array_elements_required=False, vep_runner=VEP, reference_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/all_reference_data/combined_reference_data_grch38.ht, clinvar_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/CLINVAR/clinvar.GRCh38.ht, hgmd_like_csv_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGMD_LIKE/GRCh38_HGMD_2020_03_v2.csv, hgmd_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGMD/hgmd_hg38.ht, cidr_ht_path=None, nisc_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/NISC.ht, bgi_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/BGI.ht, hgsc_wes_ht_path=None, hgsc_wgs_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGSC_WGS.ht, sample_type=WES, validate=False, dataset_type=VARIANTS, remap_path=, subset_path=)
DEBUG: 1 running tasks, waiting for next task to finish
DEBUG:luigi-interface:1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task SeqrVCFToMTTask_False_s3___seqr_dp_dat_None_9addc52e85 has status DONE
INFO:luigi-interface:Informed scheduler that task SeqrVCFToMTTask_False_s3___seqr_dp_dat_None_9addc52e85 has status DONE