Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.70-5bb98953a4a7
LOGGING: writing to /mnt/var/lib/hadoop/steps/s-3S1NF8TLY5EON/hail-20210628-1954-0.2.70-5bb98953a4a7.log
[Stage 0:=====================================================> (490 + 8) / 500]
2021-06-28 19:55:41 Hail: INFO: Coerced sorted dataset
2021-06-28 19:55:45 Hail: INFO: Coerced sorted dataset
[Stage 2:======================================================>(492 + 8) / 500]
2021-06-28 19:55:49 Hail: INFO: Coerced sorted dataset
INFO:root:==> Done with VEP
2021-06-28 19:55:55 Hail: INFO: Reading table without type imputation
  Loading field '#CHROM' as type str (not specified)
  Loading field 'POSITION' as type str (not specified)
  Loading field 'HGMD_ID' as type str (not specified)
WARNING:luigi_pipeline.lib.model.base_mt_schema:MT using schema class already has vep annotation.
WARNING:luigi_pipeline.lib.model.base_mt_schema:MT using schema class already has filters annotation.
INFO:luigi_pipeline.lib.model.base_mt_schema:Overwriting matrix table annotation filters
WARNING:luigi_pipeline.lib.model.base_mt_schema:MT using schema class already has rsid annotation.
INFO:luigi_pipeline.lib.model.base_mt_schema:Overwriting matrix table annotation rsid
WARNING:luigi_pipeline.lib.model.base_mt_schema:MT using schema class already has vep annotation.
INFO:luigi_pipeline.lib.model.base_mt_schema:Overwriting matrix table annotation vep ---------------------------------------- Global fields: 'gencodeVersion': str 'sourceFilePath': str 'genomeVersion': str 'sampleType': str 'hail_version': str ---------------------------------------- Column fields: 's': str ---------------------------------------- Row fields: 'locus': locus 'alleles': array 'aIndex': int32 'AC': int32 'AF': float64 'alt': str 'AN': int32 'bgi': struct { AC: int32, AF: float64, AN: int32 } 'cadd': struct { PHRED: float32 } 'cidr': struct { AC: int32, AF: float64, AN: int32 } 'clinvar': struct { allele_id: int32, clinical_significance: str, gold_stars: int32 } 'codingGeneIds': set 'contig': str 'dbnsfp': struct { SIFT_pred: str, Polyphen2_HVAR_pred: str, MutationTaster_pred: str, FATHMM_pred: str, MetaSVM_pred: str, REVEL_score: str, GERP_RS: str, phastCons100way_vertebrate: str } 'docId': str 'domains': set 'eigen': struct { Eigen_phred: float64 } 'end': int32 'exac': struct { AF_POPMAX: float64, AF: float64, AC_Adj: int32, AC_Het: int32, AC_Hom: int32, AC_Hemi: int32, AN_Adj: int32 } 'filters': set 'g1k': struct { AC: int32, AF: float64, AN: int32, POPMAX_AF: float64 } 'geneIds': set 'geno2mp': struct { HPO_Count: int32 } 'genotypes': array 'gnomad_exome_coverage': float64 'gnomad_exomes': struct { AF: float64, AN: int32, AC: int32, FAF_AF: float64, AF_POPMAX_OR_GLOBAL: float64, Hom: int32, Hemi: int32 } 'gnomad_genome_coverage': float64 'gnomad_genomes': struct { AF: float64, AN: int32, AC: int32, FAF_AF: float64, AF_POPMAX_OR_GLOBAL: float64, Hom: int32, Hemi: int32 } 'hgmd': struct { accession: str, class: str } 'hgmd_like': array 'hgsc_wes': struct { AC: int32, AF: float64, AN: int32 } 'hgsc_wgs': struct { AC: int32, AF: float64, AN: int32 } 'mainTranscript': struct { biotype: str, canonical: int32, category: str, cdna_start: int32, cdna_end: int32, codons: str, gene_id: str, gene_symbol: str, hgvs: str, hgvsc: str, major_consequence: str, 
major_consequence_rank: int32, transcript_id: str, amino_acids: str, domains: str, hgvsp: str, lof: str, lof_flags: str, lof_filter: str, lof_info: str, polyphen_prediction: str, protein_id: str, sift_prediction: str } 'mpc': struct { MPC: str } 'nisc': struct { AC: int32, AF: float64, AN: int32 } 'originalAltAlleles': array 'pos': int32 'primate_ai': struct { score: float64 } 'ref': str 'rsid': str 'samples_ab': struct { 0_to_5: set, 5_to_10: set, 10_to_15: set, 15_to_20: set, 20_to_25: set, 25_to_30: set, 30_to_35: set, 35_to_40: set, 40_to_45: set } 'samples_gq': struct { 0_to_5: set, 5_to_10: set, 10_to_15: set, 15_to_20: set, 20_to_25: set, 25_to_30: set, 30_to_35: set, 35_to_40: set, 40_to_45: set, 45_to_50: set, 50_to_55: set, 55_to_60: set, 60_to_65: set, 65_to_70: set, 70_to_75: set, 75_to_80: set, 80_to_85: set, 85_to_90: set, 90_to_95: set } 'samples_no_call': set 'samples_num_alt': struct { 1: set, 2: set } 'sortedTranscriptConsequences': array, domains: array, major_consequence: str, category: str, hgvs: str, major_consequence_rank: int32, transcript_rank: int32 }> 'splice_ai': struct { delta_score: float64 } 'start': int32 'topmed': struct { AC: int32, AF: float64, AN: int32, Hom: int32, Het: int32 } 'transcriptConsequenceTerms': set 'transcriptIds': set 'utrVariantAnnotation': array 'variantId': str 'vep': struct { assembly_name: str, allele_string: str, ancestral: str, colocated_variants: array, end: int32, eas_allele: str, eas_maf: float64, ea_allele: str, ea_maf: float64, eur_allele: str, eur_maf: float64, exac_adj_allele: str, exac_adj_maf: float64, exac_allele: str, exac_afr_allele: str, exac_afr_maf: float64, exac_amr_allele: str, exac_amr_maf: float64, exac_eas_allele: str, exac_eas_maf: float64, exac_fin_allele: str, exac_fin_maf: float64, exac_maf: float64, exac_nfe_allele: str, exac_nfe_maf: float64, exac_oth_allele: str, exac_oth_maf: float64, exac_sas_allele: str, exac_sas_maf: float64, id: str, minor_allele: str, minor_allele_freq: 
float64, phenotype_or_disease: int32, pubmed: array, sas_allele: str, sas_maf: float64, somatic: int32, start: int32, strand: int32 }>, context: str, end: int32, id: str, input: str, intergenic_consequences: array, impact: str, minimised: int32, variant_allele: str }>, most_severe_consequence: str, motif_feature_consequences: array, high_inf_pos: str, impact: str, minimised: int32, motif_feature_id: str, motif_name: str, motif_pos: int32, motif_score_change: float64, strand: int32, variant_allele: str }>, regulatory_feature_consequences: array, impact: str, minimised: int32, regulatory_feature_id: str, variant_allele: str }>, seq_region_name: str, start: int32, strand: int32, transcript_consequences: array, distance: int32, domains: array, exon: str, gene_id: str, gene_pheno: int32, gene_symbol: str, gene_symbol_source: str, hgnc_id: str, hgvsc: str, hgvsp: str, hgvs_offset: int32, impact: str, intron: str, lof: str, lof_flags: str, lof_filter: str, lof_info: str, minimised: int32, polyphen_prediction: str, polyphen_score: float64, protein_end: int32, protein_start: int32, protein_id: str, sift_prediction: str, sift_score: float64, strand: int32, swissprot: str, transcript_id: str, trembl: str, tsl: int32, uniparc: str, variant_allele: str }>, variant_class: str } 'xpos': int64 'xstart': int64 'xstop': int64 ---------------------------------------- Entry fields: 'AD': array 'DP': int32 'GQ': int32 'GT': call 'MIN_DP': int32 'PGT': call 'PID': str 'PL': array 'PP': array 'PS': int32 'RGQ': int32 'SB': array ---------------------------------------- Column key: ['s'] Row key: ['locus', 'alleles'] ---------------------------------------- [Stage 4:=====> (46 + 8) / 500] [Stage 4:=======> (65 + 8) / 500] [Stage 4:=========> (87 + 8) / 500] [Stage 4:============> (113 + 8) / 500] [Stage 4:===============> (140 + 8) / 500] [Stage 4:=================> (158 + 8) / 500] [Stage 4:====================> (182 + 8) / 500] [Stage 4:======================> (208 + 8) / 500] [Stage 
4:======================================================>(492 + 8) / 500]
2021-06-28 19:56:22 Hail: INFO: Coerced sorted dataset
2021-06-28 19:56:25 Hail: INFO: Coerced sorted dataset
[Stage 6:=====================================================> (485 + 8) / 500]
2021-06-28 19:56:29 Hail: INFO: Coerced sorted dataset
[Stage 8:> (0 + 1) / 1]
2021-06-28 19:59:13 Hail: INFO: Ordering unsorted dataset with network shuffle
[Stage 9:> (0 + 1) / 1]
[Stage 10:> (0 + 2) / 2]
2021-06-28 20:02:38 Hail: INFO: wrote matrix table with 2 rows and 47 columns in 2 partitions to s3://s3_bucket/mt-hail-luigi/test/batch109_subset.mt
    Total size: 5.32 KiB
    * Rows/entries: 4.83 KiB
    * Columns: 422.00 B
    * Globals: 77.00 B
    * Smallest partition: 1 rows (1.94 KiB)
    * Largest partition: 1 rows (2.90 KiB)
INFO: [pid 24639] Worker Worker(salt=031599679, workers=1, host=ip-172-21-81-44, username=hadoop, pid=24639) done SeqrVCFToMTTask(source_paths=["s3://s3_bucket/vcf/batch109_subset.vcf"], dest_path=s3://s3_bucket/mt-hail-luigi/test/batch109_subset.mt, genome_version=38, array_elements_required=False, vep_runner=VEP, reference_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/all_reference_data/combined_reference_data_grch38.ht, clinvar_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/CLINVAR/clinvar.GRCh38.ht, hgmd_like_csv_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGMD_LIKE/GRCh38_HGMD_2020_03_v2.csv, hgmd_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGMD/hgmd_hg38.ht, cidr_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/CIDR.ht, nisc_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/NISC.ht, bgi_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/BGI.ht, hgsc_wes_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGSC_WES.ht, hgsc_wgs_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGSC_WGS.ht, sample_type=WES, validate=False, dataset_type=VARIANTS, remap_path=, subset_path=)
INFO:luigi-interface:[pid 24639] Worker Worker(salt=031599679, workers=1, host=ip-172-21-81-44, username=hadoop, pid=24639) done SeqrVCFToMTTask(source_paths=["s3://s3_bucket/vcf/batch109_subset.vcf"], dest_path=s3://s3_bucket/mt-hail-luigi/test/batch109_subset.mt, genome_version=38, array_elements_required=False, vep_runner=VEP, reference_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/all_reference_data/combined_reference_data_grch38.ht, clinvar_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/CLINVAR/clinvar.GRCh38.ht, hgmd_like_csv_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGMD_LIKE/GRCh38_HGMD_2020_03_v2.csv, hgmd_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGMD/hgmd_hg38.ht, cidr_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/CIDR.ht, nisc_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/NISC.ht, bgi_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/BGI.ht, hgsc_wes_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGSC_WES.ht, hgsc_wgs_ht_path=s3://s3_bucket/seqr-reference-data/GRCh38/HGSC_WGS.ht, sample_type=WES, validate=False, dataset_type=VARIANTS, remap_path=, subset_path=)
DEBUG: 1 running tasks, waiting for next task to finish
DEBUG:luigi-interface:1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task SeqrVCFToMTTask_False_s3___seqr_dp_dat_s3___seqr_dp_dat_16f2ef2d3e has status DONE
INFO:luigi-interface:Informed scheduler that task SeqrVCFToMTTask_False_s3___seqr_dp_dat_s3___seqr_dp_dat_16f2ef2d3e has status DONE