FileNotFoundException when I tried to export after I upgraded hail

takeo-naito · June 26, 2020, 12:14am

Hello Hail Team,

I would like to ask for your help.
I am using Hail 0.2 locally on my institute’s multi-CPU workstation (48 CPUs).
After I upgraded Hail to version 0.2.46 (using pip install hail -U), code that had worked in the previous version (0.2.33) no longer works.
The code that fails is:

myMatrixtable.make_table().export('./output_name.txt')

This gave me the error below:

Error summary: FileNotFoundException: /mnt/share6/FOR_Takeo/WES/all_vari_annotation_MSC.txt (Input/output error)

The exact same code worked in version 0.2.33, so I am confused.
I searched this forum for similar problems but couldn’t find an answer.
Any advice would be greatly appreciated.

kumarveerapen · June 26, 2020, 12:18am

Would you mind sharing more of your code, e.g. how myMatrixtable was formed? Does the data look okay after loading, e.g. does show() display what you expect?
Perhaps we could work from there?

takeo-naito · June 26, 2020, 12:24am

Thank you for your quick response!
Sure. Basically, I am trying to extract the GT information for 323 variants from the whole matrix table.

ht = hl.Table.from_pandas(table4)

ht1 = ht.annotate(start = hl.int32(ht.Start))

ht2 = ht1.key_by(locus = hl.locus(ht1.Chr,ht1.start), alleles = [ht1.Ref, ht1.Alt])

ht2.count()

mtfinal = mtanot3.semi_join_rows(ht2)

mtfinal1 = mtfinal.annotate_rows(flip = hl.cond(mtfinal.variant_qc.AF[1] > 0.5, True,False))

mtfinal3 = mtfinal1.select_entries(GT = mtfinal1.GT)

mtfinal4 = mtfinal3.select_rows(mtfinal3.annovar['Gene.refGene'],mtfinal3.annovar.CADD_Score,
    mtfinal3.annovar.avsnp150,mtfinal3.annovar['ExonicFunc.knownGene'],mtfinal3.annovar.gnomAD_exome_NFE,
    mtfinal3.annovar.gnomAD_genome_NFE,mtfinal3.flip,mtfinal3.annovar['GERP++_RS'],
    mtfinal3.annovar['GERP++_RS_rankscore'],mtfinal3.cont_hwe,mtfinal3.clinsig,
    mtfinal3.homo_in_veoibd,mtfinal3.AF_in_veoibd,mtfinal3.AC_in_veoibd,
    mtfinal3.annovar['AAChange.refGene'])

mtfinal4.make_table().export('./cand_gene_cadd_riskpatients.txt')

mtfinal4 is myMatrixtable.

takeo-naito · June 26, 2020, 12:33am

And attached is the entire error message.
error_message.txt (49.4 KB)

kumarveerapen · June 26, 2020, 12:34am

Thanks, Takeo! I’ve tagged a few teammates on our office IM about getting back to you on this ASAP!

takeo-naito · June 26, 2020, 12:37am

Thank you for your help, @kumarveerapen.
I am looking forward to your feedback!

kumarveerapen · June 29, 2020, 6:04pm

Apologies for getting back to you this late.

Where does mtanot3 come from?
What is the text file named all_vari_annotation_MSC.txt and where did that come from?

We would also recommend using a single name, mt, for the matrix table throughout. Switching between many different names is not recommended practice, and it may be why you are getting the error.

takeo-naito · June 30, 2020, 3:36am

Thank you for your reply, @kumarveerapen.
Actually, I added some annotations to the original mt using the annotate_rows function.
The all_vari_annotation_MSC.txt is just a text file that contains various annotation information for each variant.
I tried keeping the name of the matrix tables the same (mt), but I got the same error.

tpoterba · June 30, 2020, 12:51pm

Could you post the full pipeline, including the piece that reads all_vari_annotation_MSC.txt?
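
One thing that may be worth ruling out first: Hail evaluates pipelines lazily, so a text file passed to hl.import_table is typically read when an action such as export() runs, and a read problem on that file can surface only at export time. Below is a minimal sketch, using only the Python standard library, for checking that the file named in the error can be read end to end outside Hail. The helper name check_readable is hypothetical and is not part of Hail.

```python
from pathlib import Path


def check_readable(path, chunk_size=1 << 20):
    """Try to read a file from start to end; return (ok, message).

    Hypothetical helper for illustration, not a Hail function.
    """
    p = Path(path)
    if not p.is_file():
        return False, f"{p}: missing or not a regular file"
    total = 0
    try:
        with p.open("rb") as fh:
            # Read in chunks so large files do not have to fit in memory.
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError as exc:
        # Permission problems and low-level I/O errors are reported here.
        return False, f"{p}: {exc}"
    return True, f"{p}: read {total} bytes"


if __name__ == "__main__":
    # Path taken from the error summary in this thread.
    ok, message = check_readable("/mnt/share6/FOR_Takeo/WES/all_vari_annotation_MSC.txt")
    print(message)
```

If this check fails as well, the problem is likely with the file system or mount rather than with Hail itself.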

takeo-naito · July 8, 2020, 10:00pm

Hello Tim,

I am sorry for my late response.
I modified the part below.

ht = hl.Table.from_pandas(table4)

I exported the pandas table (table4) to a tab-separated text file with to_csv and imported that file using hl.import_table. With that change, my code worked without any problem.
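
The round trip through a text file can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact code from this thread: the helper names write_variants_tsv and import_variants are hypothetical, the writer uses Python's csv module rather than DataFrame.to_csv, and the types argument of hl.import_table is used here as one possible way to read Start as an integer directly instead of converting it afterwards.

```python
import csv


def write_variants_tsv(rows, path, fields=("Chr", "Start", "Ref", "Alt")):
    """Write variant rows (dicts) to a tab-separated file with a header line.

    Hypothetical helper; keys not listed in `fields` are ignored.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(
            fh,
            fieldnames=list(fields),
            delimiter="\t",
            lineterminator="\n",
            extrasaction="ignore",
        )
        writer.writeheader()
        writer.writerows(rows)
    return path


def import_variants(path):
    """Import the TSV as a Hail table keyed by locus and alleles.

    Hypothetical helper; requires Hail to be installed.
    """
    import hail as hl  # imported lazily so the writer above works without Hail

    # Assumption: declare Start as int32 at import time.
    ht = hl.import_table(path, types={"Start": hl.tint32})
    return ht.key_by(locus=hl.locus(ht.Chr, ht.Start), alleles=[ht.Ref, ht.Alt])
```

With a pandas table such as table2, the rows could be produced with table2.to_dict('records') before calling write_variants_tsv.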

I have attached my full code.

import hail as hl
import pandas as pd

# read matrix table

mt = hl.read_matrix_table('/mnt/share6/FOR_Takeo/WES/hailMT_for_all/cleaned_cont_hwe.mt/')

# read annotation file

ht = hl.import_table('/mnt/share6/FOR_Takeo/WES/all_vari_annotation_MSC.txt')

ht1 = ht.key_by(locus = hl.locus(ht.Chr,hl.int(ht.Start)), alleles = [ht.Ref, ht.Alt])

# add annotation file to mt

mt1 = mt.annotate_rows(annovar = ht1[mt.row_key])

mtanot = hl.variant_qc(mt1)

# read other annotation files

clinvar = hl.import_vcf('/mnt/share6/FOR_Takeo/WES/CLINVAR/clinvar_20200316.vcf.gz',skip_invalid_loci=True,force_bgz = True)

clinvar2 = hl.import_vcf('/mnt/share6/FOR_Takeo/WES/CLINVAR/clinvar_20200316_papu.vcf.gz',skip_invalid_loci=True,force_bgz = True)

clinmerge = clinvar.union_rows(clinvar2)

# add annotation file to mt

mtanot2 = mtanot.annotate_rows(clinsig = clinmerge.index_rows(mtanot.row_key).info.CLNSIG)

mtanot2 = mtanot2.annotate_rows(Star = clinmerge.index_rows(mtanot2.row_key).info.CLNREVSTAT)

# read variants information

table1 = pd.read_table("/mnt/share6/FOR_Takeo/WES/annovar_variant/allsub_anovar_anotation_rev.txt",
                       dtype = {'Chr': 'str'})

# do some filtering

table1 = table1[(table1.gnomAD_exome_NFE <= 0.01) | (table1.gnomAD_exome_NFE >= 0.99) |
                (table1.gnomAD_exome_NFE.isnull() & (table1.AF1 <= 0.005)) |
                (table1.gnomAD_exome_NFE.isnull() & (table1.AF2 <= 0.005))]

table2 = table1[['Chr','Start','Ref','Alt']]

# if I used this code, I couldn't export.

ht_v = hl.Table.from_pandas(table2)

# if I used this code, I could export.

table1[['Chr','Start','Ref','Alt']].to_csv('/mnt/share6/FOR_Takeo/temporary/candidate_variants.txt',sep = '\t',index = False)
ht_v = hl.import_table('/mnt/share6/FOR_Takeo/temporary/candidate_variants.txt')

ht_v1 = ht_v.annotate(start = hl.int32(ht_v.Start))

ht_v2 = ht_v1.key_by(locus = hl.locus(ht_v1.Chr,ht_v1.start), alleles = [ht_v1.Ref, ht_v1.Alt])

mtfinal = mtanot3.semi_join_rows(ht_v2)

mtfinal1 = mtfinal.annotate_rows(flip = hl.cond(mtfinal.variant_qc.AF[1] > 0.5, True,False))

mtfinal3 = mtfinal1.select_entries(GT = mtfinal1.GT)

mtfinal4 = mtfinal3.select_rows(mtfinal3.annovar['Gene.refGene'],mtfinal3.annovar.CADD_Score,
    mtfinal3.annovar.avsnp150,mtfinal3.annovar['ExonicFunc.knownGene'],mtfinal3.annovar.gnomAD_exome_NFE,
    mtfinal3.annovar.gnomAD_genome_NFE,mtfinal3.flip,mtfinal3.annovar['GERP++_RS'],
    mtfinal3.annovar['GERP++_RS_rankscore'],mtfinal3.cont_hwe,mtfinal3.clinsig,
    mtfinal3.homo_in_veoibd,mtfinal3.AF_in_veoibd,mtfinal3.AC_in_veoibd,
    mtfinal3.annovar['AAChange.refGene'])

mtfinal4.make_table().export('./ACE2/cand_gene_cadd_riskpatients.txt')

Thank you for your help.
