FileNotFoundException when I tried to export after I upgraded hail

Hello Hail Team,

I would like to ask for your help.
I am running Hail 0.2 locally on my institute's single multi-CPU workstation (48 CPUs).
After I upgraded Hail to version 0.2.46 (using pip install hail -U), code that had worked in the previous version (0.2.33) no longer works.
The code that fails is:

myMatrixtable.make_table().export('./output_name.txt')

It gives the following error:

Error summary: FileNotFoundException: /mnt/share6/FOR_Takeo/WES/all_vari_annotation_MSC.txt (Input/output error)

The exact same code worked in version 0.2.33, so I am confused.
I searched this forum for similar problems but could not find an answer.
Any advice would be greatly appreciated.

Would you mind sharing more of your code, e.g. how was myMatrixtable formed? Does the table look okay after loading, e.g. does show() display the contents you expect?
Perhaps we could work from there?

Thank you for your quick response!
Sure. Basically, I am trying to extract the GT information for 323 variants from the whole matrix table.

ht = hl.Table.from_pandas(table4)

ht1 = ht.annotate(start = hl.int32(ht.Start))

ht2 = ht1.key_by(locus = hl.locus(ht1.Chr,ht1.start), alleles = [ht1.Ref, ht1.Alt])

ht2.count()

mtfinal = mtanot3.semi_join_rows(ht2)

mtfinal1 = mtfinal.annotate_rows(flip = hl.cond(mtfinal.variant_qc.AF[1] > 0.5, True,False))

mtfinal3 = mtfinal1.select_entries(GT = mtfinal1.GT)

mtfinal4 = mtfinal3.select_rows(mtfinal3.annovar['Gene.refGene'], mtfinal3.annovar.CADD_Score,
    mtfinal3.annovar.avsnp150, mtfinal3.annovar['ExonicFunc.knownGene'], mtfinal3.annovar.gnomAD_exome_NFE,
    mtfinal3.annovar.gnomAD_genome_NFE, mtfinal3.flip, mtfinal3.annovar['GERP++_RS'],
    mtfinal3.annovar['GERP++_RS_rankscore'], mtfinal3.cont_hwe, mtfinal3.clinsig,
    mtfinal3.homo_in_veoibd, mtfinal3.AF_in_veoibd, mtfinal3.AC_in_veoibd,
    mtfinal3.annovar['AAChange.refGene'])

mtfinal4.make_table().export('./cand_gene_cadd_riskpatients.txt')

mtfinal4 is myMatrixtable.

And attached is the entire error message.
error_message.txt (49.4 KB)

Thanks, Takeo! I've tagged a few teammates on our office IM so that we can get back to you on this ASAP!

Thank you for your help, @kumarveerapen.
I am looking forward to your feedback!

Apologies for getting back to you this late.

Where does mtanot3 come from?
What is the text file named all_vari_annotation_MSC.txt and where did that come from?

We would also recommend using a single name, mt, for every matrix table. Switching between many different names is not recommended practice, and it may be why you are getting the error.

Thank you for your reply, @kumarveerapen.
mtanot3 is the original mt with some annotations added using the annotate_rows function.
all_vari_annotation_MSC.txt is just a text file that contains various annotations for each variant.
I tried using the same name (mt) for all matrix tables, but I got the same error.

Could you post the full pipeline, including the piece that reads all_vari_annotation_MSC.txt?

Hello Tim,

I am sorry for my late response.
I modified the part below.

ht = hl.Table.from_pandas(table4)

Instead, I exported the pandas table (table4) as a CSV file and imported that file using hl.import_table. With that change, my code worked without any problem.
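The workaround described above can be sketched as follows. This is only a minimal illustration, not the code from the attached pipeline: the helper names write_variant_keys and import_variant_keys, the KEY_COLUMNS constant, and the use of the standard csv module in place of pandas' to_csv are assumptions made for the example. The records argument is assumed to be a list of dicts, which could be obtained with something like table4.to_dict('records'). Passing types= to hl.import_table is one way to read Start as an integer directly.

```python
import csv

# Hypothetical constant: the columns needed to build the locus/alleles key.
KEY_COLUMNS = ["Chr", "Start", "Ref", "Alt"]


def write_variant_keys(records, path):
    """Write the key columns of each record to a tab-separated file with a header."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=KEY_COLUMNS,
            delimiter="\t",
            lineterminator="\n",
            extrasaction="ignore",  # silently drop any columns not in KEY_COLUMNS
        )
        writer.writeheader()
        writer.writerows(records)


def import_variant_keys(path):
    """Read the file back with Hail and key it by locus and alleles."""
    import hail as hl  # imported lazily so the writer above works without Hail

    # Assumption: Start holds integer positions, so it is typed at import time.
    ht = hl.import_table(path, types={"Start": hl.tint32})
    return ht.key_by(locus=hl.locus(ht.Chr, ht.Start), alleles=[ht.Ref, ht.Alt])
```

The returned table could then be passed to semi_join_rows in place of the table built with hl.Table.from_pandas.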

I have attached my full code.

# read matrix table

mt = hl.read_matrix_table('/mnt/share6/FOR_Takeo/WES/hailMT_for_all/cleaned_cont_hwe.mt/')

# read annotation file

ht = hl.import_table('/mnt/share6/FOR_Takeo/WES/all_vari_annotation_MSC.txt')

ht1 = ht.key_by(locus = hl.locus(ht.Chr, hl.int(ht.Start)), alleles = [ht.Ref, ht.Alt])

# add annotation file to mt

mt1 = mt.annotate_rows(annovar = ht1[mt.row_key])

mtanot = hl.variant_qc(mt1)

# read other annotation files

clinvar = hl.import_vcf('/mnt/share6/FOR_Takeo/WES/CLINVAR/clinvar_20200316.vcf.gz', skip_invalid_loci=True, force_bgz=True)

clinvar2 = hl.import_vcf('/mnt/share6/FOR_Takeo/WES/CLINVAR/clinvar_20200316_papu.vcf.gz', skip_invalid_loci=True, force_bgz=True)

clinmerge = clinvar.union_rows(clinvar2)

# add ClinVar annotations to mt

mtanot2 = mtanot.annotate_rows(clinsig = clinmerge.index_rows(mtanot.row_key).info.CLNSIG)

mtanot2 = mtanot2.annotate_rows(Star = clinmerge.index_rows(mtanot2.row_key).info.CLNREVSTAT)

# read variant information

table1 = pd.read_table("/mnt/share6/FOR_Takeo/WES/annovar_variant/allsub_anovar_anotation_rev.txt",
                       dtype={'Chr': 'str'})

# do some filtering

table1 = table1[(table1.gnomAD_exome_NFE <= 0.01) | (table1.gnomAD_exome_NFE >= 0.99) |
                (table1.gnomAD_exome_NFE.isnull() & (table1.AF1 <= 0.005)) |
                (table1.gnomAD_exome_NFE.isnull() & (table1.AF2 <= 0.005))]

table2 = table1[['Chr', 'Start', 'Ref', 'Alt']]

# If I use this line, the export fails.

ht_v = hl.Table.from_pandas(table2)

# If I use these lines instead, the export works.

table1[['Chr', 'Start', 'Ref', 'Alt']].to_csv('/mnt/share6/FOR_Takeo/temporary/candidate_variants.txt', sep='\t', index=False)
ht_v = hl.import_table('/mnt/share6/FOR_Takeo/temporary/candidate_variants.txt')

ht_v1 = ht_v.annotate(start = hl.int32(ht_v.Start))

ht_v2 = ht_v1.key_by(locus = hl.locus(ht_v1.Chr,ht_v1.start), alleles = [ht_v1.Ref, ht_v1.Alt])

mtfinal = mtanot3.semi_join_rows(ht_v2)

mtfinal1 = mtfinal.annotate_rows(flip = hl.cond(mtfinal.variant_qc.AF[1] > 0.5, True,False))

mtfinal3 = mtfinal1.select_entries(GT = mtfinal1.GT)

mtfinal4 = mtfinal3.select_rows(mtfinal3.annovar['Gene.refGene'], mtfinal3.annovar.CADD_Score,
    mtfinal3.annovar.avsnp150, mtfinal3.annovar['ExonicFunc.knownGene'], mtfinal3.annovar.gnomAD_exome_NFE,
    mtfinal3.annovar.gnomAD_genome_NFE, mtfinal3.flip, mtfinal3.annovar['GERP++_RS'],
    mtfinal3.annovar['GERP++_RS_rankscore'], mtfinal3.cont_hwe, mtfinal3.clinsig,
    mtfinal3.homo_in_veoibd, mtfinal3.AF_in_veoibd, mtfinal3.AC_in_veoibd,
    mtfinal3.annovar['AAChange.refGene'])

mtfinal4.make_table().export('./ACE2/cand_gene_cadd_riskpatients.txt')

Thank you for your help.