Issue in exporting VEP annotated entry/row tables

Context: I am annotating a matrix table with VEP after performing some filters on it. I then write the matrix table to a database. This part works successfully: I have tried reading the matrix table back in after exporting it and viewing its entry/row/column tables, and it works without any errors or issues. Then I perform some filters on allele frequency and store the matrix row table and the matrix entry table in two separate variables. Viewing these tables works without any issues. After this, I export both tables to the database or to disk (I have tried both; the issue persists in both) in .tsv.bgz format. Then I reread those tables (as a check), and when I try to view them, it throws an error.

Error: The error says “expected 228 fields, but found 235 fields”.

I have attached screenshots to help with context.

Here I am viewing the rows table

Here I am exporting the rows table

Here I am importing from the same path where I exported the table to

Here is the full error log

Any help will be greatly appreciated.

It seems quite likely that you’re exporting a value that contains a tab character. If you filter to chr20:56369799 and look at the fields, does one of them have a tab character? This is arguably a Hail bug: we should use some quoting scheme when exporting. I’m not sure there is a standard way to escape characters in TSVs though.
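To see why an embedded tab produces exactly this kind of error, here is a plain-Python sketch (the field names are made up for illustration): a single value containing a literal tab adds one extra field per tab when the line is split on the delimiter, which is how 228 expected fields can become 235 found.

```python
# Illustration: a value containing a literal tab inflates the field
# count of a TSV line. Field names here are hypothetical.
header = ["locus", "alleles", "vep_description"]
value_with_tab = "upstream\tgene_variant"  # value with an embedded tab

# Build the exported line the way a naive TSV export would.
line = "\t".join(["chr20:56369799", "A,T", value_with_tab])

# On import, the line is split on tabs, so the embedded tab
# creates one extra field.
fields = line.split("\t")
print(len(header))  # 3 columns expected
print(len(fields))  # 4 fields found
```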

In the meantime, you can escape the characters yourself for any string field:

mt = mt.annotate_rows(foo = mt.foo.replace('\t', '\\t'))

I just checked and, assuming you have no double quotes, you could also add double quotes to the start and end of every string and import with double quotes:

mt = mt.annotate_rows(foo = '"' + mt.foo + '"')
hl.import_table('foo.tsv', quote='"')
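For intuition, the quoting approach is the same mechanism the standard CSV machinery uses: once every field is wrapped in double quotes, an embedded delimiter no longer splits the field. A plain-Python sketch using the `csv` module (not Hail, just demonstrating the principle):

```python
# A quoted TSV round trip: the embedded tab survives because the
# parser treats everything inside the quotes as one field.
import csv
import io

row = ["chr20:56369799", "value with a\ttab inside"]

# Write the row as a TSV with every field double-quoted.
buf = io.StringIO()
csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_ALL).writerow(row)

# Read it back, honoring the double quotes.
reread = next(csv.reader(io.StringIO(buf.getvalue()), delimiter="\t"))
print(reread == row)  # the embedded tab is preserved
```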