Matrix table - custom entry field

I’ve been using import_matrix_table to start working with some large .tsv files.

However, I am finding the data types allowed for the entry field a bit limiting for my application. Is there any way to extend the allowed entry field types to an array, or even a general struct?

Thanks for your help!

What do the entries look like? Are they JSON? If so, you can import the entry field as a string and use hl.parse_json to convert to any other type.
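To make the JSON suggestion concrete, here is a plain-Python sketch (not Hail) of what JSON-encoded entries in a .tsv might look like, and how each cell decodes into a structured value. The row/column names and the `dp`/`ad` fields are hypothetical. In Hail, the analogous step would be importing the entries as strings and calling hl.parse_json on the entry field with the desired target type.

```python
import json

# Hypothetical tab-separated matrix: a row-key column, then one
# JSON-encoded entry per sample column.
TSV = (
    "row_id\tsample1\tsample2\n"
    'r1\t{"dp": 10, "ad": [4, 6]}\t{"dp": 7, "ad": [7, 0]}\n'
    'r2\t{"dp": 3, "ad": [1, 2]}\t{"dp": 12, "ad": [5, 7]}\n'
)


def parse_json_matrix(text):
    """Return (column names, {row_id: [decoded entry, ...]})."""
    lines = text.strip().split("\n")
    header = lines[0].split("\t")
    rows = {}
    for line in lines[1:]:
        fields = line.split("\t")
        # Every cell after the row key is decoded from JSON into a dict.
        rows[fields[0]] = [json.loads(cell) for cell in fields[1:]]
    return header[1:], rows


cols, rows = parse_json_matrix(TSV)
print(cols)
print(rows["r1"][0])
```

Each entry comes back as a dict holding both a scalar and an array, which is the kind of struct-shaped entry the question asks about.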

For the time being, an array would be the most helpful entry type, but the most useful would be a more general, object-like set of fields, as you get when importing VCFs.

Storing the data as a string and converting it with hl.parse_json on the fly seems to carry a lot of overhead, so it may not be a practical approach.

Thanks for the help so far!

I think the two approaches are going to cost roughly the same: parsing text is pretty slow either way.

I agree. Maybe I misread your previous suggestion, but I understood you to be suggesting that the entry be stored as a JSON string and parsed on every retrieval. Either way, this doesn't seem practical to me.

As a first pass, is there any way to store an array of int32 in the entry field?

Thanks!

Perhaps you can say a little more about the application? I’m generally assuming that this text matrix is an interchange format, and performance on import isn’t quite as much of a concern as downstream processing.

Do you mean in the entry field of the text matrix? Sure, you could use comma-delimited integers for instance, and do something like the following after importing as a string:

import hail as hl
# Import entries as raw strings; import_matrix_table names the entry field `x`
mt = hl.import_matrix_table(..., entry_type=hl.tstr)
mt = mt.select_entries(int_array=mt.x.split(',').map(lambda s: hl.int32(s)))
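Since the text matrix is serving as an interchange format, it may help to see both sides of the comma-delimited convention. The plain-Python sketch below (not Hail; `encode_matrix`, `decode_cell`, and `decode_matrix` are hypothetical helpers) writes a small matrix of integer arrays in that layout and decodes each cell the same way the split/map expression above does. In this sketch an empty array becomes an empty cell, which is decoded back to an empty list explicitly, since splitting an empty string yields one empty element rather than none.

```python
def encode_matrix(row_ids, col_ids, entries):
    """Serialize entries[i][j] (a list of ints) as TSV with comma-delimited cells."""
    lines = ["\t".join(["row_id"] + col_ids)]
    for row_id, row in zip(row_ids, entries):
        cells = [",".join(str(v) for v in arr) for arr in row]
        lines.append("\t".join([row_id] + cells))
    return "\n".join(lines) + "\n"


def decode_cell(cell):
    """Split a comma-delimited cell into ints; an empty cell becomes []."""
    return [int(v) for v in cell.split(",")] if cell else []


def decode_matrix(text):
    """Inverse of encode_matrix: return (row_ids, col_ids, entries)."""
    lines = text.rstrip("\n").split("\n")
    col_ids = lines[0].split("\t")[1:]
    row_ids, entries = [], []
    for line in lines[1:]:
        fields = line.split("\t")
        row_ids.append(fields[0])
        entries.append([decode_cell(c) for c in fields[1:]])
    return row_ids, col_ids, entries


rows = ["r1", "r2"]
cols = ["s1", "s2"]
data = [[[1, 2, 3], [4]], [[], [5, 6]]]
text = encode_matrix(rows, cols, data)
print(text)
# Round trip: decoding the text recovers the original arrays.
assert decode_matrix(text) == (rows, cols, data)
```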