I thought these were simple concepts, but I can’t find them explained anywhere except in the VDS documentation. Also, how are they different from a VDS file? Thanks
Hey @SimonLi5601 !
You’re right that the native file formats are not well documented. The bulk of our users are uninterested in these details. I think you’re the third person to ask about them in the eight years of the project.
I’ll directly answer your question below, but I’m curious to better understand your motivations and goals. That will help me give you a more useful answer.
An “.mt” “file” is a compressed, partitioned, indexed, binary format for a Matrix Table (which is like a pandas DataFrame with an extra index dimension). An “.ht” “file” is a compressed, partitioned, indexed, binary format for a Table (which is effectively an out-of-core pandas DataFrame).
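For context, these are the paths you hand to `hl.read_matrix_table` and `hl.read_table` in the Python API. Here is a minimal sketch, assuming Hail is installed and that the path follows the usual “.mt” / “.ht” naming convention. `reader_name_for` and `load` are hypothetical helpers I made up for illustration, not part of Hail, and the path in the usage note is a placeholder:

```python
from pathlib import PurePosixPath


def reader_name_for(path: str) -> str:
    """Pick the Hail reader function name from the conventional extension.

    Assumption: the extension is only a naming convention, so this looks
    at the path string and never inspects what is stored there.
    """
    suffix = PurePosixPath(path.rstrip("/")).suffix
    if suffix == ".mt":
        return "read_matrix_table"
    if suffix == ".ht":
        return "read_table"
    raise ValueError(f"unrecognized extension: {suffix!r}")


def load(path: str):
    """Read a Matrix Table or Table with the matching Hail reader."""
    import hail as hl  # imported lazily so reader_name_for works without Hail

    return getattr(hl, reader_name_for(path))(path)
```

With Hail available, something like `load("data.mt").describe()` should print the schema of whatever was read.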
To be a bit more specific, an “.mt” file is a little bit of metadata plus four “.ht” files (though they lack the extension):

- globals
- cols
- rows
- entries

and two optional pieces:

- index
- references
An “.ht” file is a little bit of metadata plus:

- rows (this contains the partitioned data)

and two optional pieces:

- index
- references
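If you want to poke at one of these yourself, the layout above can be checked with a few lines of plain Python. This is only a sketch under some assumptions: the “file” is a directory on the local filesystem, the component names are exactly the ones listed above, and `describe_layout` is a hypothetical helper rather than anything Hail provides:

```python
from pathlib import Path

# Component names taken from the description above.
MT_REQUIRED = ("globals", "cols", "rows", "entries")
HT_REQUIRED = ("rows",)
OPTIONAL = ("index", "references")


def describe_layout(path):
    """Classify a directory by which of the listed components it contains."""
    present = {entry.name for entry in Path(path).iterdir()}
    if all(name in present for name in MT_REQUIRED):
        kind = "matrix table"
    elif all(name in present for name in HT_REQUIRED):
        kind = "table"
    else:
        kind = "unknown"
    known = set(MT_REQUIRED) | set(OPTIONAL)
    return {
        "kind": kind,
        # Which of the two optional pieces are present.
        "optional": sorted(present & set(OPTIONAL)),
        # Everything else at the top level, e.g. the metadata.
        "other": sorted(present - known),
    }
```

Calling `describe_layout` on a local “.mt” or “.ht” path returns a small dict you can print. It only looks at top-level names and does not read any of the data.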