Hello, what type of compression does the index file have? (matrix_table/index/part-0/index)
Sincerely, Roshchin Philipp
I need to parse the data in the index file using plain Python,
without Pandas or Spark.
Thank you!
import chardet
part_path = "/data/table.ht/rows/parts/part-00-beab4b89-aff6-43de-b358-feed3f1c07a3"
with open(part_path, "rb") as file:
    content = file.read()
detected_encoding = chardet.detect(content)['encoding']
detected_encoding
# 'ISO-8859-1'
decoded_content = content.decode("ISO-8859-1")
decoded_content
# '\x0e\x00\x00\x00\x01\x00\x00\x00(µ/ý \x01\t\x00\x00\x00\r\x00\x00\x00\x00\x00\x00\x00(µ/ý \x00\x01\x00\x00'
I need to understand how to decode this data.
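A note on the dump above: the file is binary, so the encoding that chardet reports is not meaningful, and it is easier to work with the raw bytes. The sequence `(µ/ý` corresponds to the bytes `28 B5 2F FD`, which is the Zstandard frame magic number. The sketch below uses only the standard library to split the dumped bytes into blocks. The layout it assumes (a little-endian int32 block length, a little-endian int32 uncompressed size, then one Zstandard frame) is inferred from these 35 bytes alone and is not a documented format; `split_blocks` is a hypothetical helper name.

```python
import struct

# Zstandard frame magic number (bytes 28 B5 2F FD).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"

# The bytes from the dump above, written as hex instead of ISO-8859-1 text.
sample = bytes.fromhex(
    "0e000000 01000000 28b52ffd 20 01 090000 00 "
    "0d000000 00000000 28b52ffd 20 00 010000"
)


def split_blocks(data):
    """Split data into (uncompressed_size, frame) pairs.

    ASSUMED layout, inferred from the sample only:
    int32 LE block length, int32 LE uncompressed size, Zstandard frame.
    The block length counts the size field plus the frame.
    """
    blocks = []
    offset = 0
    while offset + 8 <= len(data):
        block_len, raw_size = struct.unpack_from("<ii", data, offset)
        frame = data[offset + 8 : offset + 4 + block_len]
        if block_len < 4 or len(frame) != block_len - 4:
            raise ValueError(f"unexpected block layout at offset {offset}")
        blocks.append((raw_size, frame))
        offset += 4 + block_len
    return blocks


for raw_size, frame in split_blocks(sample):
    # Report whether each block starts with the Zstandard magic number.
    print(raw_size, len(frame), frame.startswith(ZSTD_MAGIC))
```

Turning each frame back into bytes requires a Zstandard decoder, which this sketch leaves out, and the decompressed bytes would still be in the table's own binary encoding rather than readable text.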
Hi @PHILIPP111007,
The index file is a binary file and part of our internal format, which we don’t document and can’t promise won’t change in the future. Can I ask why you need to decode it in Python?