I’m playing around with a large MatrixTable and trying to export it to an Apache Spark DataFrame for further processing.
Trying to do something like this:
df = mt.entries().to_spark()
I’ve seen the note in the MatrixTable.entries() docs about the size explosion, but it’s not entirely clear to me why that happens.
Can someone please shed light on why the data in this case is so much larger (compared, for example, to the original data used to build the MatrixTable)?
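For scale, here is the back-of-the-envelope arithmetic that worries me (dimensions and per-entry byte count are hypothetical, just to illustrate): if entries() really does emit one table row per (row, column) pair, the row count multiplies, and any row/column fields get repeated on every entry.

```python
# Hypothetical dimensions for a genetics-scale MatrixTable.
n_variants = 1_000_000   # rows
n_samples = 10_000       # columns

# If mt.entries() yields one table row per (variant, sample) pair,
# the resulting table has n_variants * n_samples rows.
n_entry_rows = n_variants * n_samples

# Each entry row would also repeat the row fields and column fields.
# Assume ~100 bytes of repeated fields per entry (made-up figure).
repeated_bytes_per_entry = 100
overhead_gb = n_entry_rows * repeated_bytes_per_entry / 1e9

print(f"{n_entry_rows:,} entry rows, ~{overhead_gb:,.0f} GB of repeated fields")
# → 10,000,000,000 entry rows, ~1,000 GB of repeated fields
```

Is that roughly the right mental model, or is something else driving the blow-up?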