You’re right that the first thread is overly strong.
You should not work directly off an import unless the source format is a good one. At the time of writing there are two “good” formats: Hail Matrix Table and partitioned BGEN.
BGEN is binary and compact: it stores each genotype in, I think, two bytes. Writing that to a Hail Matrix Table would store each genotype in, I think, four bytes, which doubles your data size. In contrast, a VCF is a text file and parsing it is expensive, so parsing it repeatedly is wasteful.
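
To put rough numbers on the size argument, here is a back-of-the-envelope sketch in plain Python. The two-byte and four-byte figures are just my guesses from above, not measured values, and the dataset dimensions are made up for illustration. It counts only the genotype entries and ignores compression and every other field.

```python
# Per-genotype sizes as guessed in the post (estimates, not measured values).
BGEN_BYTES_PER_GENOTYPE = 2
MT_BYTES_PER_GENOTYPE = 4


def genotype_bytes(n_variants: int, n_samples: int, bytes_per_genotype: int) -> int:
    """Size of the genotype entries alone, ignoring compression and other fields."""
    return n_variants * n_samples * bytes_per_genotype


# Hypothetical dataset dimensions, chosen only for illustration.
n_variants = 1_000_000
n_samples = 10_000

bgen_size = genotype_bytes(n_variants, n_samples, BGEN_BYTES_PER_GENOTYPE)
mt_size = genotype_bytes(n_variants, n_samples, MT_BYTES_PER_GENOTYPE)

print(f"BGEN genotypes:         {bgen_size / 1e9:.0f} GB")
print(f"Matrix Table genotypes: {mt_size / 1e9:.0f} GB")
print(f"ratio:                  {mt_size / bgen_size:.1f}x")
```

If those per-genotype guesses are in the right ballpark, the ratio stays at 2x whatever the dataset dimensions are, which is why copying BGEN into a Matrix Table buys you nothing on size.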