I have a fairly large VCF (430 MB compressed) that contains unsorted variants. It was produced as part of a WGS analysis pipeline that I’m only vaguely familiar with. Not sure if this is relevant, but it also contains many alternate contig IDs such as:
So far, I’ve successfully imported this VCF into a matrix table object, but I can’t seem to run any useful queries on it without running out of memory. For example, when I try to display the first 5 rows [mt.show(5)], I get a long Java stack trace that ends with:
Error summary: IOException: No space left on device
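Since the message mentions the device rather than the heap, it may be disk space rather than RAM that is running out. Here is a minimal sketch for checking free space on the scratch directory. It assumes temporary files are written to the system temp directory, which may not match the actual configuration; `free_gib` is just a helper name made up for this example.

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


# Assumption: scratch files land in the system temp directory.
# If the tools are configured to use another location, check that path instead.
scratch_dir = tempfile.gettempdir()
print(f"{scratch_dir}: {free_gib(scratch_dir):.1f} GiB free")
```

If the number reported is small compared with the uncompressed size of the data, that would at least be consistent with the error above.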
I eventually want to annotate this VCF with gnomAD popmax values, but my inability to even display the first 5 rows is not inspiring confidence. Can anyone give me some advice on how to handle this table without overloading my memory?