- why is this having trouble writing to the file?
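It’s important to note that almost every operation in Hail is lazy: nothing is computed until an action such as a write forces it, so everything upstream is executed when you execute the write. A write that is slow or fails may therefore be pointing at a problem in an earlier step, not in the write itself.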
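As a rough analogy only (plain Python generators, not Hail code), here is a minimal sketch of how a lazy pipeline defers all upstream work until the final step consumes it. The names `load`, `transform`, and `log` are made up for illustration.

```python
log = []  # records when each stage actually runs

def load():
    # Stand-in for reading input; runs only when rows are pulled.
    for x in range(3):
        log.append(f"load {x}")
        yield x

def transform(rows):
    # Stand-in for an upstream transformation.
    for x in rows:
        log.append(f"transform {x}")
        yield x * 2

pipeline = transform(load())  # builds the pipeline; nothing has run yet
assert log == []

# Stand-in for the write: consuming the pipeline triggers every upstream stage.
result = list(pipeline)
```

Any time or errors from `load` and `transform` show up only at the last line, which is why the write looks like the culprit.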
- is there a better way to do the 1:many calculation for the NA12878 comparisons?
I think the iterative union_cols is quadratic in the number of samples – see this thread involving the same issue:
The issue is that Hail doesn’t have great ways right now to load a batch of single-sample VCFs into a single joint MatrixTable. We’re working on a gVCF merging algorithm, which we expect to come online in the next six months.
For the time being, things may work if you use my N * log(N) solution in the linked comment.
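A minimal sketch of one way to get N * log(N) behavior, assuming a pairwise (tree-shaped) merge; it may differ from the code in the linked comment. Instead of growing one table by one sample at a time, each round merges neighbors, so the number of tables halves per round. The helper name `tree_merge` is hypothetical, and the demo uses lists of sample IDs as stand-ins so it runs without Hail.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def tree_merge(items: List[T], combine: Callable[[T, T], T]) -> T:
    """Merge items pairwise in rounds, roughly halving the list each round."""
    if not items:
        raise ValueError("need at least one item to merge")
    level = list(items)
    while len(level) > 1:
        # Combine neighbors: (0, 1), (2, 3), ...
        merged = [combine(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        # An odd item out is carried into the next round unchanged.
        if len(level) % 2 == 1:
            merged.append(level[-1])
        level = merged
    return level[0]

# Stand-in demo: each "table" is just a list holding one sample ID.
samples = [[f"s{i}"] for i in range(5)]
joint = tree_merge(samples, lambda a, b: a + b)

# With Hail, the same helper could be applied to MatrixTables
# (the paths list is hypothetical):
#   mts = [hl.import_vcf(p) for p in paths]
#   joint_mt = tree_merge(mts, lambda a, b: a.union_cols(b))
```

Because everything is lazy, it may also help to write out and read back intermediate results between rounds so that the final write is not executing the whole merge tree at once.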