Hi Dan King,
Thank you for your insightful presentation today!
Per our conversation, I’d appreciate it if you could take a moment to check whether the two code snippets below are computationally equivalent. Our goal is to extract biallelic variants with AC >= 100.
I expect that filtering before densifying (Code1), rather than densifying the entire VDS first (Code2), will significantly reduce computation time.
While my initial tests suggest the two give consistent results, I want to make sure the approach is reliable across various scenarios. Your expertise in this review would be highly appreciated.
## Code1
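import hail as hl

vds = hl.vds.read_vds("some_very_big.vds")
# Work on the sparse variant data only, before densifying.
mt = vds.variant_data
mt = hl.split_multi_hts(mt)
mt = hl.variant_qc(mt)
# After splitting, AC[1] is the alternate allele count of each row.
mt = mt.annotate_rows(AC100=mt.variant_qc.AC[1] >= 100)
mt = mt.filter_rows(mt.AC100)
# Put the filtered variant data back so that only the kept rows are densified.
vds.variant_data = mt
mt = hl.vds.to_dense_mt(vds)
mt.write("filtered.mt", overwrite=True)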
## Code2
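import hail as hl

vds = hl.vds.read_vds("some_very_big.vds")
# Densify every variant row first, then split, annotate, and filter.
mt = hl.vds.to_dense_mt(vds)
mt = hl.split_multi_hts(mt)
mt = hl.variant_qc(mt)
# After splitting, AC[1] is the alternate allele count of each row.
mt = mt.annotate_rows(AC100=mt.variant_qc.AC[1] >= 100)
mt = mt.filter_rows(mt.AC100)
mt.write("filtered.mt", overwrite=True)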
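
To spell out why I expect the AC filter to give the same rows in both versions, here is a toy sketch in plain Python. It is not Hail code and does not test Hail's behavior. It assumes that densifying only turns missing entries into homozygous-reference calls, and the function name `alt_allele_count` and the genotype lists are made up for illustration.

```python
# Toy model, not Hail: each genotype is a pair of allele indices,
# or None when the entry is missing.
def alt_allele_count(genotypes):
    """Count copies of allele 1 across the non-missing diploid calls."""
    return sum(call.count(1) for call in genotypes if call is not None)

# Sparse view (assumption): only samples with a non-reference call have
# an entry; every other entry is missing.
sparse = [(0, 1), None, (1, 1), None, None]

# Dense view (assumption): the missing entries are filled in with
# homozygous-reference calls.
dense = [call if call is not None else (0, 0) for call in sparse]

sparse_ac = alt_allele_count(sparse)
dense_ac = alt_allele_count(dense)
print(sparse_ac, dense_ac)  # prints "3 3"
```

Under that assumption the alternate allele count is unchanged, because a homozygous-reference call adds nothing to it. Statistics that also count reference alleles, such as AN, AF, or call rate, would differ between the sparse and dense views, so this argument only covers the AC filter itself.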