Hi, I was wondering if it’s possible to merge two VCF files (or the resulting MatrixTables) with different samples but overlapping variants, such that any downstream QC takes into account information from the combined dataset. I am working locally on a single computer.
By “overlapping”, do you mean identical? If so, union_cols will work out of the box. The default row join type is an inner join, which restricts the result to variants shared by both datasets.
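As a minimal sketch of that inner-join case: the helper names and VCF paths below are hypothetical, and the inner join is passed explicitly only for readability, since it is the default. It assumes both tables have matching row keys, column schemas, and entry schemas, which union_cols requires.

```python
def merge_shared_variants(mt_a, mt_b):
    """Combine the samples (columns) of two MatrixTables.

    Only variants (rows) present in both tables are kept, because the
    row join is an inner join.
    """
    return mt_a.union_cols(mt_b, row_join_type='inner')


def load_and_merge(path_a, path_b):
    """Hypothetical convenience wrapper: import two VCFs and merge them."""
    import hail as hl  # imported lazily so the helper above has no hard dependency

    mt_a = hl.import_vcf(path_a)
    mt_b = hl.import_vcf(path_b)
    return merge_shared_variants(mt_a, mt_b)
```

Calling load_and_merge('cohort_a.vcf.bgz', 'cohort_b.vcf.bgz') (placeholder file names) should give one table whose columns are the samples from both files.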
ahh yes that’s exactly the case. thank you!
is there a solution for the alternative, where the variants only partly overlap?
You can use mt.union_cols(mt1, row_join_type='outer'), but the resulting matrix table will have missing entries that may require some special handling in downstream analysis.
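A rough sketch of the outer-join case, plus one possible way to see where the missing entries ended up. The keyword in Hail 0.2 is row_join_type. The function names and the n_missing_entries field are made up for illustration, and the code assumes the entry field of interest is GT, as in a standard VCF import.

```python
def union_all_variants(mt_a, mt_b):
    """Combine samples, keeping variants found in either table.

    A variant present in only one table gets missing entries for the
    samples that came from the other table.
    """
    return mt_a.union_cols(mt_b, row_join_type='outer')


def annotate_entry_missingness(mt):
    """Add a row field counting samples with a missing GT at each variant.

    Assumes the entry field is named GT; n_missing_entries is a
    hypothetical field name.
    """
    import hail as hl  # lazy import, only needed for the aggregation

    return mt.annotate_rows(
        n_missing_entries=hl.agg.count_where(hl.is_missing(mt.GT))
    )
```

After annotating, something like mt.filter_rows(mt.n_missing_entries > 0).count_rows() would report how many variants have at least one missing genotype, which can help decide how to treat them before QC.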
thank you. would Hail flag these missing entries as they occur, or would you have to anticipate them and fix them beforehand?