Hello! I am trying to compare two datasets with exactly the same samples. I have used the code:
summary, samples, variants = hl.concordance(dataset, dataset2)
However, I am having some trouble understanding the output. I am trying to extract the variants that are unique to each dataset. Can I get some advice on how to do this?
The output of concordance is really most useful for interrogating genotype concordance, rather than variant concordance. It’s easy to use other table-level methods to query the variants unique to each table.
# only need variant information, not genotypes
ds1 = dataset.rows()
ds2 = dataset2.rows()
# unique in dataset1 -- filter out any variants in ds2
ds1_unique = ds1.anti_join_rows(ds2)
# unique in dataset2 -- filter out any variants in ds1
ds2_unique = ds2.anti_join_rows(ds1)
Thank you! I seem to be getting this error unfortunately:
AttributeError: Table instance has no field, method, or property 'anti_join_rows'
Did you mean:
Table method: 'anti_join'
I will try anti_join instead.
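Oh, oops, that’s right. After calling .rows(), ds1 and ds2 are Tables, so the method is anti_join; anti_join_rows is the MatrixTable version.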
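For anyone landing here later: with the Tables from the snippet above, the corrected calls would be ds1.anti_join(ds2) and ds2.anti_join(ds1). To show what that computes, here is a plain-Python sketch of anti-join semantics. It is not Hail code: the helper name, the variant keys, and the rsid values are all made up for illustration, and it assumes rows are keyed by (locus, alleles), as MatrixTable rows usually are.

```python
# Plain-Python illustration of anti-join semantics; this is not Hail code.
# Keys are hypothetical (locus, alleles) tuples and the values are made up.

def anti_join(left, right):
    """Keep the entries of `left` whose key does not appear in `right`."""
    return {key: value for key, value in left.items() if key not in right}

ds1 = {
    ("1:1000", ("A", "T")): {"rsid": "rs1"},
    ("1:2000", ("G", "C")): {"rsid": "rs2"},
}
ds2 = {
    ("1:2000", ("G", "C")): {"rsid": "rs2"},
    ("1:3000", ("T", "G")): {"rsid": "rs3"},
}

ds1_unique = anti_join(ds1, ds2)  # variants present only in ds1
ds2_unique = anti_join(ds2, ds1)  # variants present only in ds2

print(sorted(ds1_unique))
print(sorted(ds2_unique))
```

Note that the match is on the whole key, so the same locus with different alleles counts as a different variant and would show up as unique on both sides.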