Identify duplicated sample groups

How can I group duplicated samples after running pc_relate?

I am running pc_relate with the following command:

rel = hl.pc_relate(mt.GT,
        min_individual_maf = 0.01,
        k = 20,
        statistics = "kin",
        min_kinship = (1/(2**1.5)))
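
For readers wondering about the `min_kinship` value: the expression `1/(2**1.5)` is the geometric midpoint between a kinship of 0.5 (duplicates or monozygotic twins) and 0.25 (first-degree relatives). That reading is an interpretation, not something stated in this thread. It matches the cutoff commonly used, for example by KING, to flag duplicate samples. A minimal sketch of the arithmetic, with illustrative variable names:

```python
import math

# Expected kinship coefficients (standard values, not taken from this thread)
KIN_DUPLICATE = 0.5      # same individual or monozygotic twin
KIN_FIRST_DEGREE = 0.25  # parent-child or full siblings

# The threshold used in the pc_relate call above
duplicate_threshold = 1 / (2 ** 1.5)

# It equals the geometric mean of the two expected values
geometric_midpoint = math.sqrt(KIN_DUPLICATE * KIN_FIRST_DEGREE)

print(round(duplicate_threshold, 4))  # 0.3536
print(math.isclose(duplicate_threshold, geometric_midpoint))  # True
```

So only pairs with an estimated kinship above roughly 0.354 are kept, which is why every remaining row can be read as "these two samples are the same individual".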

The table after running pc_relate looks like this:

i.s            j.s
Sample1_Rep1   Sample1_Rep2
Sample1_Rep2   Sample1_Rep3
Sample2_Rep1   Sample2_Rep2
Sample3_Rep1   Sample3_Rep2
Sample3_Rep2   Sample3_Rep3
Sample3_Rep3   Sample3_Rep4
Sample3_Rep2   Sample3_Rep3
Sample3_Rep2   Sample3_Rep4
Sample3_Rep3   Sample3_Rep4

In this example, Sample1 was sequenced 3 times, Sample2 2 times, and Sample3 4 times.
I would like to have a table in the following format:

Sample         Group
Sample1_Rep1   1
Sample1_Rep2   1
Sample1_Rep3   1
Sample2_Rep1   2
Sample2_Rep2   2
Sample3_Rep1   3
Sample3_Rep2   3
Sample3_Rep3   3
Sample3_Rep4   3

I have tried this with pandas, but so far without success; I always end up with too many groups. For instance, Sample3_Rep1 and Sample3_Rep2 form one group, while Sample3_Rep1 and Sample3_Rep3 form another.
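
One possible way to end up with too many groups, assuming each row of the pair table is treated as its own group, is sketched below. The pandas code that was actually tried is not shown in this thread, so this is only an illustration using the rows from the table above: several samples appear in more than one pair, so a pair cannot serve as a group label.

```python
from collections import Counter

# Rows copied from the pc_relate output shown above (i.s, j.s)
pairs = [
    ("Sample1_Rep1", "Sample1_Rep2"),
    ("Sample1_Rep2", "Sample1_Rep3"),
    ("Sample2_Rep1", "Sample2_Rep2"),
    ("Sample3_Rep1", "Sample3_Rep2"),
    ("Sample3_Rep2", "Sample3_Rep3"),
    ("Sample3_Rep3", "Sample3_Rep4"),
    ("Sample3_Rep2", "Sample3_Rep3"),
    ("Sample3_Rep2", "Sample3_Rep4"),
    ("Sample3_Rep3", "Sample3_Rep4"),
]

# Naive approach: one "group" per distinct row
pair_groups = sorted(set(pairs))

# Count in how many of these pair-groups each sample appears
appearances = Counter(sample for pair in pair_groups for sample in pair)
shared = sorted(s for s, n in appearances.items() if n > 1)

print(len(pair_groups))  # 7 pair-groups, although only 3 individuals exist
print(shared)            # samples that would belong to several groups at once
```

The pairs behave like edges of a graph rather than like group labels, so the grouping has to merge pairs that share a sample.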

Would you be able to post the code snippet where you’re doing the grouping? My guess is that something wonky is going on with your key selection. I remember the Python / pandas syntax here being a little tricky, so hopefully it’s an easy fix!

I gave up on pandas and used igraph for this task instead.

import hail as hl
import igraph as ig
import pandas as pd

# Identify duplicate samples
rel = hl.pc_relate(mt.GT, 
        min_individual_maf = 0.01, 
        k = 20, 
        statistics = "kin", 
        min_kinship = (1/(2**1.5)))

# Convert to a pandas DataFrame with three columns:
# i.s, j.s, and kin (the kinship estimate)
rel_df = rel.to_pandas()

# Create a graph
g = ig.Graph.DataFrame(rel_df, use_vids = False)

# Find connected components, because we only know A->B and B->C, but not A->C
components = g.connected_components(mode = "weak")

# Sample names
sample_names = g.vs["name"]

# Group membership
group = components.membership

# Create pandas dataframe
rel_df = pd.DataFrame(data = {"Sample": sample_names, "group": [x + 1 for x in group]})
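
For anyone who would rather not add igraph as a dependency, the same grouping can be sketched with a small union-find in plain Python. This is a substitute technique, not what the code above does internally, and the function name `group_pairs` is made up for this illustration. The pairs are the rows from the question.

```python
def group_pairs(pairs):
    """Assign a 1-based group number to every sample that appears in `pairs`.

    Samples linked directly or through a chain of pairs share a group.
    """
    parent = {}

    def find(x):
        # Register unseen samples as their own root, then walk up to the root
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    # Merge the two sets of every reported pair
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the groups in order of sorted sample name
    group_ids = {}
    membership = {}
    for sample in sorted(parent):
        root = find(sample)
        group_ids.setdefault(root, len(group_ids) + 1)
        membership[sample] = group_ids[root]
    return membership


pairs = [
    ("Sample1_Rep1", "Sample1_Rep2"),
    ("Sample1_Rep2", "Sample1_Rep3"),
    ("Sample2_Rep1", "Sample2_Rep2"),
    ("Sample3_Rep1", "Sample3_Rep2"),
    ("Sample3_Rep2", "Sample3_Rep3"),
    ("Sample3_Rep3", "Sample3_Rep4"),
    ("Sample3_Rep2", "Sample3_Rep3"),
    ("Sample3_Rep2", "Sample3_Rep4"),
    ("Sample3_Rep3", "Sample3_Rep4"),
]

groups = group_pairs(pairs)
for sample, group in groups.items():
    print(sample, group)
```

With the pandas table of pairs, the input could presumably be built as `list(zip(pairs_df["i.s"], pairs_df["j.s"]))`, where `pairs_df` is a hypothetical name for the result of `rel.to_pandas()`. Samples with no related pair above `min_kinship` never appear in the pair table, so in either approach they would need to be added separately as singleton groups.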
