Merging new genotype fields

Dear Hail community,

We have created a matrix table from 92 cases and 100 controls. These samples were not joint-called together: the cases were joint-called as one group and the controls were joint-called separately. We built the combined matrix table from the resulting VCFs. We don’t have the full BAM files for any of the samples. After filtering for related genes, we noticed this was a problem, so we asked for only the related regions from the BAM files and called variants for the controls as gVCFs. Our goal is to perform a SKAT analysis, so we need the genotypes for the SNPs. We want to add these genotype calls to our original matrix table.

Since the new genotypes come from variant calls on the gVCF files, where the controls are mostly homozygous reference, we just want to merge them on the locus. We tried matching with the following code, but for some reason it does not fill in the genotypes of the control samples. Both matrix tables use the chr prefix and both are on GRCh38. How can we annotate the GT field from a new matrix table based on locus?


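import hail as hl

# Re-key both matrix tables by locus only, so rows are matched on position.
new_genotypes_by_locus = new_genotypes.key_rows_by('locus')
old_matrix_by_locus = old_matrix.key_rows_by('locus')

# For each (locus, sample) entry, take the new GT when it is defined,
# otherwise keep the GT already present in the original matrix table.
old_matrix_with_new_gt = old_matrix_by_locus.annotate_entries(
    GT = hl.or_else(
        new_genotypes_by_locus[old_matrix_by_locus.row_key, old_matrix_by_locus.col_key].GT,
        old_matrix_by_locus.GT
    )
)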
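
To make the intended behaviour concrete, here is a minimal pure-Python sketch of the merge we are after. It does not use Hail, and the loci, sample IDs and genotype strings are made up for illustration. Entries are keyed by (locus, sample ID), and a new call replaces the old one only when it is defined.

```python
# Hypothetical toy data: entries keyed by (locus, sample ID).
# None stands for a missing genotype call.
old_gt = {
    ("chr1:1000", "case_01"): "0/1",
    ("chr1:1000", "ctrl_01"): None,
    ("chr1:2000", "case_01"): "1/1",
    ("chr1:2000", "ctrl_01"): None,
}
new_gt = {
    # The new calls only contain an entry for this one (locus, sample) pair.
    ("chr1:1000", "ctrl_01"): "0/0",
}


def merge_gt(old, new):
    """Prefer the new call when it is defined, otherwise keep the old call."""
    merged = {}
    for key, old_call in old.items():
        new_call = new.get(key)  # None if the exact key is absent from `new`
        merged[key] = new_call if new_call is not None else old_call
    return merged


merged_gt = merge_gt(old_gt, new_gt)
for key in sorted(merged_gt):
    print(key, merged_gt[key])
```

In this toy case ctrl_01 is filled in at chr1:1000 but stays missing at chr1:2000, because the new data has no entry for that exact (locus, sample ID) key. We have not confirmed whether the same thing is happening in our Hail matrix tables.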