Hello Hail Team,
I was trying to find carriers of pathogenic variants described in ClinVar, but when using annotate_rows_db()
I couldn’t get any variant-level annotation. After annotating, these were the only variant types I got: ‘deletion’, ‘indel’, ‘copy number gain’, ‘copy number loss’, ‘duplication’ and ‘insertion’.
I know there are pathogenic variants in this cohort: I confirmed it by semi-joining the cohort with ClinVar’s VCF file.
Why are SNV-level variants not annotated? Is this the expected behavior, or am I doing something wrong?
I’ve added the code I’m using below.
Thank you for the support,
Rodrigo Barreiro
–
import hail as hl
hl.init()
mt = hl.read_matrix_table("s3://../my_cohort.mt")
db = hl.experimental.DB(region='us', cloud='aws')
mt = mt.filter_rows(hl.len(mt.alleles) == 2)
mt = mt.key_rows_by('locus','alleles')
mt = db.annotate_rows_db(mt, 'clinvar_variant_summary')
What’s the schema of the returned matrix table here?
mt.describe()
Hey @rodrigo.barreiro ,
Could you share the FTP/HTTP URL to the clinvar VCF file you used? The clinvar_variant_summary
dataset must be based on a different file. Our dataset includes a staggering number of MNVs.
Hey @rodrigo.barreiro ,
I did a little digging on this. The latest clinvar summary file has data like this (I’ve elided some columns)
#AlleleID Type Assembly Chromosome Start Stop ReferenceAllele AlternateAllele PositionVCF ReferenceAlleleVCF AlternateAlleleVCF
"15041" "Indel" "GRCh37" "7" "4820844" "4820847" "na" "na" "4820844" "GGAT" "TGCTGTAAACTGTAACTGTAAA"
"15041" "Indel" "GRCh38" "7" "4781213" "4781216" "na" "na" "4781213" "GGAT" "TGCTGTAAACTGTAACTGTAAA"
"15042" "Deletion" "GRCh37" "7" "4827361" "4827374" "na" "na" "4827360" "GCTGCTGGACCTGCC" "G"
"15042" "Deletion" "GRCh38" "7" "4787730" "4787743" "na" "na" "4787729" "GCTGCTGGACCTGCC" "G"
"15043" "single nucleotide variant" "GRCh37" "15" "85342440" "85342440" "na" "na" "85342440" "G" "A"
I believe our dataset used the start and stop to construct an interval.
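To show what that difference looks like, here is a minimal sketch in plain Python (not Hail, and not necessarily how the dataset was actually built). It parses the GRCh37 rows from the excerpt and derives two possible keys per record: an interval from Start/Stop, and a VCF-style variant from PositionVCF and the VCF alleles. The helper names parse, interval_key and variant_key are made up for illustration.

```python
import shlex

# Column names as in the excerpt (leading '#' dropped).
HEADER = ("AlleleID Type Assembly Chromosome Start Stop ReferenceAllele "
          "AlternateAllele PositionVCF ReferenceAlleleVCF AlternateAlleleVCF").split()

# GRCh37 rows copied from the excerpt.
ROWS = [
    '"15041" "Indel" "GRCh37" "7" "4820844" "4820847" "na" "na" "4820844" "GGAT" "TGCTGTAAACTGTAACTGTAAA"',
    '"15042" "Deletion" "GRCh37" "7" "4827361" "4827374" "na" "na" "4827360" "GCTGCTGGACCTGCC" "G"',
    '"15043" "single nucleotide variant" "GRCh37" "15" "85342440" "85342440" "na" "na" "85342440" "G" "A"',
]

def parse(line):
    # shlex.split respects the double quotes, so
    # "single nucleotide variant" stays a single field.
    return dict(zip(HEADER, shlex.split(line)))

def interval_key(rec):
    # Key built from Start/Stop (the suspected construction).
    return (rec["Chromosome"], int(rec["Start"]), int(rec["Stop"]))

def variant_key(rec):
    # Key built from the VCF-style columns, comparable to (locus, alleles).
    return (rec["Chromosome"], int(rec["PositionVCF"]),
            rec["ReferenceAlleleVCF"], rec["AlternateAlleleVCF"])

records = [parse(r) for r in ROWS]
for rec in records:
    print(rec["Type"], interval_key(rec), variant_key(rec))
```

Note that for the deletion, Start (4827361) and PositionVCF (4827360) differ by one, because the VCF representation includes the preceding anchor base.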
If your dataset had these variants:
7:4827360:GCTGCTGGACCTGCC:G
7:4827361:G:T
7:4827361:GCTGCTGGACCTGCC:G
7:4827363:T:A
15:85342440:G:A
15:85342440:G:T
Which annotations do you want to attach to which variants?
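To make the choice concrete, here is a small plain-Python sketch (not Hail) comparing two possible answers for the six variants above against the GRCh37 ClinVar records from the excerpt. It assumes "interval matching" means the variant's position falls inside [Start, Stop], and "exact matching" means chromosome, PositionVCF and both VCF alleles agree. Both definitions and all names here are assumptions for illustration.

```python
# (chrom, start, stop, pos_vcf, ref_vcf, alt_vcf, type), GRCh37 rows from the excerpt
CLINVAR = [
    ("7", 4827361, 4827374, 4827360, "GCTGCTGGACCTGCC", "G", "Deletion"),
    ("15", 85342440, 85342440, 85342440, "G", "A", "single nucleotide variant"),
]

VARIANTS = [
    "7:4827360:GCTGCTGGACCTGCC:G",
    "7:4827361:G:T",
    "7:4827361:GCTGCTGGACCTGCC:G",
    "7:4827363:T:A",
    "15:85342440:G:A",
    "15:85342440:G:T",
]

def parse_variant(s):
    chrom, pos, ref, alt = s.split(":")
    return chrom, int(pos), ref, alt

def exact_matches(variant):
    # Match on chromosome, VCF position and both alleles.
    chrom, pos, ref, alt = parse_variant(variant)
    return [t for (c, _, _, p, r, a, t) in CLINVAR
            if (c, p, r, a) == (chrom, pos, ref, alt)]

def interval_matches(variant):
    # Match when the variant's position lies inside [Start, Stop]; alleles ignored.
    chrom, pos, _, _ = parse_variant(variant)
    return [t for (c, start, stop, _, _, _, t) in CLINVAR
            if c == chrom and start <= pos <= stop]

for v in VARIANTS:
    print(v, "exact:", exact_matches(v), "interval:", interval_matches(v))
```

Under these assumptions, exact matching annotates only the first and fifth variants, while interval matching annotates every variant except the first, including SNVs that merely sit inside the deleted region.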