Hi all, could you possibly help a newbie get my VCFs into the right format to work with Hail?
I’m hoping to use seqr to analyse my own WGS data because of severe health problems, but I’m totally new to bioinformatics (and Linux) and can’t work out how to adapt my VCFs.
My data is from 10x Chromium sequencing, and the file I’d most like to use is a VCFv4.1 from Long Ranger, since that’s best placed to take advantage of the linked reads. If I try to load it into seqr I get:
HailException: invalid PL field `0,2510,18100': expected 6 values, but got 3.
This confuses me, because I understood from the VCFv4.2 spec that the PL field should have 3 values. Could something else in my file be telling it to expect 6?
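For what it's worth, the VCF spec declares PL with Number=G, meaning one likelihood per possible genotype, so the count depends on ploidy and on how many alleles the record has. For a diploid call that is C(n + 1, 2) for n alleles (REF plus ALTs): 3 for a biallelic site, 6 for a site with two ALT alleles. So "expected 6" would fit a multi-allelic record whose PL only has 3 values. Below is a minimal sketch (standard library only; `expected_pl_length` and `pl_mismatches` are hypothetical helper names, and Hail's own check may differ in details such as ploidy handling) that applies the spec's rule to every record and lists the ones that disagree:

```python
import re
import sys
from math import comb


def expected_pl_length(n_alleles, ploidy=2):
    """Number of unordered genotypes for n_alleles (REF + ALTs) at this ploidy.

    This is the Number=G count from the VCF spec: C(n_alleles + ploidy - 1, ploidy).
    """
    return comb(n_alleles + ploidy - 1, ploidy)


def pl_mismatches(lines):
    """Yield (chrom, pos, alt, n_pl, expected) for sample PLs of the wrong length."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], fields[1], fields[4]
        fmt = fields[8].split(":")
        if "PL" not in fmt:
            continue
        alts = [] if alt == "." else alt.split(",")
        n_alleles = 1 + len(alts)  # REF plus each ALT allele
        pl_idx = fmt.index("PL")
        gt_idx = fmt.index("GT") if "GT" in fmt else None
        for sample in fields[9:]:
            values = sample.split(":")
            if pl_idx >= len(values) or values[pl_idx] == ".":
                continue  # PL missing for this sample
            ploidy = 2  # assume diploid unless GT says otherwise
            if gt_idx is not None and gt_idx < len(values):
                ploidy = len(re.split(r"[/|]", values[gt_idx]))
            n_pl = len(values[pl_idx].split(","))
            expected = expected_pl_length(n_alleles, ploidy)
            if n_pl != expected:
                yield chrom, int(pos), alt, n_pl, expected


if __name__ == "__main__" and len(sys.argv) > 1:
    import gzip

    # Usage: python check_pl.py your_sample.vcf.gz  (file name is a placeholder)
    with gzip.open(sys.argv[1], "rt") as fh:
        for hit in pl_mismatches(fh):
            print(*hit, sep="\t")
```

If this flags records with two ALT alleles but only three PL values, that would point at how those multi-allelic sites were written in the VCF rather than at the VCF version.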
I also have a VCFv4.2 from the GATK pipeline, but running that through seqr produces lots of this warning:
Hail: WARN: Struct{allele_num:Int,amino_acids:String,biotype:String,canonical:Int,ccds:String,cdna_start:Int,cdna_end:Int,cds_end:Int,cds_start:Int,codons:String,consequence_terms:Array[String],distance:Int,domains:Array[Struct{db:String,name:String}],exon:String,flags:String,gene_id:String,gene_pheno:Int,gene_symbol:String,gene_symbol_source:String,hgnc_id:String,hgvsc:String,hgvsp:String,hgvs_offset:Int,impact:String,intron:String,lof:String,lof_flags:String,lof_filter:String,lof_info:String,minimised:Int,polyphen_prediction:String,polyphen_score:Double,protein_end:Int,protein_start:Int,protein_id:String,sift_prediction:String,sift_score:Double,strand:Int,swissprot:String,transcript_id:String,trembl:String,uniparc:String,variant_allele:String} has no field tsl at .transcript_consequences.
and then it eventually just stops, with no index loaded into Elasticsearch.
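To make that warning concrete: as far as I understand it, Hail logs "... has no field X at .path" when the VEP JSON output contains a key that the declared schema does not list, and that key is skipped. Here the struct in the warning describes one element of `transcript_consequences`, and it has no `tsl` entry, while VEP appears to be emitting `tsl`. That would suggest the VEP version or options don't quite match the schema, though it may not be what makes the run stop. The sketch below uses hypothetical helpers (`struct_field_names`, `unexpected_keys`, standard library only) to compare one VEP JSON record against a Hail struct string like the one in the warning; the sample record in the usage is made up:

```python
import json


def struct_field_names(hail_type):
    """Return the top-level field names of a Hail 'Struct{name:Type,...}' string."""
    if not (hail_type.startswith("Struct{") and hail_type.endswith("}")):
        raise ValueError("expected a Struct{...} type string")
    body = hail_type[len("Struct{"):-1]
    names, depth, token = [], 0, ""
    for ch in body:
        if ch in "{[":
            depth += 1  # entering a nested Struct{...} or Array[...]
        elif ch in "}]":
            depth -= 1
        if ch == "," and depth == 0:
            names.append(token.split(":", 1)[0])  # keep the name, drop the type
            token = ""
        else:
            token += ch
    if token:
        names.append(token.split(":", 1)[0])
    return names


def unexpected_keys(vep_json_line, element_schema, path="transcript_consequences"):
    """List keys present in record[path][*] that the element schema does not declare."""
    declared = set(struct_field_names(element_schema))
    record = json.loads(vep_json_line)
    extra = set()
    for consequence in record.get(path, []):
        extra.update(key for key in consequence if key not in declared)
    return sorted(extra)
```

For example, with a shortened schema `Struct{gene_id:String,impact:String,domains:Array[Struct{db:String,name:String}]}` and a made-up record `{"transcript_consequences": [{"gene_id": "ENSG00000000001", "impact": "HIGH", "tsl": 1}]}`, `unexpected_keys` returns `["tsl"]`.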
Any help would be much appreciated.