I’m trying to build a reference table using a different reference genome. Variants were called with bcftools mpileup, bcftools call, and bcftools filter. I didn’t receive any errors when reading the reference from a FASTA file (this is for the plant species Sorghum bicolor), nor when importing a VCF file. When I try the count method, however, I get the error below (the full line from hail.log is at the bottom).
Cheers,
Jan Erik
IN: first30.count()
OUT: FatalError: HailException: first_30_merged.vcf:column 180: invalid character '.' in integer literal
… 19;AN=54;AC=53,1 GT:PL 1/1:130,36,0,.,.,. 1/1:100,18,0,.,.,. 1/1:143,27, …
^
offending line: Chr01 74 . G C,A 127 LOWQUAL VDB=0.00595195;SGB=-0.680642;MQ…
see the Hail log for the full offending line
Steps leading up to error:
IN: sb2 = hl.ReferenceGenome.from_fasta_file(name='sb2', fasta_file='/d1/sorghum/ref/v2/Sbicolor_255_v2.0.fa', index_file='/d1/sorghum/ref/v2/Sbicolor_255_v2.0.fa.fai')
IN: sb2
OUT: ReferenceGenome(name=sb2, contigs=['Chr01', 'Chr02', 'Chr03', 'Chr04', 'Chr05', 'Chr06', 'Chr07', 'Chr08', 'Chr09', 'Chr10', 'super_10', 'super_11', 'super_12', 'super_13', 'super_14', 'super_15', 'super_16', 'super_17', 'super_18', 'super_19', 'super_20', 'super_21', 'super_22', 'super_23', 'super_24', 'super_25', 'super_26', 'super_27', 'super_28', 'super_29', 'super_30', … (many extra contigs)
IN: first30 = hl.import_vcf('/d1/sorghum/vcfs/first_30_merged.vcf', reference_genome='sb2')
OUT: 2018-04-26 10:48:11 Hail: INFO: Ordering unsorted dataset with network shuffle
IN: first30.describe()
OUT: ----------------------------------------
Global fields:
None
Column fields:
's': str
Row fields:
'locus': locus<sb2>
'alleles': array<str>
'rsid': str
'qual': float64
'filters': set<str>
‘info’: struct {
INDEL: bool,
IDV: int32,
IMF: float64,
DP: int32,
…
From log file:
at java.lang.Thread.run(Thread.java:748)
is.hail.utils.HailException: first_30_merged.vcf:column 180: invalid character '.' in integer literal
… 19;AN=54;AC=53,1 GT:PL 1/1:130,36,0,.,.,. 1/1:100,18,0,.,.,. 1/1:143,27, …
^
offending line: Chr01 74 . G C,A 127 LOWQUAL VDB=0.00595195;SGB=-0.680642;MQSB=1;MQ0F=0.0769231;MQ=17;RPB=0.961538;MQB=0.730769;BQB=0.730769;DP=309;DP4=9,1,214,19;AN=54;AC=53,1 GT:PL 1/1:130,36,0,.,.,. 1/1:100,18,0,.,.,. 1/1:143,27,0,.,.,. 1/1:140,27,0,.,.,. 1/1:131,15,0,.,.,. 1/1:125,18,0,.,.,. 1/1:109,12,0,.,.,. 1/1:72,18,0,.,.,. 1/1:91,8,0,.,.,. 1/1:71,29,20,.,.,. 1/2:158,58,40,114,0,108 1/1:119,9,0,.,.,. 1/1:101,32,17,.,.,. 1/1:101,20,2,.,.,. 1/1:156,21,2,.,.,. 1/1:78,16,3,.,.,. 1/1:124,15,0,.,.,. 1/1:64,18,0,.,.,. 1/1:131,24,0,.,.,. 1/1:144,35,24,.,.,. 1/1:109,33,3,.,.,. 1/1:129,32,5,.,.,. 1/1:40,12,0,.,.,. 1/1:138,21,0,.,.,. 1/1:101,15,0,.,.,. ./.:. 1/1:149,45,0,.,.,. 1/1:127,21,0,.,.,.
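
For anyone trying to reproduce this: the caret in the error appears to point into a PL array where only some entries are missing (for example 130,36,0,.,.,.), rather than at the fully missing ./.:. sample. Below is a minimal stand-alone sketch, standard library only, that checks individual sample columns for that pattern. The helper names parse_pl and has_partially_missing_pl are hypothetical and not part of Hail or bcftools, and the sketch assumes the FORMAT column is GT:PL as in the line above.

```python
# Stand-alone sketch; the function names are hypothetical helpers,
# not part of Hail or bcftools.

def parse_pl(sample_field, format_keys="GT:PL"):
    """Return the PL values of one sample column, mapping '.' to None.

    Returns None when the whole PL field is missing (e.g. './.:.').
    """
    keys = format_keys.split(":")
    values = sample_field.split(":")
    entry = dict(zip(keys, values))
    pl = entry.get("PL")
    if pl is None or pl == ".":
        return None
    return [None if v == "." else int(v) for v in pl.split(",")]


def has_partially_missing_pl(sample_field, format_keys="GT:PL"):
    """True when PL is present but at least one of its elements is '.'."""
    pl = parse_pl(sample_field, format_keys)
    return pl is not None and any(v is None for v in pl)


# Sample columns copied from the offending line in the log.
samples = ["1/1:130,36,0,.,.,.", "1/2:158,58,40,114,0,108", "./.:."]
flagged = [s for s in samples if has_partially_missing_pl(s)]
print(flagged)
```

Only the first of the three samples is flagged, which matches the column the error message points at. Later Hail releases document an array_elements_required parameter on hl.import_vcf, and setting it to False is described as allowing array fields with missing elements; whether the Hail version used here already has that parameter is an assumption.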