I have an issue running hl.vds.interval_coverage.

First of all, I am not sure how big the intervals in the table of intervals should be. Do you have a simple example of how to properly define intervals for an interval_coverage analysis?
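For reference, a minimal sketch of one way to define an intervals table, assuming a plain-text file with one contig:start-end interval per line and a GRCh38 dataset. hl.import_locus_intervals is Hail's reader for such files; the helper names and the file name intervals.txt are made up for illustration, and nothing here says how large the intervals should be.

```python
def format_interval(contig, start, end):
    """Return a 'contig:start-end' string, one of the plain-text forms
    that hl.import_locus_intervals reads (one interval per line)."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{contig}:{start}-{end}"


def write_interval_file(path, intervals):
    """Write (contig, start, end) tuples to `path`, one interval per line."""
    with open(path, "w") as handle:
        for contig, start, end in intervals:
            handle.write(format_interval(contig, start, end) + "\n")


def load_interval_table(path, reference_genome="GRCh38"):
    """Import the file as a Hail Table keyed by interval.

    Hail is imported lazily so the helpers above work without it.
    The GRCh38 default is an assumption about the dataset.
    """
    import hail as hl
    return hl.import_locus_intervals(path, reference_genome=reference_genome)


# The interval from this post, written the way the file expects it.
example_line = format_interval("chr1", 12345, 17299)
print(example_line)
```

The resulting table could then be passed as intervals=load_interval_table("intervals.txt") to hl.vds.interval_coverage.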

I tried hl.vds.interval_coverage on this interval: chr1:12345-chr1:17299.
But an error occurred:
AttributeError                            Traceback (most recent call last)
Cell In [23], line 1
----> 1 hl.vds.interval_coverage(vds, gq_thresholds=(0, 10, 20), dp_thresholds=(0, 1, 10, 20, 30), intervals=table_intervals)

File :2, in interval_coverage(vds, intervals, gq_thresholds, dp_thresholds, dp_field)

File /usr/local/lib/python3.8/dist-packages/hail/typecheck/check.py:577, in _make_dec..wrapper(__original_func, *args, **kwargs)
    574 @decorator
    575 def wrapper(original_func, *args, **kwargs):
    576     args, kwargs = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577     return original_func(*args, **kwargs)

File /usr/local/lib/python3.8/dist-packages/hail/vds/methods.py:835, in interval_coverage(vds, intervals, gq_thresholds, dp_thresholds, dp_field)
    830 else:
    831     dp_field_dict = dict()
    833 per_interval = split.group_rows_by(interval=intervals[split.row_key[0]].interval_dup) \
    834     .aggregate(
--> 835     bases_over_gq_threshold=tuple(
    836         hl.agg.filter(split.GQ >= gq_threshold, hl.agg.sum(ref_block_length)) for gq_threshold in
    837         gq_thresholds),
    838     **dp_field_dict
    839 )
    841 interval = per_interval.interval
    842 interval_size = interval.end.position + interval.includes_end - interval.start.position - 1 + interval.includes_start

    120     return self.dict[item]
--> 122 raise AttributeError(get_nice_attr_error(self, item))

AttributeError: MatrixTable instance has no field, method, or property 'GQ'
    Hint: use 'describe()' to show the names of all data fields.
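As a side note on interval sizes, the interval_size expression visible at line 842 of the traceback can be worked through in plain Python. This is only a sketch of that arithmetic: the function name is made up, and the default of an inclusive start with an exclusive end is an assumption rather than something confirmed in this thread.

```python
def interval_size(start, end, includes_start=True, includes_end=False):
    """Number of bases covered by an interval, mirroring the expression
    shown in the traceback:
        end.position + includes_end - start.position - 1 + includes_start
    The default endpoint flags are an assumption for illustration.
    """
    return end + int(includes_end) - start - 1 + int(includes_start)


# The interval from this post: chr1:12345-17299
print(interval_size(12345, 17299))                     # 4954
print(interval_size(12345, 17299, includes_end=True))  # 4955
```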

I tried to describe my VariantDataset and GQ is there.

This is when I tried to use show:

Have a good day,

interval_coverage looks at the reference blocks, not variant calls, for information about quality per interval. What’s the schema of the reference data MatrixTable?


Thank you for the clarification.

You are right. There is no GQ in reference_data:

Global fields:

Column fields:
's': str

Row fields:
'locus': locus
'ref_allele': str

Entry fields:
'END': int32

Column key: ['s']
Row key: ['locus']
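A small sketch of how this check could be done before calling interval_coverage, assuming the entry field names have already been collected into a plain list. The helper name is made up; with Hail the names could presumably be obtained with list(vds.reference_data.entry).

```python
def missing_entry_fields(entry_fields, required=("GQ",)):
    """Return the required field names that are absent from entry_fields."""
    present = set(entry_fields)
    return [name for name in required if name not in present]


# Entry fields of reference_data, copied from the schema above.
# With Hail (assumption): reference_entry_fields = list(vds.reference_data.entry)
reference_entry_fields = ["END"]

print(missing_entry_fields(reference_entry_fields))  # ['GQ']
```

An empty list would mean the field that interval_coverage reads in the traceback (split.GQ) is available on the reference data.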

I set up the combiner like this:

combiner = hl.vds.new_combiner(


Do you think it would help to use GRCh37 (the default reference for the combiner), as shown in the example in the Hail documentation (Hail | VariantDatasetCombiner)? Do you think the reference data would have GQ with GRCh37, unlike with GRCh38?

Hello everyone,

Does anyone have an idea how to run the VDS combiner so that the reference_data part of the resulting VariantDataset includes GQ?

Have a nice day,
Veronika Bůžková

Can you paste the #FORMAT lines of the GVCF header and one data row for a reference block (with an END field in INFO)?

##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (counting only informative reads out of the total reads) for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions for alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=GOP,Number=2,Type=Float,Description="Phred-scaled gap open penalty for insertions and deletions">
##FORMAT=<ID=GP,Number=G,Type=Float,Description="Phred-scaled posterior probabilities for genotypes as defined in the VCF specification">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=ICNT,Number=.,Type=Integer,Description="Counts of INDEL informative reads based on the reference confidence model">
##FORMAT=<ID=MB,Number=4,Type=Integer,Description="Per-sample component statistics to detect mate bias">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PRI,Number=G,Type=Float,Description="Phred-scaled prior probabilities for genotypes">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias">
##FORMAT=<ID=SPL,Number=.,Type=Integer,Description="Normalized, Phred-scaled likelihoods for SNPs based on the reference confidence model">
##FORMAT=<ID=SQ,Number=A,Type=Float,Description="Somatic quality">

And reference INFO seems empty, am I right?
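To make this kind of check repeatable, here is a rough sketch in plain Python that lists the FORMAT IDs declared in a GVCF header and the FORMAT keys actually present on one data row. The header lines are copied from the header above; the data row is entirely made up for illustration and is not from my GVCF, and the function names are hypothetical. A field declared in the header is not necessarily present on reference block rows, so both are inspected.

```python
import re

# Matches the ID of a ##FORMAT header line, e.g. ##FORMAT=<ID=GQ,...>
FORMAT_ID = re.compile(r"^##FORMAT=<ID=([^,>]+)")


def declared_format_ids(header_lines):
    """Return the FORMAT IDs declared in the header, in order."""
    ids = []
    for line in header_lines:
        match = FORMAT_ID.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def row_format_keys(data_row):
    """Return the FORMAT keys of one tab-separated VCF data row.

    Column 9 (index 8) of a VCF row holds the colon-separated FORMAT keys.
    """
    columns = data_row.rstrip("\n").split("\t")
    return columns[8].split(":")


header = [
    '##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">',
    '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">',
    '##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">',
]

# HYPOTHETICAL reference block row, invented for this example only.
example_row = "chr1\t1\t.\tA\t<NON_REF>\t.\t.\tEND=100\tGT:DP:GQ\t0/0:30:50"

print(declared_format_ids(header))
print("GQ" in row_format_keys(example_row))
```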