Condition on histogram

Hi,
On one hand, I wish to plot the number of calls along a chromosome, using a specific window. (number of call for each 10,000bp)
On the other hand, I have a bed file with a region of interests
I wish to highlight these regionson my plot.
I can filter my mt to separate the sites inside or outside of my regions of interest using filter_rows.

For now, I got an histogram of number of sites per position:

p = hl.plot.histogram(mt.locus.position, range=(0,51000000), bins=5100)
show(p)

but I am not able o highlight the regions of interest…
that can be by having the bars of a different color, or having a box in the background or 2 histograms on top of each other…

any ideas ?

Solve the issue:

I first filter my data:

# Import bed of region of interest
bed = hl.import_bed('s3://[...]contigs.bed', reference_genome='GRCh38')
# Load mt
mt_g = hl.read_matrix_table('s3://[...].genotypes.mt')
# Filter chromosome
mt_sub = hl.filter_intervals(mt_g, [hl.parse_locus_interval('chr22', reference_genome='GRCh38')])

Next I subset my two group (main regions and region of interest)

# Subset sites on 1st group
mt_sub_1st = mt_sub.filter_rows(hl.is_defined(alt_bed[mt_sub.locus]), keep=False)
# Subset sites from 2nd group
mt_sub_2nd = mt_sub.filter_rows(hl.is_defined(alt_bed[mt_sub.locus]))

Then I compute the histograms

# 1st hist
hist_1st = mt_sub_1st.aggregate_rows(hl.expr.aggregators.hist(mt_sub_1st.locus.position, 0, 51000000, 510))
# 2nd hist
hist_2nd = mt_sub_2nd.aggregate_rows(hl.expr.aggregators.hist(mt_sub_2nd.locus.position, 0, 51000000, 510))

Next I plot the 2 histograms on the same figure

# Plot with 1st hist
p = hl.plot.histogram(hist_1st, legend='1st group', title='site count by position window')
# Add 2nd hist
p.quad(
        bottom=0, top=hist_2nd.bin_freq,
        left=hist_2nd.bin_edges[:-1], right=hist_2nd.bin_edges[1:],
        legend='2nd group', line_color='white', color='red')

show(p)

Success, one histogram with 2 series, of different colors.

Glad you got a solution! I think it’s also possible to do the subset/compute in one step, instead of computing over the data twice:

# starting from the mt_sub = ...

in_interval = hl.is_defined(alt_bed[mt_sub.locus])
hist_1st, hist_2nd = mt_sub.aggregate_rows((
    hl.agg.filter(~in_interval, hl.agg.hist(mt_sub_1st.locus.position, 0, 51000000, 510)),
    hl.agg.filter(in_interval, hl.agg.hist(mt_sub_1st.locus.position, 0, 51000000, 510))
))
1 Like