Variant Annotation Table Merge?

Cecile_Avery · March 10, 2025, 10:22pm

Hello,

I am supporting a project using All of Us genomic data, which is stored in Hail MatrixTables (and other formats).

To reduce file size, they have split the variant annotations into a separate table (the Variant Annotation Table, or VAT) that can be filtered with Hail functions. The result is a table with chromosome, position, allele, gene, etc., but it contains no information on individual samples.

How can I use this table to pull out samples from the MatrixTable that actually contains the genotype information? Should I create intervals from the VAT, use them to filter the genotype table, and then retrieve the sample IDs per variant? Or perhaps some kind of inner join?

This seems like a very expensive and slow operation, and I’m not sure if there is a better way outside of exporting the variant annotations and filtering a VCF or PLINK file instead? I’d like to learn how to maximize the utility of hail matrix tables and when it is best to exit this data format.

Advice would be appreciated!

patrick-schultz · March 21, 2025, 8:14pm

Hi @Cecile_Avery,

Sorry for the slow response! I can try to help, but I’ll need more details about what you’d like to do. Do you want to filter the VAT down to some variants, then extract all samples carrying any of those variants?

The full AoU dataset is very large, and you can expect any operation that needs to read the genotype data to be relatively slow and expensive. But at this scale Hail really is the best option. If your goal is to extract a small subset of the data, then converting to another format and using more familiar tools may well be a good option.
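As a rough illustration of the "filter the VAT, then find the carriers" idea discussed above, here is a minimal sketch, not a definitive recipe. It assumes the filtered VAT is a Hail Table keyed by `locus` and `alleles` in the same way as the MatrixTable rows, and that the MatrixTable has an entry field named `GT`; neither is confirmed in this thread. The function name `carrier_entries` is hypothetical.

```python
def carrier_entries(mt, vat_hits):
    """Return a Table with one row per (variant, sample) non-reference call.

    Assumptions (not confirmed by the thread):
      - mt is a Hail MatrixTable with an entry field `GT`
      - vat_hits is a Hail Table with the same key as mt's rows
        (typically `locus`, `alleles`)
    """
    # Keep only the MatrixTable rows whose key appears in the filtered VAT.
    subset = mt.semi_join_rows(vat_hits)
    # Keep only non-reference calls; hom-ref and missing calls are dropped.
    subset = subset.filter_entries(subset.GT.is_non_ref())
    # Drop all non-key row and column fields so the flattened table stays small.
    subset = subset.select_rows().select_cols().select_entries('GT')
    # Flatten to a Table with one row per remaining (variant, sample) pair.
    return subset.entries()
```

A call might look like `carrier_entries(mt, vat.filter(vat.gene == 'BRCA1'))`, where the `gene` field name and value are placeholders. This still has to read genotype data for the matching rows, so the cost warning above applies. If the variant list is small, restricting the MatrixTable with `hl.filter_intervals` before the join may reduce how much data is scanned.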

njain · April 15, 2025, 11:52pm

Hi @Cecile_Avery and @patrick-schultz, I think I’m running into a similar problem. I am trying to merge demographic information with filtered variants from the VAT. Were you able to solve this issue?
