Aggregate functions - Small sample examples

Hello Everyone,

I am new to Hail and bioinformatics and am trying to learn the basics of VCF file analysis, so I am following the tutorial provided by Hail here

Though all of this works, I don’t understand the inner workings. For example, when we write a GROUP BY statement in SQL, we can see how the logic works from tutorials that explain the concept on a small sample dataset, which we can also validate manually.

Similarly, does Hail have a tutorial that explains how aggregate functions actually work, using a small sample dataset? For someone like me who is new to this domain, functions like n_alt_alleles() and is_defined(mt.GT) are something of a black box.

Is it possible to load only the first 10 records from a huge VCF file into a matrix table? How can I get only the first 10 records from a large VCF file? I tried doing it but got an error.

I was about to point you to the MatrixTable documentation for your “first 10 rows” issue, but it seems an update has partially broken it. It’ll be fixed soon, but in the meantime, to get just the first 10 rows of a VCF read in as a MatrixTable, you can do this:

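import hail as hl

mt = hl.import_vcf(......)  # arguments elided here; pass the path to your VCF
mt = mt.head(10)  # keep only the first 10 rows (variants)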

If you want to understand how things work in a bit more depth, we have a series of tutorials here: https://hail.is/docs/0.2/tutorials-landing.html. The GWAS one is just an overview of doing a GWAS in Hail, but the subsequent tutorials on Tables, Aggregations, etc. go a little further into how things work. You might also check out the Overview here: https://hail.is/docs/0.2/overview/index.html
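
In the meantime, to make n_alt_alleles() and is_defined() less of a black box, here is a tiny plain-Python analogue that you can check by hand. This is not Hail code and not how Hail implements anything; it only mimics the idea on a made-up dataset of two variants and four samples. The names `genotypes`, `call_rate`, and `total_alt_alleles` are invented for illustration, and a genotype is assumed to be a pair of allele indices (0 = reference) or None when the call is missing.

```python
# Plain-Python analogue, NOT Hail code. Made-up data: 2 variants x 4 samples.
# Each call is a pair of allele indices (0 = reference, 1+ = alternate),
# or None when the genotype is missing.
genotypes = {
    "variant_1": [(0, 0), (0, 1), (1, 1), None],
    "variant_2": [(0, 1), None, None, (0, 0)],
}


def n_alt_alleles(gt):
    """Number of non-reference alleles in one call; missing stays missing."""
    if gt is None:
        return None
    return sum(1 for allele in gt if allele != 0)


def is_defined(value):
    """True when the value is present (not missing)."""
    return value is not None


def call_rate(calls):
    """Aggregate over one row: fraction of samples with a defined call."""
    return sum(is_defined(gt) for gt in calls) / len(calls)


def total_alt_alleles(calls):
    """Aggregate over one row: sum of alt-allele counts, skipping missing calls."""
    return sum(n_alt_alleles(gt) for gt in calls if is_defined(gt))


# One summary per variant, like a GROUP BY over the variant key.
summary = {
    variant: {"call_rate": call_rate(calls), "n_alt": total_alt_alleles(calls)}
    for variant, calls in genotypes.items()
}

for variant, stats in summary.items():
    print(variant, stats)
```

Checking by hand: variant_1 has three defined calls out of four (call rate 0.75) and 0 + 1 + 2 = 3 alternate alleles; variant_2 has two defined calls out of four (call rate 0.5) and 1 + 0 = 1 alternate allele. Conceptually, the per-call functions produce one value per genotype entry, and the aggregation collapses those values along a row or a column, much like GROUP BY collapses rows in SQL.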
