Filtering is really fast but show is really slow, why is that?

amirfeizi · June 11, 2020, 9:56pm

Yes, it worked, thanks! One last question before closing this: is there any reason why, when I use show() or any other method on ht_filt, the first time it prints the output (after quite a while), but the second time it gets stuck and I have to restart the kernel and rerun the code?

danking · June 19, 2020, 3:11pm

Hail only performs computation when you observe the output, such as with collect, show, or write. When you evaluate

mt_f1 = mt.filter(...)

Hail does not process the data. Instead, Hail is building a recipe of things that need to be done in order to give you the results you want. Conceptually, mt_f1 is:

Read the table from gs://gnomad-public…, call this mt
Filter mt by using the values from genes_as_hail_literal

It is only at the time that you execute:

mt_f1.show()

that Hail actually executes the recipe you’ve built. If you execute:

mt_f1.show()
mt_f1.show()

Then Hail will run the entire recipe twice. Hail is designed this way because it operates on data that is too large to fit in memory. Hail is like a firehose: either the water is shooting out into something or it is off. There’s no way to pause the firehose half-way through a computation.
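
As a rough illustration of the recipe idea, here is a toy model in plain Python. It is not Hail's implementation: LazyTable, read_rows, read_count, and the sample rows are hypothetical names invented for this sketch. It shows that building a filter does no work, that each show re-runs the whole recipe from the read onward, and that collecting a small result once keeps it in memory for reuse.

```python
class LazyTable:
    """Toy stand-in for a lazily evaluated table (not Hail's real design)."""

    def __init__(self, source, steps=()):
        self.source = source        # zero-argument function that "reads" the rows
        self.steps = tuple(steps)   # recorded filter predicates, not yet applied

    def filter(self, predicate):
        # Only record the step; no rows are read here.
        return LazyTable(self.source, self.steps + (predicate,))

    def collect(self):
        # Run the whole recipe from the start: read, then apply every filter.
        rows = self.source()
        for predicate in self.steps:
            rows = [r for r in rows if predicate(r)]
        return rows

    def show(self, n=10):
        # Observing the output triggers a full run of the recipe.
        for row in self.collect()[:n]:
            print(row)


read_count = 0  # how many times the "expensive" read has run


def read_rows():
    """Hypothetical expensive read; counts its own invocations."""
    global read_count
    read_count += 1
    data = [("BRCA1", 0.01), ("TP53", 0.20), ("BRCA2", 0.05)]
    return [{"gene": g, "af": af} for g, af in data]


genes = {"BRCA1", "BRCA2"}

t = LazyTable(read_rows)
t_f1 = t.filter(lambda r: r["gene"] in genes)
reads_after_filter = read_count      # filter only extended the recipe

t_f1.show()
t_f1.show()
reads_after_two_shows = read_count   # each show re-ran the read and the filter

local_rows = t_f1.collect()          # one more run; the result now lives in memory
reads_after_collect = read_count
```

In this toy, local_rows can be inspected as often as needed without touching read_rows again. With real Hail data that is too large to collect, the analogous move is to write the filtered result out once and read it back before continuing.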

amirfeizi · June 21, 2020, 9:25pm

Thank you danking! So basically I should always subset the table at the beginning with head(), keep scripting and checking that my script does what I want, and then run it on the full table later (a basic scripting rule which also applies here, of course!).
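
A minimal sketch of that workflow in plain Python, not Hail: pipeline, full_table, and the row fields are hypothetical, and itertools.islice stands in for head(). The idea is to put the analysis in one function, check it on a small head, and then run the same function unchanged on the full data.

```python
from itertools import islice


def pipeline(rows):
    """Hypothetical analysis step: keep rare variants and tag them."""
    for r in rows:
        if r["af"] < 0.1:
            yield {**r, "rare": True}


def full_table():
    # Stand-in for a large table; rows are produced lazily, one at a time.
    for i in range(100_000):
        yield {"id": i, "af": (i % 100) / 100}


# Develop against a small head first: only the first 100 rows are generated.
sample = list(islice(full_table(), 100))
sample_result = list(pipeline(sample))

# Once the output looks right, run the same function on everything.
full_count = sum(1 for _ in pipeline(full_table()))
```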

PS. I have now been watching your tutorials on Hail and I understand it better! But your explanation here is actually very useful, and I could not get it from the tutorials. It would be really nice if you could record a short video discussing basic Hail scripting (how to convert a question into a chain of Hail syntax), as it is slightly different from what we are used to doing in R or Python.
