Filter mt rows by missingness

Hi,

I’m trying to filter the rows of a MatrixTable for missing GP, but I got the error message “FileNotFoundException: too many open files”.

Without increasing the open file limit, is there any other way to extract the rows (variants) with missing GP?

Any suggestions? Thank you very much!

Below is the script:

import hail as hl

hl.init(spark_conf={'spark.driver.memory': '20g', 'spark.executor.memory': '40g'}, tmp_dir=path, default_reference='GRCh38')

mt = hl.read_matrix_table('/path/mt', _n_partitions=6000)
mt_impt = mt.filter_rows(hl.agg.any(hl.is_missing(mt.GP)))  # the error appears right after this line

The number of open files is roughly proportional to the number of cores in use. You can control that with:

hl.init(master='local[N]')  # N is the number of cores to use

I’m rather surprised that you’re hitting this file limit though. Usually the file limit is a lot higher than the number of cores in use.

Can you share more information about your environment? Where are you executing this code?
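If it helps, you can check the per-process limit from within Python using the standard-library resource module (just a quick diagnostic, nothing Hail-specific):

import resource

# soft and hard limits on open file descriptors for the current process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)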

Thank you for your reply!

I’m using a local HPC cluster with 8 cores and 241 GB of memory in total. The open file limit is the default 1024.

Before I got your suggestion, I tried reducing the number of partitions (from 6000 to 100), and the “too many open files” issue went away!
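Concretely, the only change was the _n_partitions argument:

mt = hl.read_matrix_table('/path/mt', _n_partitions=100)  # was 6000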

If I follow your suggestion, should I still set the number of cores to 8?

Thank you!


You should set the number of cores to however many cores your HPC job is permitted to use. It sounds like you should set it to 8.
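Putting it together, a sketch of what the init call might look like with the core count capped (memory settings and tmp_dir carried over from your original script; adjust to whatever your job actually requests):

import hail as hl

hl.init(
    master='local[8]',  # cap Spark at the 8 cores your HPC job allows
    spark_conf={'spark.driver.memory': '20g', 'spark.executor.memory': '40g'},
    tmp_dir=path,  # same placeholder as in your script
    default_reference='GRCh38',
)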