How to check matrix length

annalisasnow · February 25, 2022, 10:47am

Good day!

I am currently querying matrix tables by rsid, like this:

        mtx.filter_rows(mtx.rsid == "rs28680688").rsid.show()

As far as I understand, the result is a tiny matrix table containing only the rows with my rsid. However, some of these results may be empty. Is there a built-in function to check whether a matrix table is empty? Also, can I export the data to a TSV or some other file?

johnc1231 · February 25, 2022, 10:24pm

If you want to check how many rows remain after a filter, you can do:

mtx.filter_rows(mtx.rsid == "rs28680688").count_rows()

If the count is 0, then the filtered matrix table is empty.
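The check above can be wrapped in a small helper. This is only a sketch: `is_empty` is a hypothetical name, not a Hail built-in, and it assumes its argument exposes `count_rows()` the way a MatrixTable does.

```python
def is_empty(mt):
    """Return True if the matrix table has no rows.

    `is_empty` is a hypothetical helper, not part of Hail. It assumes `mt`
    has a count_rows() method, as a Hail MatrixTable does. Counting rows
    evaluates the filter, so it is not free on large datasets.
    """
    return mt.count_rows() == 0
```

With the matrix table from the question, this would be called as `is_empty(mtx.filter_rows(mtx.rsid == "rs28680688"))`.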

What exactly do you want to export to a TSV? Just the row data, the rows and the entries, something else?

annalisasnow · February 26, 2022, 11:50am

I guess that’s the row data. I wish I could send the results of this query to a TSV:

hl.export_vcf(mtx.filter_rows(mtx.rsid == searchIDorLOC[0]), vcfPathRSID)

That’s a tiny matrix table. I currently re-export it as a VCF and then analyse it with a different script. But I wish I could export the results (including the rsid) to a TSV:

+---------------+------------+------------+------------+------------+
| locus         | alleles    | 'HG003'.GT | 'HG004'.GT | 'HG002'.GT |
+---------------+------------+------------+------------+------------+
| locus<GRCh37> | array<str> | call       | call       | call       |
+---------------+------------+------------+------------+------------+
| 1:736523      | ["T","C"]  | 1/1        | 1/1        | 1/1        |

Like this table (though extended a bit).

danking · February 27, 2022, 3:36pm

Hey @annalisasnow !

This is not well documented, but you can export matrix table information to a TSV. It is described in the documentation for Expression.export.

In [4]: import hail as hl
   ...: mt = hl.balding_nichols_model(3, 10, 100)
   ...: mt.GT.export('/tmp/foo.tsv')
2022-02-27 10:34:39 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 100 variants...
2022-02-27 10:34:40 Hail: INFO: Coerced sorted dataset
2022-02-27 10:34:40 Hail: INFO: merging 9 files totalling 5.4K...
2022-02-27 10:34:40 Hail: INFO: while writing:
    /tmp/foo.tsv
  merge time: 35.445ms

In [5]: !head /tmp/foo.tsv
locus	alleles	0	1	2	3	4	5	6	7	8	9
1:1	["A","C"]	0/0	0/0	0/1	1/1	0/1	0/0	0/1	0/1	0/1	0/1
1:2	["A","C"]	1/1	1/1	1/1	0/1	0/1	1/1	0/1	0/1	1/1	0/1
1:3	["A","C"]	1/1	0/0	0/1	1/1	1/1	0/0	1/1	0/0	1/1	1/1
1:4	["A","C"]	0/1	0/0	0/1	1/1	0/0	0/1	0/1	0/1	0/1	0/0
1:5	["A","C"]	1/1	0/0	0/0	0/0	0/0	0/0	0/1	1/1	0/0	0/1
1:6	["A","C"]	1/1	0/1	1/1	0/1	0/0	0/1	0/1	0/0	0/0	0/0
1:7	["A","C"]	0/1	0/0	0/0	0/1	0/1	1/1	0/1	0/1	0/0	0/0
1:8	["A","C"]	0/0	1/1	0/1	1/1	0/1	1/1	0/1	0/1	1/1	1/1
1:9	["A","C"]	1/1	1/1	1/1	1/1	1/1	1/1	1/1	1/1	1/1	1/1

If you would like to include the RSID in the output, add the RSID to the row key:

In [7]: import hail as hl
   ...: mt = hl.balding_nichols_model(3, 10, 100)
   ...: mt = mt.annotate_rows(rsid = 'abcdef123')
   ...: mt = mt.key_rows_by(*mt.row_key, 'rsid')
   ...: mt.GT.export('/tmp/foo.tsv')
2022-02-27 10:36:21 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 100 variants...
2022-02-27 10:36:22 Hail: INFO: Coerced sorted dataset
2022-02-27 10:36:22 Hail: INFO: Coerced sorted dataset
2022-02-27 10:36:22 Hail: INFO: merging 9 files totalling 6.4K...
2022-02-27 10:36:22 Hail: INFO: while writing:
    /tmp/foo.tsv
  merge time: 30.581ms

In [8]: !head /tmp/foo.tsv
locus	alleles	rsid	0	1	2	3	4	5	6	7	8	9
1:1	["A","C"]	abcdef123	0/1	0/1	0/1	1/1	1/1	0/0	0/1	0/1	0/1	1/1
1:2	["A","C"]	abcdef123	0/1	0/1	0/1	0/0	0/1	0/1	1/1	1/1	0/1	1/1
1:3	["A","C"]	abcdef123	0/0	0/0	0/0	0/1	0/1	0/1	0/0	0/1	0/1	0/0
1:4	["A","C"]	abcdef123	0/1	1/1	0/1	0/0	0/1	0/1	0/1	0/0	0/0	0/0
1:5	["A","C"]	abcdef123	1/1	0/0	1/1	0/0	0/1	1/1	0/1	0/1	0/1	0/1
1:6	["A","C"]	abcdef123	0/1	1/1	1/1	0/1	1/1	1/1	0/1	0/1	0/0	1/1
1:7	["A","C"]	abcdef123	0/0	0/1	0/0	0/1	0/0	0/0	0/0	0/0	0/0	0/0
1:8	["A","C"]	abcdef123	1/1	1/1	1/1	1/1	1/1	1/1	1/1	0/1	0/1	1/1
1:9	["A","C"]	abcdef123	1/1	0/1	1/1	0/1	1/1	1/1	1/1	1/1	0/1	1/1
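Since the exported file is meant to be analysed by a separate script, here is a minimal sketch of reading it back with only the Python standard library. The function name `read_genotype_tsv` is hypothetical, and the sketch assumes the layout shown in the output above: a tab-separated header line, an `alleles` column holding a JSON-style array, and one column per sample.

```python
import csv
import json


def read_genotype_tsv(path):
    """Read an exported genotype TSV into a list of dicts, one per variant.

    Hypothetical helper; assumes a header line and an `alleles` column
    formatted like ["A","C"], as in the output shown above.
    """
    rows = []
    with open(path, newline="") as handle:
        # QUOTE_NONE keeps the double quotes inside the alleles field literal.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Turn the alleles text into a Python list of strings.
            row["alleles"] = json.loads(row["alleles"])
            rows.append(row)
    return rows
```

An empty list from `read_genotype_tsv('/tmp/foo.tsv')` means the file has no data rows. Genotype values stay as plain strings such as `0/1`, keyed by the sample column name.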
