Good day!
I am currently querying mtxes by rsid, with requests like this:
mtx.filter_rows(mtx.rsid == "rs28680688").rsid.show()
As far as I understand, the result is a tiny mtx containing the data for my rsid. However, some of these results may be empty. Is there a built-in function to check whether an mtx is empty? Also, can I export the data to a TSV or some other file?
If you want to check how many rows remain after a filter, you can do:
mtx.filter_rows(mtx.rsid == "rs28680688").count_rows()
If the count is 0, then it's empty.
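The check above can be wrapped in a small helper. This is only a sketch: `is_empty` is a hypothetical name, not a Hail built-in, and it assumes nothing beyond its argument having a `count_rows()` method, as a Hail MatrixTable does. Keep in mind that counting rows runs the filter over the data.

```python
def is_empty(mt) -> bool:
    """Return True if the (filtered) matrix table has no rows.

    Hypothetical helper, not part of Hail. It relies only on
    `count_rows()`, which Hail's MatrixTable provides.
    """
    return mt.count_rows() == 0


# Intended use with Hail (assumes `mtx` is an existing MatrixTable):
#   hit = mtx.filter_rows(mtx.rsid == "rs28680688")
#   if is_empty(hit):
#       print("no rows for this rsid")
```

Binding the filtered table to a name first, as in the comment, lets you reuse it for the export afterwards.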
What exactly do you want to export to a TSV? Just the row data, the rows and the entries, something else?
I guess that’s the row data. I wish I could send the results of the query to a TSV:
hl.export_vcf(mtx.filter_rows(mtx.rsid == searchIDorLOC[0]),
vcfPathRSID)
That’s a tiny mtx; I currently re-export it as a VCF and then analyse it with a different script. But I wish I could export the results (including the rsid) to a TSV:
+---------------+------------+------------+------------+------------+
| locus | alleles | 'HG003'.GT | 'HG004'.GT | 'HG002'.GT |
+---------------+------------+------------+------------+------------+
| locus<GRCh37> | array<str> | call | call | call |
+---------------+------------+------------+------------+------------+
| 1:736523 | ["T","C"] | 1/1 | 1/1 | 1/1 |
Like this table (though extended a bit).
danking
February 27, 2022, 3:36pm
Hey @annalisasnow !
This is not well documented, but you can export matrix table information to a TSV. See the documentation for Expression.export.
In [4]: import hail as hl
...: mt = hl.balding_nichols_model(3, 10, 100)
...: mt.GT.export('/tmp/foo.tsv')
2022-02-27 10:34:39 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 100 variants...
2022-02-27 10:34:40 Hail: INFO: Coerced sorted dataset
2022-02-27 10:34:40 Hail: INFO: merging 9 files totalling 5.4K...
2022-02-27 10:34:40 Hail: INFO: while writing:
/tmp/foo.tsv
merge time: 35.445ms
In [5]: !head /tmp/foo.tsv
locus alleles 0 1 2 3 4 5 6 7 8 9
1:1 ["A","C"] 0/0 0/0 0/1 1/1 0/1 0/0 0/1 0/1 0/1 0/1
1:2 ["A","C"] 1/1 1/1 1/1 0/1 0/1 1/1 0/1 0/1 1/1 0/1
1:3 ["A","C"] 1/1 0/0 0/1 1/1 1/1 0/0 1/1 0/0 1/1 1/1
1:4 ["A","C"] 0/1 0/0 0/1 1/1 0/0 0/1 0/1 0/1 0/1 0/0
1:5 ["A","C"] 1/1 0/0 0/0 0/0 0/0 0/0 0/1 1/1 0/0 0/1
1:6 ["A","C"] 1/1 0/1 1/1 0/1 0/0 0/1 0/1 0/0 0/0 0/0
1:7 ["A","C"] 0/1 0/0 0/0 0/1 0/1 1/1 0/1 0/1 0/0 0/0
1:8 ["A","C"] 0/0 1/1 0/1 1/1 0/1 1/1 0/1 0/1 1/1 1/1
1:9 ["A","C"] 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1
If you would like to include the RSID in the output, add the RSID to the row key:
In [7]: import hail as hl
...: mt = hl.balding_nichols_model(3, 10, 100)
...: mt = mt.annotate_rows(rsid = 'abcdef123')
...: mt = mt.key_rows_by(*mt.row_key, 'rsid')
...: mt.GT.export('/tmp/foo.tsv')
2022-02-27 10:36:21 Hail: INFO: balding_nichols_model: generating genotypes for 3 populations, 10 samples, and 100 variants...
2022-02-27 10:36:22 Hail: INFO: Coerced sorted dataset
2022-02-27 10:36:22 Hail: INFO: Coerced sorted dataset
2022-02-27 10:36:22 Hail: INFO: merging 9 files totalling 6.4K...
2022-02-27 10:36:22 Hail: INFO: while writing:
/tmp/foo.tsv
merge time: 30.581ms
In [8]: !head /tmp/foo.tsv
locus alleles rsid 0 1 2 3 4 5 6 7 8 9
1:1 ["A","C"] abcdef123 0/1 0/1 0/1 1/1 1/1 0/0 0/1 0/1 0/1 1/1
1:2 ["A","C"] abcdef123 0/1 0/1 0/1 0/0 0/1 0/1 1/1 1/1 0/1 1/1
1:3 ["A","C"] abcdef123 0/0 0/0 0/0 0/1 0/1 0/1 0/0 0/1 0/1 0/0
1:4 ["A","C"] abcdef123 0/1 1/1 0/1 0/0 0/1 0/1 0/1 0/0 0/0 0/0
1:5 ["A","C"] abcdef123 1/1 0/0 1/1 0/0 0/1 1/1 0/1 0/1 0/1 0/1
1:6 ["A","C"] abcdef123 0/1 1/1 1/1 0/1 1/1 1/1 0/1 0/1 0/0 1/1
1:7 ["A","C"] abcdef123 0/0 0/1 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0
1:8 ["A","C"] abcdef123 1/1 1/1 1/1 1/1 1/1 1/1 1/1 0/1 0/1 1/1
1:9 ["A","C"] abcdef123 1/1 0/1 1/1 0/1 1/1 1/1 1/1 1/1 0/1 1/1
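Since the exported file is meant to be analysed by a separate script, here is a sketch of reading it back with the Python standard library. It assumes the file is tab-delimited with a header line and that the key columns are named `locus`, `alleles` and `rsid`, as in the output above; `read_genotype_tsv` is a hypothetical name, and the in-memory sample stands in for `/tmp/foo.tsv`.

```python
import csv
import io


def read_genotype_tsv(handle, key_columns=("locus", "alleles", "rsid")):
    """Parse an exported genotype TSV into (keys, genotypes) pairs.

    Hypothetical helper. Assumes a tab-delimited file with a header line;
    every column not listed in `key_columns` is treated as a sample column.
    """
    # QUOTE_NONE keeps the double quotes inside fields such as ["A","C"].
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        keys = {k: record[k] for k in key_columns if k in record}
        genotypes = {k: v for k, v in record.items() if k not in key_columns}
        rows.append((keys, genotypes))
    return rows


# Small in-memory sample shaped like the exported file (two samples only).
sample = io.StringIO(
    'locus\talleles\trsid\t0\t1\n'
    '1:1\t["A","C"]\tabcdef123\t0/1\t0/1\n'
)
rows = read_genotype_tsv(sample)
for keys, genotypes in rows:
    print(keys["rsid"], keys["locus"], genotypes)
```

For a real file, pass `open('/tmp/foo.tsv', newline='')` instead of the sample. A file with only a header line yields an empty list, which gives the downstream script its own way to detect an empty result.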