Write matrix table to a csv file

Hi,
Is it possible to write a matrix table to a CSV file?
Thank you

Hi @Jaden30 !

It sure is possible! The relevant docs are tucked away under Expression.export:

>>> small_mt.GT.export('output/gt.tsv')
>>> with open('output/gt.tsv', 'r') as f:
...     for line in f:
...         print(line, end='')
locus   alleles 0       1       2       3
1:1     ["A","C"]       0/1     0/1     0/0     0/0
1:2     ["A","C"]       1/1     0/1     1/1     1/1
1:3     ["A","C"]       1/1     0/1     0/1     0/0
1:4     ["A","C"]       1/1     0/1     1/1     1/1

You’ll want to use delimiter=',' to get a CSV instead of a TSV.

Hi Danking,
The examples only show writing a single field to a TSV/CSV file. Is there no way for me to write the entire matrix table to a CSV file?

If you want to write the entire entry as a JSON object, that works:


In [1]: import hail as hl 
   ...:  
   ...: mt = hl.balding_nichols_model(1, 3, 3) 
   ...: mt = mt.annotate_entries(AD=5) 
   ...: mt.entry.export('/tmp/bar.tsv')
In [2]: !cat /tmp/bar.tsv
locus	alleles	0	1	2
1:1	["A","C"]	{"GT":"0/1","AD":5}	{"GT":"1/1","AD":5}	{"GT":"0/1","AD":5}
1:2	["A","C"]	{"GT":"1/1","AD":5}	{"GT":"0/1","AD":5}	{"GT":"1/1","AD":5}
1:3	["A","C"]	{"GT":"0/1","AD":5}	{"GT":"0/0","AD":5}	{"GT":"0/0","AD":5}
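If you later need those JSON entries back in Python, the exported file can be read with the standard library alone. Below is a minimal sketch, assuming the tab-delimited layout shown above (locus, alleles, then one JSON object per sample column). The helper name `read_entry_export` is hypothetical, and treating `NA` as a missing entry is an assumption about Hail's default missing-value marker. For a real file you would pass `open('/tmp/bar.tsv').read()` instead of the inline sample text.

```python
import csv
import io
import json

# Sample text copied from the /tmp/bar.tsv output above (tab-delimited).
EXPORTED = (
    'locus\talleles\t0\t1\t2\n'
    '1:1\t["A","C"]\t{"GT":"0/1","AD":5}\t{"GT":"1/1","AD":5}\t{"GT":"0/1","AD":5}\n'
    '1:2\t["A","C"]\t{"GT":"1/1","AD":5}\t{"GT":"0/1","AD":5}\t{"GT":"1/1","AD":5}\n'
    '1:3\t["A","C"]\t{"GT":"0/1","AD":5}\t{"GT":"0/0","AD":5}\t{"GT":"0/0","AD":5}\n'
)


def read_entry_export(text, n_row_fields=2):
    """Return {(locus, sample): entry_dict} from a JSON-entry export.

    Assumes the first `n_row_fields` columns are row fields and every
    remaining column holds one JSON-encoded entry per sample.
    """
    # QUOTE_NONE: the JSON contains '"' characters that must not be
    # interpreted as CSV quoting.
    reader = csv.reader(io.StringIO(text), delimiter='\t', quoting=csv.QUOTE_NONE)
    header = next(reader)
    samples = header[n_row_fields:]
    entries = {}
    for row in reader:
        locus = row[0]
        for sample, cell in zip(samples, row[n_row_fields:]):
            # Assumption: missing entries are written as 'NA'.
            entries[(locus, sample)] = None if cell == 'NA' else json.loads(cell)
    return entries


entries = read_entry_export(EXPORTED)
print(entries[('1:1', '1')]['GT'])  # 1/1
```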

You can include more row fields by adding them to the key:

In [4]: import hail as hl 
   ...:  
   ...: mt = hl.balding_nichols_model(1, 3, 3) 
   ...: mt = mt.annotate_entries(AD=5) 
   ...: non_key_row_fields = set(mt.row) - set(mt.row_key) 
   ...: mt.key_rows_by(*mt.row_key, *non_key_row_fields).entry.export('/tmp/bar.tsv')
In [5]: !cat /tmp/bar.tsv
locus	alleles	af	ancestral_af	0	1	2
1:1	["A","C"]	[0.44805795611590166]	5.39051e-01	{"GT":"0/1","AD":5}	{"GT":"0/1","AD":5}	{"GT":"0/1","AD":5}
1:2	["A","C"]	[0.7042578478282053]	8.67678e-01	{"GT":"1/1","AD":5}	{"GT":"1/1","AD":5}	{"GT":"1/1","AD":5}
1:3	["A","C"]	[0.35246827252547935]	4.37646e-01	{"GT":"0/1","AD":5}	{"GT":"0/0","AD":5}	{"GT":"0/1","AD":5}

If you don’t want JSON for the entries, you can do this (admittedly very ugly) thing:

In [23]: import hail as hl 
    ...:  
    ...: mt = hl.balding_nichols_model(1, 3, 3) 
    ...: mt = mt.annotate_entries(AD=5) 
    ...: mt = mt.annotate_cols(entry_id=list(range(len(mt.entry)))) 
    ...: mt = mt.explode_cols(mt.entry_id) 
    ...: mt = mt.key_cols_by(sample_id = hl.str(mt.sample_idx) + hl.literal('_') + hl.literal(list(mt.entry))[mt.entry_id]) 
    ...: mt = mt.select_entries(entries_as_str = [hl.str(mt[f]) for f in mt.entry]) 
    ...: mt = mt.select_entries(the_entry=mt.entries_as_str[mt.entry_id]) 
    ...: non_key_row_fields = set(mt.row) - set(mt.row_key) 
    ...: mt.key_rows_by(*mt.row_key, *non_key_row_fields).the_entry.export('/tmp/bar.tsv')
In [22]: !cat /tmp/bar.tsv
locus	alleles	af	ancestral_af	0_GT	0_AD	1_GT	1_AD	2_GT	2_AD
1:1	["A","C"]	[0.5383546190066579]	5.39051e-01	0/1	5	0/0	5	0/1	5
1:2	["A","C"]	[0.9595560241510789]	8.67678e-01	1/1	5	0/1	5	1/1	5
1:3	["A","C"]	[0.5301809406988318]	4.37646e-01	0/1	5	0/0	5	0/1	5
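If the Hail-side reshaping feels too awkward, another option (done outside Hail, in plain Python) is to export the entries as JSON, as in the first example, and flatten them into `<sample>_<field>` columns afterwards. This is a minimal sketch, not a Hail feature: `flatten_entry_export` is a hypothetical helper, and it assumes every entry is non-missing and has the same fields as the first one.

```python
import csv
import io
import json

# Example input: the JSON-entry export from the first IPython session above.
EXPORTED = (
    'locus\talleles\t0\t1\t2\n'
    '1:1\t["A","C"]\t{"GT":"0/1","AD":5}\t{"GT":"1/1","AD":5}\t{"GT":"0/1","AD":5}\n'
    '1:2\t["A","C"]\t{"GT":"1/1","AD":5}\t{"GT":"0/1","AD":5}\t{"GT":"1/1","AD":5}\n'
    '1:3\t["A","C"]\t{"GT":"0/1","AD":5}\t{"GT":"0/0","AD":5}\t{"GT":"0/0","AD":5}\n'
)


def flatten_entry_export(text, n_row_fields=2):
    """Turn one-JSON-object-per-cell columns into <sample>_<field> columns."""
    rows = list(csv.reader(io.StringIO(text), delimiter='\t', quoting=csv.QUOTE_NONE))
    header, body = rows[0], rows[1:]
    samples = header[n_row_fields:]
    # Field order is taken from the first entry (here: GT, AD).
    fields = list(json.loads(body[0][n_row_fields]))
    lines = ['\t'.join(header[:n_row_fields] + [f'{s}_{f}' for s in samples for f in fields])]
    for row in body:
        values = []
        for cell in row[n_row_fields:]:
            entry = json.loads(cell)
            values.extend(str(entry[f]) for f in fields)
        # Row fields (locus, alleles) are copied through unchanged.
        lines.append('\t'.join(row[:n_row_fields] + values))
    return '\n'.join(lines) + '\n'


flat = flatten_entry_export(EXPORTED)
print(flat, end='')
```

The result has the same column layout as the Hail-only version above (`0_GT`, `0_AD`, `1_GT`, ...), at the cost of a second pass over the file.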

As for the column fields, you can’t include those in this CSV; I’m not really sure how they would even be represented in a CSV. I would store the column fields in a separate CSV file.

Also, beware that commas often appear in the JSON that Hail generates for non-scalar fields like the alleles list.
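To see why that matters, here is a small standard-library illustration. It assumes the JSON is written as-is, without CSV quoting (as the outputs above suggest), so a comma delimiter makes the alleles column split in two. One workaround, sketched here with a hypothetical `tsv_to_csv` helper, is to keep the default tab delimiter and convert afterwards with Python's `csv` module, which quotes any field containing a comma or quote character. It assumes no field itself contains a tab or newline.

```python
import csv
import io

# One exported row; the alleles column is JSON and contains a comma.
row = ['1:1', '["A","C"]', '0/1']

# Joining with plain commas is ambiguous: splitting gives 4 pieces, not 3.
naive = ','.join(row)
print(naive.split(','))


def tsv_to_csv(tsv_text):
    """Re-encode tab-delimited text as CSV with standard quoting."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')  # default QUOTE_MINIMAL
    for line in tsv_text.splitlines():
        writer.writerow(line.split('\t'))
    return out.getvalue()


converted = tsv_to_csv('\t'.join(row) + '\n')
print(converted, end='')  # 1:1,"[""A"",""C""]",0/1

# A CSV reader recovers the original three fields.
roundtrip = next(csv.reader(io.StringIO(converted)))
print(roundtrip == row)  # True
```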

Thank you very much for your response

Traceback (most recent call last):
  File "transvcf.py", line 66, in <module>
    fire.Fire(VCF)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "transvcf.py", line 50, in identify_AF
    dataset = hl.import_vcf(vcf, skip_invalid_loci = True, array_elements_required=False)
  File "<decorator-gen-1348>", line 2, in import_vcf
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 576, in wrapper
    args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 543, in check_all
    args_.append(arg_check(args[i], name, arg_name, checker))
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 584, in arg_check
    return checker.check(arg, function_name, arg_name)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 82, in check
    return tc.check(x, caller, param)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 328, in check
    return f(tc.check(x, caller, param))
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/genetics/reference_genome.py", line 10, in <lambda>
    reference_genome_type = oneof(transformed((str, lambda x: hl.get_reference(x))), rg_type)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/context.py", line 554, in get_reference
    Env.hc()
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/utils/java.py", line 55, in hc
    init()
  File "<decorator-gen-1714>", line 2, in init
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/context.py", line 252, in init
    skip_logging_configuration, optimizer_iterations)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/hail/backend/spark_backend.py", line 176, in __init__
    self._jbackend, log, True, append, branching_factor, skip_logging_configuration, optimizer_iterations)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/jaden/miniconda3/envs/hail/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.HailContext.apply.
: is.hail.utils.HailException: Hail requires Java

I got that error when running Hail. I don’t understand it or know how to fix it; can anybody help? Thank you.

Did you install Java? It’s listed in our installation instructions: Installing Hail

Yes, I have, following the instructions. It’s still returning the error.

This looks like it’s missing part of the message. It should say something like:

Hail requires Java 8 or 11, found 12

What version of Java (java -version) do you have installed?
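If you want to check this from Python, here is a minimal sketch. It only parses the output of `java -version` (which prints to stderr). It is not how Hail itself performs the check, and the helper names `java_major_version` and `installed_java_version` are hypothetical. It handles both the old `1.8.0_…` numbering (Java 8) and the newer `11.0.7`/`12.0.1` style. The "8 or 11" requirement comes from the error message quoted above and may differ between Hail versions.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_252" scheme (-> 8) and the newer
    "11.0.7" / "12.0.1" / "17" scheme (-> 11, 12, 17).
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError('no version string found')
    parts = re.split(r'[._\-+]', match.group(1))
    if parts[0] == '1' and len(parts) > 1:
        return int(parts[1])  # legacy scheme: "1.8" means Java 8
    return int(parts[0])


def installed_java_version():
    """Run `java -version` (output goes to stderr) and parse the result."""
    result = subprocess.run(['java', '-version'], capture_output=True, text=True)
    return java_major_version(result.stderr)


print(java_major_version('java version "1.8.0_231"'))             # 8
print(java_major_version('openjdk version "12.0.1" 2019-04-16'))  # 12
```

Calling `installed_java_version()` on the machine running Hail should report the same major version that the full error message mentions.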