Looking for a function to export a MatrixTable to TSV, like export_samples() in Hail 0.1

Hi,
I want to plot sample QC metrics, and I have a working Python plotting script that takes a TSV file as input.
In Hail 0.1 I used:
vds.sample_qc().export_samples('output.tsv', 'Sample = s, sa.qc.*')
to get a suitable TSV.

I want to do the same thing in Hail 0.2. I found Table.export(), but nothing equivalent for MatrixTable.

Is there a convenient way to do this?

Thanks for any help.

Hello, Shuang! Thank you for your question :slight_smile:

A few follow up questions –

  1. Do you have a MatrixTable instead of a VDS?
  2. If you have a MatrixTable, you can save your sample QC results into a Hail Table and then use Table.export(). Hint: mt.cols() returns the column fields, including sample_qc, as a Table.
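A minimal sketch of that suggestion, assuming Hail 0.2 and a MatrixTable `mt` with a `GT` entry field. The function name `export_sample_qc` is hypothetical; it uses `hl.sample_qc`, `MatrixTable.cols()`, `Table.select`, `Table.flatten` and `Table.export`. Flattening matters because it turns the nested `sample_qc` struct into plain columns (e.g. `sample_qc.call_rate`) that a TSV-reading plot script can consume directly.

```python
def export_sample_qc(mt, output_path):
    """Compute per-sample QC on a Hail MatrixTable and export it as a TSV.

    Sketch for Hail 0.2; `mt` is assumed to have a `GT` entry field.
    """
    # Imported inside the function so this sketch can be defined without Hail installed.
    import hail as hl

    mt = hl.sample_qc(mt, name='sample_qc')  # adds a `sample_qc` struct column field
    cols = mt.cols()                         # column fields as a Table, one row per sample
    cols = cols.select('sample_qc')          # keep the QC struct; key fields (e.g. `s`) are kept
    cols = cols.flatten()                    # nested struct -> columns like `sample_qc.call_rate`
    cols.export(output_path)                 # tab-separated with a header line
```

Usage would look like `export_sample_qc(mt, 'gs://path/sample_qc.tsv.bgz')`; a `.bgz` suffix makes Hail write block-gzipped output.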

Hi, thanks a lot for answering.

Yes. I started from a VCF, wrote it to a MatrixTable, and have already QCed my data in MatrixTable format.

Could I use
mt = hl.sample_qc(mt)
mt.col.sample_qc.export()

About saving my sample QC into a separate table, do you mean:
mt = hl.sample_qc(mt)
table = mt.col.sample_qc
table.export()

I then tried:
mt = hl.sample_qc(mt, name='sample_qc')
table1 = mt.col.sample_qc
table1.export('gs://path/table1.tsv.bgz')

It failed in Stage 1, even after I added --properties spark.speculation=true when submitting the job to my GCP cluster:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

I don’t think that’s the error message. Did more than that get printed? That looks like just a warning.

Hi, it did show a red ERROR flag, and it reported:

Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See SLF4J Error Codes for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

I tried:

mt = hl.sample_qc(mt, name='sample_qc')
mt.col.describe()
table1 = mt.col.sample_qc
table1.describe()
table1.show()

I did this to make sure the problem isn't in generating the table, and that part looks right.
The problem happens at table1.show(), which reports an error message very similar to the one from table1.export().

Some results:

--------------------------------------------------------
Result of table1.describe():
--------------------------------------------------------
Type:
        struct {
        dp_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        gq_stats: struct {
            mean: float64, 
            stdev: float64, 
            min: float64, 
            max: float64
        }, 
        call_rate: float64, 
        n_called: int64, 
        n_not_called: int64, 
        n_filtered: int64, 
        n_hom_ref: int64, 
        n_het: int64, 
        n_hom_var: int64, 
        n_non_ref: int64, 
        n_singleton: int64, 
        n_snp: int64, 
        n_insertion: int64, 
        n_deletion: int64, 
        n_transition: int64, 
        n_transversion: int64, 
        n_star: int64, 
        r_ti_tv: float64, 
        r_het_hom_var: float64, 
        r_insertion_deletion: float64
    }
--------------------------------------------------------
Source:
    <hail.matrixtable.MatrixTable object at 0x7f63ebf06780>
Index:
    ['column']

I think it must be an issue with accessing the sample_qc field.
I also tried Hail 0.2's built-in plotting function, and it reports the same issue:

mt = hl.sample_qc(mt)
p = hl.plot.histogram(mt.sample_qc.call_rate, range=(.88, 1), legend='Call Rate')
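Once the per-sample QC has been exported, the TSV can also be loaded by a plain Python plotting script like the one mentioned in the question. A minimal standard-library sketch, assuming the file was written by `Table.export()` after `flatten()` (so nested fields appear as dotted column names such as `sample_qc.call_rate`), that the sample key column is `s`, and that missing values are written as `NA`; `read_sample_qc_tsv` is a hypothetical helper, not part of Hail.

```python
import csv
import gzip


def read_sample_qc_tsv(path, field='sample_qc.call_rate', key='s'):
    """Return {sample_id: value} for one numeric column of an exported QC TSV.

    Assumes dotted column names from Table.flatten() and 'NA' for missing values.
    """
    # Block-gzipped (.bgz) output is a multi-member gzip stream, which
    # Python's gzip module reads transparently.
    opener = gzip.open if path.endswith(('.bgz', '.gz')) else open
    values = {}
    with opener(path, 'rt', newline='') as f:
        for row in csv.DictReader(f, delimiter='\t'):
            raw = row[field]
            values[row[key]] = None if raw == 'NA' else float(raw)
    return values
```

Samples mapped to `None` (missing metric) should be filtered out before passing the values to a histogram.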

@johnc1231 did you have additional thoughts on this?

I would still want to see the whole error message, including the red ERROR flag and the python stack trace.

I’d try updating to version 0.2.52 though, as an issue with sample qc was fixed in that version.
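Before and after upgrading, it can help to confirm which Hail version the Python environment actually has. A minimal sketch, assuming Hail was installed with pip (so `importlib.metadata` can see it, Python 3.8+) and that version strings look like `0.2.52` or `0.2.34-<commit hash>`; the helper names are hypothetical. On a Dataproc cluster, the version on the cluster nodes may differ from the one on the machine submitting the job.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v):
    """Turn '0.2.52' or '0.2.34-914bd8a10ca2' into a comparable tuple like (0, 2, 52)."""
    core = v.split('-', 1)[0]
    return tuple(int(part) for part in core.split('.')[:3])


def needs_upgrade(installed, minimum='0.2.52'):
    """True if the installed version is older than `minimum`."""
    return parse_version(installed) < parse_version(minimum)


def installed_hail_version():
    """Version of the pip-installed `hail` package, or None if it is not installed."""
    try:
        return version('hail')
    except PackageNotFoundError:
        return None
```

For example, `needs_upgrade('0.2.34')` is `True`, while `needs_upgrade('0.2.52')` is `False`.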