`Table` to `MatrixTable` to export `VCF`

I have the contents of a VCF in a Hail table, and I realized that exporting it as a VCF doesn’t add the sample’s FORMAT fields to the VCF. So I’d like to convert it to a MatrixTable and call mt.export_vcf(), but I’m unclear on how to do that.

I created a tstruct with all the FORMAT fields (based on the “Hail | Import / Export” docs and the post “`filter_entries` introduces NA instead of removing entries from a matrixtable - #5 by danking”). I’m a little confused about the implementation here.

Here’s what I have so far:

  1. A hail table,
>>> test.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Row fields:
    'locus': locus<GRCh38> 
    'alleles': array<str> 
    'rsid': str 
    'qual': float64 
    'filters': set<str> 
    'info': struct {
        AC: array<int32>, 
        AF: array<float64>, 
        AN: int32, 
        BaseQRankSum: float64, 
        DB: bool, 
        DP: int32, 
        ExcessHet: float64, 
        FS: float64, 
        InbreedingCoeff: float64, 
        MLEAC: array<int32>, 
        MLEAF: array<float64>, 
        MQ: float64, 
        MQRankSum: float64, 
        QD: float64, 
        ReadPosRankSum: float64, 
        SOR: float64
    } 
    'test.AD': array<int32> 
    'test.DP': int32 
    'test.GQ': int32 
    'test.GT': call 
    'test.PL': array<int32> 
----------------------------------------
Key: ['locus', 'alleles']
----------------------------------------
  2. A tstruct with the fields to be included in the FORMAT field,
>>> test_struct.describe()
--------------------------------------------------------
Type:
        struct {
        AD: array<int32>, 
        DP: int32, 
        GQ: int32, 
        GT: call, 
        PL: array<int32>
    }
--------------------------------------------------------
Source:
    <hail.table.Table object at 0x7f0b9714fa70>
Index:
    ['row']
--------------------------------------------------------
  3. I’m stuck on how to use the table (from 1.) and the tstruct (from 2.) to build a MatrixTable with to_matrix_table_row_major.

Any ideas on how to convert a hail Table to MatrixTable and then export a VCF?

Thanks,
Faizal

It looks like you have fields with names like test.AD, which is a bit confusing: Hail usually uses a dot to indicate a field inside a nested struct. You need to restructure those into a truly nested struct:

ht = ht.annotate(test = hl.struct(
    AD=ht['test.AD'], 
    DP=ht['test.DP'], 
    GQ=ht['test.GQ'],
    GT=ht['test.GT'],
    PL=ht['test.PL']
))

Then you can use this to convert to a matrix table in which the sub-fields of test become the entry fields of a sample identified by the string "test".

mt = ht.to_matrix_table_row_major(['test'], col_field_name='s')

If you prefer a different sample ID, change the field name used in annotate, and use the same name in the list passed to to_matrix_table_row_major.

Here’s a working example. Note that the printed form does not look any different, but it is indeed a matrix table.

In [6]: ht = hl.utils.range_table(2).annotate(x = hl.struct(a=1, b=2, c=3))
   ...: ht.show()
   ...: mt = ht.to_matrix_table_row_major(['x'], col_field_name='s')
   ...: mt.show()
+-------+-------+-------+-------+
|   idx |   x.a |   x.b |   x.c |
+-------+-------+-------+-------+
| int32 | int32 | int32 | int32 |
+-------+-------+-------+-------+
|     0 |     1 |     2 |     3 |
|     1 |     1 |     2 |     3 |
+-------+-------+-------+-------+
+-------+-------+-------+-------+
|   idx | 'x'.a | 'x'.b | 'x'.c |
+-------+-------+-------+-------+
| int32 | int32 | int32 | int32 |
+-------+-------+-------+-------+
|     0 |     1 |     2 |     3 |
|     1 |     1 |     2 |     3 |
+-------+-------+-------+-------+

You don’t need a Hail type to do this (the hl.struct(AD=...) figures out its own type).

If you want to include filters, add a row field named filters that contains a set of strings. We also support qual and rsid.
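To tie the steps in this answer together, here is a minimal sketch rather than a definitive implementation. The helper names group_dotted_fields and table_to_vcf are made up for illustration, and the sketch assumes every row field with a dot in its name follows the sample.FIELD pattern. The Hail calls used are annotate, drop, to_matrix_table_row_major and hl.export_vcf.

```python
from collections import defaultdict


def group_dotted_fields(field_names):
    """Group flat 'sample.FIELD' names by their sample prefix.

    Returns {sample: {FIELD: original_flat_name}}. Names without a dot
    (locus, alleles, info, ...) are left out.
    """
    groups = defaultdict(dict)
    for name in field_names:
        if "." in name:
            sample, entry_field = name.split(".", 1)
            groups[sample][entry_field] = name
    return dict(groups)


def table_to_vcf(ht, out_path):
    """Nest the dotted fields per sample, pivot to a MatrixTable, export a VCF.

    Hypothetical helper: assumes ht is keyed by locus and alleles.
    """
    import hail as hl  # imported here so the helper above runs without Hail

    groups = group_dotted_fields(list(ht.row_value))
    # One struct-typed row field per sample, e.g. test = struct{AD, DP, GQ, GT, PL}.
    ht = ht.annotate(**{
        sample: hl.struct(**{field: ht[flat] for field, flat in fields.items()})
        for sample, fields in groups.items()
    })
    # Drop the flat copies so they do not linger as extra row fields.
    ht = ht.drop(*[flat for fields in groups.values() for flat in fields.values()])
    # Each listed struct field becomes one column; its sub-fields become entry fields.
    mt = ht.to_matrix_table_row_major(list(groups), col_field_name="s")
    hl.export_vcf(mt, out_path)
    return mt
```

For the table in the question, this would be called as table_to_vcf(test, 'test.vcf.bgz'), where the output path is a placeholder. The grouping step is plain Python, so it can be checked on the list of field names before Hail runs anything.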

Great! Thanks a lot for explaining this. I was able to implement it.

Thanks,