Hi,
I’m solving a simple and probably common task, but I don’t know how to deal with it:
I would like to create one big MatrixTable from all my VCFs, but when I use union_cols with row_join_type='outer', the non-key row fields for variants that appear only in the new VCF are set to NA. How can I solve this?
My code is:
import glob
import os
import hail as hl

# VCF_FOLDER and REF_GENOM are set earlier (not shown)
vcfs = glob.glob(os.path.join(VCF_FOLDER, '*.vcf'))
mt = hl.import_vcf(vcfs[0], reference_genome=REF_GENOM)
for path in vcfs[1:]:
    new_mt = hl.import_vcf(path, reference_genome=REF_GENOM)
    mt = mt.union_cols(new_mt, row_join_type='outer')
Results:
mt:
+------------+------------+----------+---------+---------+------------+
| locus      | alleles    |     qual | filters | info.AC | info.AF    |
+------------+------------+----------+---------+---------+------------+
| chr1:10230 | ["AC","A"] | 4.84e+01 | {}      | [2]     | [1.00e+00] |
new_mt:
+------------+------------+----------+---------+---------+------------+---------+
| locus      | alleles    |     qual | filters | info.AC | info.AF    | info.AN |
+------------+------------+----------+---------+---------+------------+---------+
| chr1:10230 | ["AC","A"] | 9.23e+01 | {}      | [2]     | [1.00e+00] |       2 |
| chr1:10247 | ["TA","T"] | 3.47e+01 | {}      | [2]     | [1.00e+00] |       2 |
Union:
+------------+------------+----------+---------+---------+------------+
| locus      | alleles    |     qual | filters | info.AC | info.AF    |
+------------+------------+----------+---------+---------+------------+
| chr1:10230 | ["AC","A"] | 4.84e+01 | {}      | [2]     | [1.00e+00] |
| chr1:10247 | ["TA","T"] |       NA | NA      | NA      | NA         |
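To make the question concrete, here is a small plain-Python sketch of what I observe versus what I would like to get. It does not use Hail at all: the dictionaries and the two helper functions are hypothetical stand-ins for the row fields shown above, and only a few of the fields are included.

```python
# Plain-Python model of the row fields only; this is NOT Hail code.
# Keys are (locus, alleles); values are some of the non-key row fields.
left_rows = {
    ("chr1:10230", ("AC", "A")): {"qual": 48.4, "AC": [2], "AF": [1.0]},
}
right_rows = {
    ("chr1:10230", ("AC", "A")): {"qual": 92.3, "AC": [2], "AF": [1.0]},
    ("chr1:10247", ("TA", "T")): {"qual": 34.7, "AC": [2], "AF": [1.0]},
}


def outer_join_left_fields(left, right):
    """What I observe: row fields are taken from the left side only."""
    return {key: left.get(key) for key in sorted(set(left) | set(right))}


def outer_join_coalesced(left, right):
    """What I would like: use the right side when the left has no such row."""
    return {
        key: left.get(key, right.get(key))
        for key in sorted(set(left) | set(right))
    }


observed = outer_join_left_fields(left_rows, right_rows)
wanted = outer_join_coalesced(left_rows, right_rows)

new_key = ("chr1:10247", ("TA", "T"))
print(observed[new_key])        # None, i.e. the NA row in the union above
print(wanted[new_key]["qual"])  # 34.7, taken from new_mt
```

So for rows that exist only in new_mt, I would like the row fields to come from new_mt instead of being missing.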
Thank you,
Radim