Hi,
I’m working on a simple and probably common task, but I can’t figure out how to handle it.
I would like to create one big MatrixTable from all my VCFs, but when I use union_cols with row_join_type='outer', the non-key row fields of rows that come only from the new VCF are set to NA. How should I solve this issue?
My code is:
import glob
import os
import hail as hl

# VCF_FOLDER and REF_GENOM are defined earlier in my script
vcfs = glob.glob(os.path.join(VCF_FOLDER, '*.vcf'))
mt = hl.import_vcf(vcfs[0], reference_genome=REF_GENOM)
for path in vcfs[1:]:
    new_mt = hl.import_vcf(path, reference_genome=REF_GENOM)
    mt = mt.union_cols(new_mt, row_join_type='outer')
Results:
mt:
+------------+------------+----------+---------+---------+------------+
| locus      | alleles    | qual     | filters | info.AC | info.AF    |
+------------+------------+----------+---------+---------+------------+
| chr1:10230 | ["AC","A"] | 4.84e+01 | {}      | [2]     | [1.00e+00] |
new_mt:
+------------+------------+----------+---------+---------+------------+---------+
| locus      | alleles    | qual     | filters | info.AC | info.AF    | info.AN |
+------------+------------+----------+---------+---------+------------+---------+
| chr1:10230 | ["AC","A"] | 9.23e+01 | {}      | [2]     | [1.00e+00] | 2       |
| chr1:10247 | ["TA","T"] | 3.47e+01 | {}      | [2]     | [1.00e+00] | 2       |
Union:
+------------+------------+----------+---------+---------+------------+
| locus      | alleles    | qual     | filters | info.AC | info.AF    |
+------------+------------+----------+---------+---------+------------+
| chr1:10230 | ["AC","A"] | 4.84e+01 | {}      | [2]     | [1.00e+00] |
| chr1:10247 | ["TA","T"] | NA       | NA      | NA      |
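The result above looks like what an outer join produces when only the left table's row fields are kept: a key that exists only in the right table has nothing to copy from, so its fields come out as NA. The sketch below models this in plain Python (it is not Hail code, and the function names are made up for illustration), using the qual and info values from the tables above. It also shows one possible fix, which is to take the left value when it is present and fall back to the right value otherwise. In Hail, hl.coalesce performs that kind of fallback on expressions, but how to expose the right-hand row fields to it after union_cols is an assumption that is not verified here.

```python
# Plain-Python model of an outer join on row keys. This is NOT Hail code.
# A "row table" is a dict: (locus, alleles) -> dict of non-key row fields.

def outer_join_left_fields(left, right):
    """Keep only the left table's fields; right-only keys get None (NA)."""
    names = sorted({n for fields in left.values() for n in fields})
    joined = {}
    for key in sorted(set(left) | set(right)):
        if key in left:
            joined[key] = dict(left[key])
        else:
            joined[key] = {n: None for n in names}
    return joined


def outer_join_coalesced(left, right):
    """Prefer the left value; fall back to the right value when it is missing."""
    names = sorted({n for t in (left, right) for f in t.values() for n in f})
    joined = {}
    for key in sorted(set(left) | set(right)):
        l_row = left.get(key, {})
        r_row = right.get(key, {})
        joined[key] = {
            n: l_row[n] if l_row.get(n) is not None else r_row.get(n)
            for n in names
        }
    return joined


# Values taken from the mt and new_mt tables above (a subset of the fields).
k1 = ("chr1:10230", ("AC", "A"))
k2 = ("chr1:10247", ("TA", "T"))
mt_rows = {k1: {"qual": 48.4, "info.AC": [2]}}
new_mt_rows = {
    k1: {"qual": 92.3, "info.AC": [2], "info.AN": 2},
    k2: {"qual": 34.7, "info.AC": [2], "info.AN": 2},
}

plain = outer_join_left_fields(mt_rows, new_mt_rows)
coalesced = outer_join_coalesced(mt_rows, new_mt_rows)

print(plain[k2])      # the right-only row: every field is None
print(coalesced[k2])  # the same row, filled in from the right table
```

Note that the coalesced version has to pick a winner when both tables carry a value for the same key (here qual for chr1:10230), and preferring the left value is only one possible choice.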
Thank you,
Radim