Hi, I’m a bit struggling with formatting my VDS file from local to global. Currently I have to following code to transform my local entries LAD and LGT into their global equivalent:
mt = vds.variant_data.annotate_entries(
AD = hl.vds.local_to_global(vds.variant_data.LAD,
vds.variant_data.LA,
n_alleles=hl.len(vds.variant_data.alleles),
fill_value=0,
number="R")
)
# Transform local genotype to global
mt = mt.annotate_entries(GT = hl.vds.lgt_to_gt(mt.LGT, mt.LA))
# Densify to MT
mt = hl.vds.to_dense_mt(hl.vds.VariantDataset(vds.reference_data, mt))
The resulting mt has the following format:
+----------------+------------+------+-----------------------+------+-------+
| locus | alleles | rsid | s | GT | GQ |
+----------------+------------+------+-----------------------+------+-------+
| locus<GRCh38> | array<str> | str | str | call | int32 |
+----------------+------------+------+-----------------------+------+-------+
| chr1:100000 | ["G","T"] | NA | "SAMPLE" | 0/0 | 81 |
| AD | DP | LGT | LAD | LA |
+--------------+-------+------+--------------+--------------+
| array<int32> | int32 | call | array<int32> | array<int32> |
+--------------+-------+------+--------------+--------------+
| NA | 41 | 0/0 | [41] | NA |
The issue I’m having is that LA is a NA when the sample is hom-ref, and therefore AD is NA as well. What would be the easiest way to fix it?
I’ve tried the following snippet, with no luck:
# Annotate AD: If it's missing, create a array of length two with LAD, and DP - LAD
mt=mt.annotate_entries(AD=hl.or_else(mt.AD,hl.array([mt.LAD[0],mt.DP-mt.LAD[0]])))
Thanks!