`filter_entries` introduces NA instead of removing entries from a matrixtable

I’m trying to filter out variants with AF < 0.8 from a matrixtable. I could of course use hl.variant_qc with mt.filter_rows to get this done, but the AF field in my VCF only takes the values 0, 0.5 and 1.0, which is too coarse for fine filtering.

So I created a filter condition using AD and DP values in the entries field to filter out variants.

mt = mt.filter_entries(mt.AD[1]/mt.DP > 0.8)

However, when I look at the output, NA has been assigned wherever the filter condition evaluated to False. See below:

mt.AD.show()

+---------------+------------+--------------+
| locus         | alleles    | 'test'.AD    |
+---------------+------------+--------------+
| locus<GRCh38> | array<str> | array<int32> |
+---------------+------------+--------------+
| chrX:22849    | ["A","G"]  | [0,22]       |
| chrX:26601    | ["G","T"]  | NA           |
| chrX:26883    | ["C","T"]  | [0,89]       |
| chrX:26987    | ["A","C"]  | NA           |
| chrX:27266    | ["C","G"]  | NA           |
+---------------+------------+--------------+

I’m not sure why I see NAs when I expect the variants to be removed from mt.

I also tried to export mt as a VCF and I still see variants with AF < 0.8, but for those variants the sample field is replaced with ./.

I basically want to get rid of variants that fail my filtering condition (mt.AD[1]/mt.DP > 0.8), but I’m not sure what I’m missing in my implementation.

Any thoughts would be appreciated!

Faizal

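You’ll need to use filter_rows instead of filter_entries. filter_rows removes rows (variants), whereas filter_entries only removes entries (the per-sample FORMAT field groups), and a filtered entry then shows up as missing (NA).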
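For a matrixtable with one or more samples, a rough sketch of the filter_rows version could look like the following. The function name and the threshold parameter are made up for illustration, and the rule "keep a variant if at least one sample passes" is an assumption (with a single sample it reduces to the original condition). It relies on hl.agg.any to aggregate the per-entry condition across columns inside filter_rows.

```python
def keep_variants_with_passing_entry(mt, threshold=0.8):
    """Drop whole rows (variants) instead of blanking individual entries.

    `mt` is assumed to be a Hail MatrixTable with entry fields AD and DP,
    as in the question. The helper name and `threshold` are hypothetical.
    """
    # Imported inside the function so the sketch can be loaded without
    # initialising Hail; in a notebook you would import it at the top.
    import hail as hl

    # Fraction of reads supporting the first alternate allele, per entry.
    alt_fraction = mt.AD[1] / mt.DP

    # Inside filter_rows, hl.agg.* aggregates over the columns (samples)
    # of each row, so a row is kept if any sample passes the threshold.
    return mt.filter_rows(hl.agg.any(alt_fraction > threshold))
```

Usage would be something like mt = keep_variants_with_passing_entry(mt), after which mt.AD.show() should no longer list the rows that previously showed NA.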

If you only have a single column/sample, I understand why this is confusing, and matrixtable is probably not the right interface for you to use. If you do:

ht = mt.make_table()

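You can then filter your table with ht = ht.filter(ht['test.AD'][1] / ht['test.DP'] > 0.8). Note that AD is an array, so it needs the [1] index just like mt.AD[1] in your original condition, and that filter returns a new table rather than modifying ht in place.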

Cool, this worked. Appreciate the clarification!

Follow up issue:
After filtering the table and exporting it as a VCF file, I see the output VCF doesn’t have the FORMAT field. I understand Hail doesn’t add a FORMAT field when exporting from a Hail table. Is there a workaround to add the FORMAT field to the output VCF?

Appreciate your help!

Faizal

The interface is pretty rough here, but I’d construct a single field, sample1, that is an hl.struct with all the genotype fields, then use the Table API (see the Hail | Table docs) to convert back to an MT, and then use the standard MT export.
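One possible reading of that suggestion, as a sketch only: it assumes the conversion method meant is Table.to_matrix_table_row_major (the post only links to the Table docs), that the table came from mt.make_table() so the flattened fields are named like 'test.AD', and that the genotype fields are GT, AD and DP (only AD and DP appear in this thread). The function name, the sample and entry_fields parameters, and the output path are all hypothetical.

```python
def table_to_single_sample_vcf(ht, out_path, sample="test",
                               entry_fields=("GT", "AD", "DP")):
    """Rebuild a one-sample MatrixTable from a flattened table and export it.

    `ht` is assumed to be the (filtered) result of mt.make_table(), keyed by
    locus and alleles, with fields such as 'test.GT', 'test.AD', 'test.DP'.
    """
    # Imported here so the sketch can be loaded without initialising Hail.
    import hail as hl

    # Gather the flattened '<sample>.<field>' columns into one struct per row.
    entry = hl.struct(**{f: ht[f"{sample}.{f}"] for f in entry_fields})
    ht = ht.annotate(**{sample: entry})

    # Drop the flattened copies so they do not linger as row fields.
    ht = ht.drop(*[f"{sample}.{f}" for f in entry_fields])

    # Each listed field becomes a column; the struct's fields become entry
    # fields, and the field name becomes the value of the column key 's'.
    mt = ht.to_matrix_table_row_major([sample], col_field_name="s")

    # Standard MT export; the entry fields are written as FORMAT fields.
    hl.export_vcf(mt, out_path)
    return mt
```

If your VCF carries other FORMAT fields (GQ, PL and so on), they would need to be added to entry_fields for them to survive the round trip.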
