Dear Hail community,
I am trying to first count, for each locus, the number of experiments that are homozygous reference, and then filter out those loci that are homozygous reference in all the experiments.
So, I start by loading the GVCF and densifying it.
sparseMatrix = hl.read_matrix_table( '{0}/chrom-{1}'.format( gvcf_store_path, chrom ) )
denseMatrix = hl.experimental.densify( sparseMatrix )
So, running denseMatrix.LGT.show( 5 )
I get this:
locus E012877.LGT E012882.LGT
locus<GRCh37> call call
20:1 0/0 0/0
20:60001 0/0 0/0
20:60014 0/0 0/0
20:60019 0/0 0/0
20:60022 0/0 0/0
Then I identify the homozygous reference calls with denseMatrix.LGT.is_hom_ref().show( 5 ):
locus E012877. E012882.
locus<GRCh37> bool bool
20:1 true true
20:60001 true true
20:60014 true true
20:60019 true true
20:60022 true true
But I do not see how to compute the row-wise sum (the number of true values per locus) as a row annotation. This is what I tried:
denseMatrix.annotate_rows( nH = hl.eval( denseMatrix.LGT.is_hom_ref() ) )
Nor how to perform the filtering:
denseMatrix = denseMatrix[ denseMatrix.nH( denseMatrix.nH == denseMatrix.count_cols) ]
Any suggestion is welcome.
Thanks in advance,
~Carles