Chaining commands in one expression

Hi Guys,

This might be documented somewhere, but I can’t find a way to chain commands together the way it’s done intuitively in Spark. For example, I couldn’t simplify the following expression:

ot_variants = hl.Table.from_spark(included_df)
ot_variants = (
    ot_variants.annotate(pos = hl.int32(ot_variants.pos))
)
ot_variants = (
    ot_variants
    .annotate(
        locus = hl.locus(
            ot_variants.chrom, 
            ot_variants.pos, 
            reference_genome='GRCh38'
        ),
        alleles = hl.array([ot_variants.ref, ot_variants.alt])
    )
)
ot_variants = (
    ot_variants
    .key_by(ot_variants.locus, ot_variants.alleles)
    .drop('chrom', 'pos', 'alt', 'ref')
)

So my question is whether there’s a way to write something like this:

ot_variants = (
    hl.Table.from_spark(included_df)
    .annotate(
        pos = hl.int32(ot_variants.pos),
        locus = hl.locus(
            ot_variants.chrom, 
            ot_variants.pos, 
            reference_genome='GRCh38'
        ),
        alleles = hl.array([ot_variants.ref, ot_variants.alt])
    )
    .key_by(ot_variants.locus, ot_variants.alleles)
    .drop('chrom', 'pos', 'alt', 'ref')
)

Thank you so much!

The chaining you’re looking for doesn’t really exist. Every operation that creates a new table “breaks the chain”, since to refer to fields of that new table, you need to have an identifier that refers to that table.
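That said, you can approximate the fluent style in plain Python by threading the table through a sequence of lambdas, so each step gets its own identifier for the intermediate table. Below is a minimal sketch; the `pipe` helper is hypothetical (not part of Hail or PySpark), and the commented Hail usage assumes `hl` and `included_df` exist as in your snippet:

```python
def pipe(value, *steps):
    """Thread `value` through each function in turn; each lambda
    re-binds the intermediate result so later steps can refer to it."""
    for step in steps:
        value = step(value)
    return value

# Hypothetical Hail usage (assumes `hl` and `included_df` are defined):
# ot_variants = pipe(
#     hl.Table.from_spark(included_df),
#     lambda t: t.annotate(pos=hl.int32(t.pos)),
#     lambda t: t.annotate(
#         locus=hl.locus(t.chrom, t.pos, reference_genome='GRCh38'),
#         alleles=hl.array([t.ref, t.alt]),
#     ),
#     lambda t: t.key_by('locus', 'alleles').drop('chrom', 'pos', 'alt', 'ref'),
# )

# Self-contained demonstration with a plain value:
result = pipe(3, lambda x: x + 1, lambda x: x * 2)  # (3 + 1) * 2 = 8
```

This doesn’t remove the underlying constraint (each lambda still names its table, just as `t`), but it keeps the whole pipeline in one expression.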


OK, I see. So there’s no way to reference columns on the fly, similar to what the col() function achieves in PySpark.