Keep v chr:pos:ref:alt format in variants_table().to_pandas()

I’ve tried vds.variants_table().to_pandas(expand=False) but it still splits v.

   v                           va.rsid  va.qual   va.filters  va.pass
0  (1, 904165, G, [(G, A)])    .        52346.37  []          False
1  (1, 909917, G, [(G, A)])    .        1576.94   []          False
2  (1, 986963, C, [(C, T)])    .        398.06    []          False
3  (1, 1563691, T, [(T, G)])   .        1090.75   []          False

Is there a way to get v as chr:start:ref:alt in the dataframe without having to reconstruct it?
As this is the method of choice to uniquely identify a variant, it would make sense to make it available to pandas (as it is to export_variants()).

to_pandas uses Spark dataframes as an intermediate, and those don’t know about Hail-specific objects (Variant, Genotype, etc.). We’ll write a direct converter at some point that can keep these objects and can probably go way faster. For now, map it to a str first:

vds.variants_table().annotate('v = str(v)').to_pandas(expand=False)
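If you already have a dataframe with the split form, here is a rough sketch for rebuilding the string on the pandas side. It assumes each v cell behaves like a tuple of (contig, start, ref, alt alleles), with each alt allele a (ref, alt) pair as in the output above, and that multiple alts should be joined with commas. variant_to_str is a made-up helper name, not part of Hail.

```python
def variant_to_str(v):
    """Format a split variant as chr:pos:ref:alt (hypothetical helper, not a Hail API)."""
    # Assumed cell shape: (contig, start, ref, [(ref, alt), ...])
    contig, start, ref, alt_alleles = v
    # Each alt allele is a (ref, alt) pair; keep only the alt part.
    alts = ",".join(alt for _, alt in alt_alleles)
    return "{}:{}:{}:{}".format(contig, start, ref, alts)


# Mirrors row 0 of the dataframe shown in the question.
example = ("1", 904165, "G", [("G", "A")])
print(variant_to_str(example))  # 1:904165:G:A
```

On a dataframe this would be something like df['v'] = df['v'].apply(variant_to_str), again assuming the cells unpack like tuples.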
