Working with VDS files using the sparklyr R package + plans for R API?

Hello,

Since a VDS file is based on the Parquet file format, I was wondering whether anyone has experience working with VDS files within an R environment using the sparklyr package. Also, are there ongoing or planned efforts within the Hail community to build an R API?

Thank you,

Jeroen

Hi Jeroen,
Some people, such as @konradjk, have experience using Hail with sparklyr, but they’ve mostly used it to work with key tables (very similar to Spark DataFrames) rather than VDS objects. Konrad’s Python prep went something like this:

vds.variants_keytable().to_dataframe().write.parquet('file.parquet')

He then used sparklyr to read and process the file.

We would love to have a full R API, but don’t have any plans to build one right now.

Yep, this works reasonably well, but last I looked, some data types did not transfer over cleanly. I think at the time sets would break the load in R, but if you drop them or select only the columns you need, it works fine.
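
For what it's worth, here is a rough sketch of the "select only the columns you need" step on the Python side. It rests on an assumption I have not verified: that the troublesome columns (sets and the like) appear in the Spark DataFrame as complex types such as array, map, or struct. The helper name flat_columns and the example column names are made up for illustration. It takes the list of (name, type string) pairs that a Spark DataFrame exposes as .dtypes and returns the names that look safe to export.

```python
# Sketch: choose columns to keep before writing Parquet for use in R.
# Assumption (unverified): columns that break the load in R show up as
# complex Spark types, so we filter on the type string prefix.
COMPLEX_PREFIXES = ("array", "map", "struct")


def flat_columns(dtypes):
    """Return names of columns whose Spark type is not a complex type.

    `dtypes` is a list of (column_name, type_string) pairs, the same
    shape as the `.dtypes` attribute of a Spark DataFrame.
    """
    return [name for name, dtype in dtypes
            if not dtype.startswith(COMPLEX_PREFIXES)]


# Hypothetical schema, for illustration only.
example = [("rsid", "string"), ("qual", "double"), ("filters", "array<string>")]
print(flat_columns(example))  # ['rsid', 'qual']

# With a real DataFrame (not run here), usage would look roughly like:
#   df = vds.variants_keytable().to_dataframe()
#   df.select(*flat_columns(df.dtypes)).write.parquet('file.parquet')
```

If only some complex columns cause trouble, it may be simpler to list the wanted column names by hand in the select call instead of filtering by type.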