Hello,
Since a VDS file is based on the Parquet file format, I was wondering whether anyone has experience working with VDS files within an R environment using the sparklyr package. Also, are there ongoing or planned efforts within the Hail community to build an R API?
Thank you,
Hi Jeroen,
There are people like @konradjk who have some experience using Hail + sparklyr, but they've mostly been using it to work with key tables (very similar to Spark DataFrames) rather than VDS objects. Konrad's Python prep went something like this:
vds.variants_keytable().to_dataframe().write.parquet('file.parquet')
He then used sparklyr to read and process the file.
We would love to have a full R API, but don’t have any plans to build one right now.
Yep, this works reasonably well, but last I looked, some datatypes did not transfer over swimmingly: I think at the time sets would break the load in R, but if you drop them/select only the columns you need, it works fine.
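For what it's worth, here is a rough sketch of the "select only the columns you need" step on the Python side, before writing the Parquet file. It assumes that Hail sets show up in the Spark DataFrame as array-typed columns, and that skipping array/map/struct columns is enough to keep the R load happy; I have not verified either against a specific Hail or sparklyr version. The helper name `simple_columns` and the example schema are made up for illustration. The helper takes the list of (name, type string) pairs that PySpark's `DataFrame.dtypes` returns.

```python
# Spark type-string prefixes treated as "complex". This is an assumption about
# which types trip up the load in R; adjust the tuple as needed.
COMPLEX_PREFIXES = ("array", "map", "struct")


def simple_columns(dtypes):
    """Return the names of columns whose Spark type is not array/map/struct.

    `dtypes` is a list of (name, type_string) pairs, i.e. the shape that
    PySpark's DataFrame.dtypes returns.
    """
    return [name for name, type_string in dtypes
            if not type_string.startswith(COMPLEX_PREFIXES)]


# Hypothetical schema, for illustration only.
example_dtypes = [
    ("contig", "string"),
    ("start", "int"),
    ("filters", "array<string>"),
]
kept = simple_columns(example_dtypes)

# With a real DataFrame, the same helper would slot into the prep step as:
#   df = vds.variants_keytable().to_dataframe()
#   df.select(*simple_columns(df.dtypes)).write.parquet('file.parquet')
```

If you do need one of the dropped columns in R, converting it to a plain string column before writing is another option, but that is a separate step not shown here.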