Import data from a Parquet DataFrame into a VDS


Is there a way to import data stored as a Parquet DataFrame into a Hail format? For example, I have a DataFrame with chrom, pos, ref, alt, qual, filter, and rsid columns, plus a samples array column in which each sample struct holds sampleId, GT, DP, prob_hom_ref, prob_het, and prob_hom_alt as separate fields. The whole DataFrame has about 1 million variants and 10k samples.

Thanks so much!
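To make the layout concrete, the sketch below models one variant row as a plain Python dict. The field names come from the question, but the values are made up for illustration. It then explodes the samples array into one flat record per variant-sample pair. This long format is roughly what Hail's Table.explode would produce on the samples field. The code is plain Python, not Hail, and the helper name explode_samples is hypothetical.

```python
# Hypothetical example row mirroring the schema described in the question;
# all values are illustrative only.
row = {
    "chrom": "1", "pos": 12345, "ref": "A", "alt": "G",
    "qual": 50.0, "filter": "PASS", "rsid": ".",
    "samples": [
        {"sampleId": "S2", "GT": "0/1", "DP": 12,
         "prob_hom_ref": 0.05, "prob_het": 0.9, "prob_hom_alt": 0.05},
        {"sampleId": "S1", "GT": "0/0", "DP": 20,
         "prob_hom_ref": 0.98, "prob_het": 0.02, "prob_hom_alt": 0.0},
    ],
}


def explode_samples(row):
    """Yield one flat record per sample: the variant fields merged with that sample's fields."""
    variant = {k: v for k, v in row.items() if k != "samples"}
    for sample in row["samples"]:
        yield {**variant, **sample}


records = list(explode_samples(row))
print(len(records), records[0]["sampleId"], records[0]["pos"])  # prints: 2 S2 12345
```

At 1 million variants by 10k samples, this long form would have on the order of 10 billion records, so exploding is expensive at this scale.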

You can get the data into a Hail Table with hl.Table.from_spark. Going to a MatrixTable is a bit harder; I'll think that over. We might need a new function.
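Conceptually, if every row's samples array lists the same samples in the same order, the column keys can be read from the first row, and each row's array can then serve directly as that row's entries. The plain-Python sketch below shows that reshaping under this assumption. It is not Hail's implementation. The function name to_grid and the choice of GT as the entry value are hypothetical.

```python
def to_grid(rows, entry_field="GT"):
    """Reshape rows with per-sample arrays into (column keys, row keys, entry matrix).

    Assumes every row's 'samples' array lists the same samples in the same order,
    so the column keys can be taken from the first row. Raises if that assumption fails.
    """
    if not rows:
        return [], [], []
    col_keys = [s["sampleId"] for s in rows[0]["samples"]]
    row_keys = []
    entries = []
    for row in rows:
        ids = [s["sampleId"] for s in row["samples"]]
        if ids != col_keys:
            raise ValueError(f"sample order differs at {row['chrom']}:{row['pos']}")
        # Variant key analogous to the locus/alleles key Hail uses for variant data.
        row_keys.append((row["chrom"], row["pos"], row["ref"], row["alt"]))
        entries.append([s[entry_field] for s in row["samples"]])
    return col_keys, row_keys, entries


# Hypothetical two-variant, two-sample input.
rows = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G",
     "samples": [{"sampleId": "S1", "GT": "0/0"}, {"sampleId": "S2", "GT": "0/1"}]},
    {"chrom": "1", "pos": 200, "ref": "C", "alt": "T",
     "samples": [{"sampleId": "S1", "GT": "1/1"}, {"sampleId": "S2", "GT": "0/0"}]},
]
cols, keys, grid = to_grid(rows)
print(cols)  # ['S1', 'S2']
print(grid)  # [['0/0', '0/1'], ['1/1', '0/0']]
```

Raising on a mismatch rather than silently proceeding is deliberate: a misaligned array would otherwise assign genotypes to the wrong samples without any error.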

Are the sample arrays all in the same order across rows?
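One way to answer that question is to compare the sampleId sequence of every row. The sketch below assumes the rows are available as plain Python dicts. On the real 1-million-row data, the same comparison would have to run in Spark or Hail instead. It distinguishes three cases: arrays already aligned, arrays holding the same samples in different orders (fixable by sorting), and arrays with differing sample sets. The function name sample_order_status is hypothetical.

```python
def sample_order_status(rows):
    """Classify how the per-row sample arrays relate to each other.

    'aligned'      - every array lists the same sampleIds in the same order
    'reorderable'  - same sampleIds in every array, but in differing orders
    'inconsistent' - the arrays do not all contain the same sampleIds
    """
    orders = [tuple(s["sampleId"] for s in row["samples"]) for row in rows]
    if len(set(orders)) <= 1:
        return "aligned"
    # Sorting each sequence ignores order but still respects duplicates.
    if len({tuple(sorted(o)) for o in orders}) == 1:
        return "reorderable"
    return "inconsistent"


def make_row(*ids):
    """Build a minimal hypothetical row containing only the given sampleIds."""
    return {"samples": [{"sampleId": i} for i in ids]}


aligned = [make_row("S1", "S2"), make_row("S1", "S2")]
swapped = [make_row("S1", "S2"), make_row("S2", "S1")]
mismatched = [make_row("S1", "S2"), make_row("S1", "S3")]
print(sample_order_status(aligned), sample_order_status(swapped), sample_order_status(mismatched))
```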

Thanks, I'll look at that function.

The arrays of sample info COULD be put in the same order by sorting each array on the sampleId field of its sample structs.
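A minimal sketch of that sorting step in plain Python follows. In Hail, the analogous expression would likely be something like ht.annotate(samples=hl.sorted(ht.samples, key=lambda s: s.sampleId)), but that line is an untested assumption and is not shown running here. The helper name sort_samples is hypothetical.

```python
def sort_samples(row):
    """Return a copy of the row with its samples array sorted by sampleId (input is not mutated)."""
    return {**row, "samples": sorted(row["samples"], key=lambda s: s["sampleId"])}


# Hypothetical rows whose sample arrays arrive in different orders.
rows = [
    {"pos": 100, "samples": [{"sampleId": "S1", "GT": "0/0"}, {"sampleId": "S2", "GT": "0/1"}]},
    {"pos": 200, "samples": [{"sampleId": "S2", "GT": "1/1"}, {"sampleId": "S1", "GT": "0/1"}]},
]
sorted_rows = [sort_samples(r) for r in rows]
for r in sorted_rows:
    print(r["pos"], [s["sampleId"] for s in r["samples"]])
```

The sort is lexicographic (for example, "S10" sorts before "S2"). That is acceptable because alignment only requires the same order in every row, not any particular order.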