cc @tpoterba
We need an online median calculation in Hail. Currently, the closest thing you can get to a median is to calculate a histogram of your data using `hist`
and then determine the bin in which the median resides (by finding the weighted median of the bins).
It would be nice if there were an `Aggregable[Numeric].median()` aggregator. What are some strategies for doing this with RDDs?
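
For reference, here is a rough sketch of the histogram workaround described above, in plain Python. It does not use Hail's API: the function names `median_bin` and `approx_median`, and the assumption that the histogram is available as a list of bin edges plus a list of per-bin counts, are mine for illustration only. The interpolation step also assumes values are spread uniformly within a bin, which is only an approximation.

```python
from bisect import bisect_left
from itertools import accumulate


def median_bin(bin_edges, counts):
    """Return the index of the bin that contains the median.

    Hypothetical helper: bin_edges has len(counts) + 1 entries, and
    counts[i] is the number of values in [bin_edges[i], bin_edges[i + 1]).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    cumulative = list(accumulate(counts))
    # Weighted median of the bins: the first bin whose cumulative
    # count reaches half of the total count.
    return bisect_left(cumulative, total / 2)


def approx_median(bin_edges, counts):
    """Estimate the median by interpolating linearly inside its bin.

    Assumes values are uniformly distributed within the bin, so the
    result is only as precise as the bin width allows.
    """
    i = median_bin(bin_edges, counts)
    below = sum(counts[:i])  # number of values in earlier bins
    lo, hi = bin_edges[i], bin_edges[i + 1]
    fraction = (sum(counts) / 2 - below) / counts[i]
    return lo + fraction * (hi - lo)


edges = [0, 10, 20, 30, 40]
counts = [1, 2, 5, 2]
i = median_bin(edges, counts)
print(edges[i], edges[i + 1])        # edges of the bin holding the median
print(approx_median(edges, counts))  # interpolated estimate
```

The obvious limitation is that the answer can never be more precise than the bin width, and the bin range has to be chosen before seeing the data, which is why a real `median()` aggregator would be preferable.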