Figure out shard size from the size of a MatrixTable that will be written to Elasticsearch


We need to know the size of each shard before creating an index in Elasticsearch. We have a MatrixTable that we convert to a Table and then write to Elasticsearch:

# rows() returns a Table containing the row fields (one row per variant)
table = mt.rows()
# flatten() converts nested structs into top-level fields, e.g. {a: {b: 1}} => a.b: 1
table = table.drop('vep').flatten()
# Flattening also unkeys the table; the former key fields locus and alleles
# become normal fields, which causes problems, so we drop them.
table = table.drop(table.locus, table.alleles)

hl.export_elasticsearch(table, ...)
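One way to approximate the size of what will land in Elasticsearch, without touching the cluster, is to serialize a small sample of the flattened rows to JSON and measure their encoded size. This is only a sketch: the helper name and the sample documents below are made up, and in practice you would build the dicts from something like `table.take(1000)`:

```python
import json

def avg_doc_bytes(sample_docs):
    """Average UTF-8 encoded JSON size of a list of sample documents.

    sample_docs: list of dicts, e.g. built by converting each hl.Struct
    from table.take(n) to a dict. The docs below are made-up stand-ins
    for flattened variant rows.
    """
    sizes = [len(json.dumps(d).encode("utf-8")) for d in sample_docs]
    return sum(sizes) / len(sizes)

# Hypothetical flattened rows (note the dotted field name from flatten()).
sample = [
    {"contig": "1", "position": 12345, "ref": "A", "alt": "T", "a.b": 1},
    {"contig": "2", "position": 67890, "ref": "G", "alt": "C", "a.b": 2},
]
print(avg_doc_bytes(sample))
```

Note this measures the raw JSON payload, not the on-disk index size; analyzed fields, doc values, and replicas will make the actual index larger, so treat it as a lower bound to calibrate against a test index.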

Is there a way to figure it out? We are using AWS EMR, so I suppose we can query the ES cluster for its parameters or, in the worst case, just pass them to our Python Hail script. But I am still not sure how to compute the shard size correctly, or whether it is possible at all.
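As a rough sketch of the arithmetic (not something Hail provides), you can estimate the total index size from the row count (e.g. `table.count()`) times an assumed average document size, then derive a primary-shard count from the commonly cited Elasticsearch guideline of keeping shards in the tens of gigabytes. The function name, the 2 KB document size, and the 30 GiB target below are all assumptions to calibrate for your data:

```python
def estimate_shard_count(n_rows, avg_doc_bytes, target_shard_bytes=30 * 1024**3):
    """Estimate how many primary shards an index needs.

    n_rows: number of rows in the flattened Table (e.g. table.count()).
    avg_doc_bytes: assumed average size of one indexed document; calibrate
        by indexing a sample and checking the index's store size.
    target_shard_bytes: aim for ~30 GiB per shard, within the often-quoted
        10-50 GB guideline for Elasticsearch shards.
    """
    total_bytes = n_rows * avg_doc_bytes
    # Round up so no shard exceeds the target; always at least one shard.
    return max(1, -(-total_bytes // target_shard_bytes))

# Hypothetical example: 100 million variants at ~2 KB per document.
print(estimate_shard_count(100_000_000, 2048))
```

The shard count would then be passed when creating the index (e.g. via the `number_of_shards` index setting), since it cannot be changed on a live index without reindexing or shrinking.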