Figure out shard size from the size of a MatrixTable that will be written to Elasticsearch


We need to know the size of each shard before creating an index in Elasticsearch. We have a MatrixTable that we convert to a Table and then write to Elasticsearch:

# rows() returns a Table containing the row fields (one row per variant)
table = mt.rows()
# flatten() converts nested structs into top-level fields, e.g. {a: {b: 1}} => a.b: 1
table = table.drop('vep').flatten()
# Flattening also unkeys the table; the former key fields locus and alleles
# become normal fields, which causes problems, so we drop them.
table = table.drop(table.locus, table.alleles)

hl.export_elasticsearch(table, ...)
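One way to approximate the size of what will land in Elasticsearch, without touching the cluster, is to serialize a small sample of the flattened rows to JSON and measure their encoded size. This is only a sketch: the helper name and the sample documents below are made up, and in practice you would build the dicts from something like `table.take(1000)`:

```python
import json

def avg_doc_bytes(sample_docs):
    """Average UTF-8 encoded JSON size of a list of sample documents.

    sample_docs: list of dicts, e.g. built by converting each hl.Struct
    from table.take(n) to a dict. The docs below are made-up stand-ins
    for flattened variant rows.
    """
    sizes = [len(json.dumps(d).encode("utf-8")) for d in sample_docs]
    return sum(sizes) / len(sizes)

# Hypothetical flattened rows (note the dotted field name from flatten()).
sample = [
    {"contig": "1", "position": 12345, "ref": "A", "alt": "T", "a.b": 1},
    {"contig": "2", "position": 67890, "ref": "G", "alt": "C", "a.b": 2},
]
print(avg_doc_bytes(sample))
```

Note this measures the raw JSON payload, not the on-disk index size; analyzed fields, doc values, and replicas will make the actual index larger, so treat it as a lower bound to calibrate against a test index.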

Is there a way to figure it out? We are using AWS EMR, so I suppose we can query the ES cluster for its parameters or, in the worst case, just pass them to our Python Hail script. But I am still not sure how to compute the shard size correctly, or whether it is possible at all.
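As a rough sketch of the arithmetic (not something Hail provides), you can estimate the total index size from the row count (e.g. `table.count()`) times an assumed average document size, then derive a primary-shard count from the commonly cited Elasticsearch guideline of keeping shards in the tens of gigabytes. The function name, the 2 KB document size, and the 30 GiB target below are all assumptions to calibrate for your data:

```python
def estimate_shard_count(n_rows, avg_doc_bytes, target_shard_bytes=30 * 1024**3):
    """Estimate how many primary shards an index needs.

    n_rows: number of rows in the flattened Table (e.g. table.count()).
    avg_doc_bytes: assumed average size of one indexed document; calibrate
        by indexing a sample and checking the index's store size.
    target_shard_bytes: aim for ~30 GiB per shard, within the often-quoted
        10-50 GB guideline for Elasticsearch shards.
    """
    total_bytes = n_rows * avg_doc_bytes
    # Round up so no shard exceeds the target; always at least one shard.
    return max(1, -(-total_bytes // target_shard_bytes))

# Hypothetical example: 100 million variants at ~2 KB per document.
print(estimate_shard_count(100_000_000, 2048))
```

The shard count would then be passed when creating the index (e.g. via the `number_of_shards` index setting), since it cannot be changed on a live index without reindexing or shrinking.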