We need to find the compute power (amount of RAM and cores used) for each Hail task executed.
- Hail tasks running in parallel make it difficult to attribute computational cost (we are using a workflow manager, Luigi, on top of Hail).
- Task information is lost at the Spark interface, i.e. the Spark job, stage, and executor UIs do not indicate which Hail task was executed; see the tagging sketch below.
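One way to recover that mapping is to tag the Spark jobs each task triggers with a job group. This is only a minimal sketch, assuming Hail's `hl.spark_context()` exposes the underlying SparkContext; the `run_tagged` helper and the `ImportVCF` task name are hypothetical:

```python
import hail as hl

hl.init()
sc = hl.spark_context()  # assumption: Hail exposes the underlying SparkContext

def run_tagged(task_name, fn):
    """Run one unit of work with every Spark job it triggers tagged by task_name."""
    # setJobGroup is thread-local, so parallel tasks running in separate threads
    # each keep their own group id; the group shows up in the Spark UI and REST API.
    sc.setJobGroup(task_name, f"luigi task: {task_name}")
    try:
        return fn()
    finally:
        sc.setJobGroup("idle", "no task running")

# Hypothetical example: tag everything triggered by an 'ImportVCF' Luigi task.
mt = run_tagged("ImportVCF", lambda: hl.import_vcf("data/sample.vcf.bgz"))
```

The group id then appears in the `jobGroup` field of the jobs listed by the Spark REST API, which is what the REST sketch further below keys on.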
- Slurm provides information about computation, but only at the macro (job allocation) level; it does not work for tasks running in parallel.
- Spark logs don't contain information about computational cost; they only report success or failure.
- The PySpark Python library doesn't provide information about the amount of RAM or cores used; it only reports success or failure for a task.
- The Spark REST API (Monitoring and Instrumentation - Spark 3.1.2 Documentation) provides various metrics such as executor task time, but we still need to map each Spark job back to its Hail task; see the sketch after this list.
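As a rough illustration of what the REST API returns, here is a sketch that sums executor run time over the stages of all jobs carrying a given job group (the tag set in the sketch above). The base URL, the application lookup, and the "ImportVCF" group id are assumptions; field names follow the Spark 3.1 monitoring documentation:

```python
import requests

BASE = "http://localhost:4040/api/v1"  # assumption: driver UI reachable here

app_id = requests.get(f"{BASE}/applications").json()[0]["id"]
jobs = requests.get(f"{BASE}/applications/{app_id}/jobs").json()
stages = {s["stageId"]: s
          for s in requests.get(f"{BASE}/applications/{app_id}/stages").json()}

def task_run_time_ms(job_group):
    """Total executor run time (ms) across all stages of jobs tagged with job_group."""
    total = 0
    for job in jobs:
        if job.get("jobGroup") != job_group:
            continue
        for stage_id in job["stageIds"]:
            if stage_id in stages:
                total += stages[stage_id]["executorRunTime"]
    return total

print(task_run_time_ms("ImportVCF"))  # hypothetical group id from the tagging sketch

# Cores and memory are reported per executor rather than per task, e.g.:
executors = requests.get(f"{BASE}/applications/{app_id}/executors").json()
print(sum(e["totalCores"] for e in executors),
      sum(e["memoryUsed"] for e in executors))
```

Note that this gives core-time per tagged task fairly directly, while memory is only exposed per executor (`memoryUsed`, `maxMemory`), so per-task RAM would still have to be approximated.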
PS: I have seen this thread (Link), but it discusses the topic in terms of monetary cost rather than RAM and cores.