I have been running a cluster (‘hail-genebass’) on Dataproc and have had success with a Jupyter notebook. However, I am having trouble translating that code into a script to be submitted to the cluster via `hailctl dataproc submit`.
The notebook code that works is:
```python
import hail as hl

hl.init(spark_conf={
    'spark.hadoop.fs.gs.requester.pays.mode': 'CUSTOM',
    'spark.hadoop.fs.gs.requester.pays.buckets': 'ukbb-exome-public',
    'spark.hadoop.fs.gs.requester.pays.project.id': 'human-genetics-001'
})

mt = hl.read_matrix_table('gs://ukbb-exome-public/500k/results/variant_results.mt')
```
However, placing this code into the file `test.py` and submitting it via

```
hailctl dataproc submit hail-genebass test.py
```

leads to the error:
```json
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Bucket is a requester pays bucket but no user project provided.",
    "reason" : "required"
  } ],
  "message" : "Bucket is a requester pays bucket but no user project provided."
}
```
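For context, I started the cluster with a command along these lines (reconstructed from memory, so the exact invocation may have differed; the project and bucket names match the notebook config above):

```
hailctl dataproc start hail-genebass \
    --project human-genetics-001 \
    --requester-pays-allow-buckets ukbb-exome-public
```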
Given that I started the cluster with the project and allowed requester-pays buckets specified (and can execute the notebook code on it), I’m not sure why this submit job should fail. Do I need to re-specify the project ID and allowed buckets in the submit command, and if so, how?
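My best guess is something like the sketch below, re-passing the same settings as Spark properties. I’m assuming here that `hailctl dataproc submit` forwards a `--properties` flag to `gcloud dataproc jobs submit pyspark` (which does accept one), but I haven’t been able to confirm that:

```
hailctl dataproc submit hail-genebass test.py \
    --properties='spark.hadoop.fs.gs.requester.pays.mode=CUSTOM,spark.hadoop.fs.gs.requester.pays.buckets=ukbb-exome-public,spark.hadoop.fs.gs.requester.pays.project.id=human-genetics-001'
```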
Thanks.