Error - Hail version: 0.2.132-678e1f52b999
Error summary: GoogleJsonResponseException: 400 Bad Request
GET https://storage.googleapis.com/storage/v1/b/ukbb-exome-public/o/500k%2Fresults%2Fvariant_results.mt%2Fmetadata.json.gz?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Bucket is a requester pays bucket but no user project provided.",
    "reason" : "required"
  } ],
  "message" : "Bucket is a requester pays bucket but no user project provided."
}
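(As a side check outside Hail/Spark, here is a minimal sketch for verifying that the billing project can read the requester-pays bucket at all. It assumes the google-cloud-storage package is installed and Application Default Credentials are available; the project id and object path are simply the ones from the error above, and the script name is made up.)

# sanity_check.py - hypothetical helper, not part of the Hail job
from google.cloud import storage

client = storage.Client(project="human-genetics-001")
# user_project is the project billed for requester-pays access
bucket = client.bucket("ukbb-exome-public", user_project="human-genetics-001")
blob = bucket.blob("500k/results/variant_results.mt/metadata.json.gz")
print(blob.exists())  # True means the billing project and credentials are accepted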
Run command -
spark-submit \
  --packages com.google.cloud.bigdataoss:gcs-connector:hadoop2-2.2.5 \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  --conf spark.hadoop.fs.gs.auth.service.account.enable=false \
  --conf spark.hadoop.fs.gs.auth.null.enable=true \
  --conf spark.hadoop.fs.gs.requester.pays.enable=true \
  --conf spark.hadoop.fs.gs.requester.pays.billing.project.id=human-genetics-001 \
  my-hail-script.py
my-hail-script.py
import hail as hl
hl.init()
mt = hl.read_matrix_table("gs://ukbb-exome-public/500k/results/variant_results.mt")
gene_mt = hl.read_matrix_table("gs://ukbb-exome-public/500k/results/results.mt")
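(For comparison, recent Hail releases also expose a requester-pays setting directly on hl.init. This is only a sketch, assuming the gcs_requester_pays_configuration parameter is available in 0.2.132; the tuple form is meant to limit billing to the one bucket.)

import hail as hl

# Sketch: let Hail's own GCS I/O bill the given project for the listed bucket
hl.init(gcs_requester_pays_configuration=("human-genetics-001", ["ukbb-exome-public"]))

mt = hl.read_matrix_table("gs://ukbb-exome-public/500k/results/variant_results.mt")
gene_mt = hl.read_matrix_table("gs://ukbb-exome-public/500k/results/results.mt")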
Also tried the script below -
import hail as hl

hl.init(
    spark_conf={
        # 1) Make sure your GCS connector is on Spark's classpath
        #    (we'll assume you put gcs-connector-hadoop2.jar
        #    in /cluster/spark/jars so no --jars needed here)
        # 2) Register the GCS FileSystem implementations:
        "spark.hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        "spark.hadoop.fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
        "spark.hadoop.fs.gs.requester.pays.mode": "AUTO",
        "spark.hadoop.fs.gs.requester.pays.buckets": "ukbb-exome-public",
        "spark.hadoop.fs.gs.requester.pays.project.id": "human-genetics-001",
        # 3) Disable every authentication mechanism except anonymous:
        "spark.hadoop.fs.gs.auth.service.account.enable": "false",
        "spark.hadoop.fs.gs.auth.null.enable": "true",
        # (If you have any other fs.gs.auth.* keys set, remove them
        #  or set them to false/null here.)
    }
)

mt = hl.read_matrix_table("gs://ukbb-exome-public/500k/results/variant_results.mt")
gene_mt = hl.read_matrix_table("gs://ukbb-exome-public/500k/results/results.mt")
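(To see whether the fs.gs.* settings actually reach Spark, this is a small sketch that can run right after hl.init() and prints the requester-pays keys from the Hadoop configuration the backend sees; hl.spark_context() and the private _jsc handle are assumed to be available in this spark-submit setup.)

# Sketch: dump the effective requester-pays settings after hl.init()
sc = hl.spark_context()
hadoop_conf = sc._jsc.hadoopConfiguration()
for key in ("fs.gs.requester.pays.mode",
            "fs.gs.requester.pays.project.id",
            "fs.gs.requester.pays.buckets"):
    print(key, "=", hadoop_conf.get(key))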