Problem with hl.import_vcf from google bucket


#1

I apologize in advance for what is probably a trivial question; I am new to Google Cloud. I am having difficulty using hl.import_vcf to read data from a Google bucket that I have access to.

I have a large VCF file (usaupn.vcf in the traceback below) residing in a Google bucket. I refer to it here as gs://mybucket/usa.vcf, where mybucket stands in for my actual bucket.

When I use gsutil, I can read the file, indicating that I have the right permissions to do this:
gsutil cat gs://mybucket/usa.vcf | head -10

However, when I use hl.import_vcf in the same session, I get an error message indicating that Hail does not have access to the Google bucket:
hl.import_vcf('gs://mybucket/usa.vcf').write('usaupn.mt', overwrite=True)

---------------------------------------------------------------------------
FatalError                                Traceback (most recent call last)
<ipython-input-5-86ef63ded98f> in <module>()
----> 1 hl.import_vcf(usaupn_vcf).write('usaupn.mt', overwrite=True)

<decorator-gen-1112> in import_vcf(path, force, force_bgz, header_file, min_partitions, drop_samples, call_fields, reference_genome, contig_recoding, array_elements_required, skip_invalid_loci, _partitions)

/home/hail/hail.zip/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    558     def wrapper(__original_func, *args, **kwargs):
    559         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 560         return __original_func(*args_, **kwargs_)
    561 
    562     return wrapper

/home/hail/hail.zip/hail/methods/impex.py in import_vcf(path, force, force_bgz, header_file, min_partitions, drop_samples, call_fields, reference_genome, contig_recoding, array_elements_required, skip_invalid_loci, _partitions)
   1886                              reference_genome, contig_recoding, array_elements_required,
   1887                              skip_invalid_loci, force_bgz, force, _partitions)
-> 1888     return MatrixTable(MatrixRead(reader, drop_cols=drop_samples))
   1889 
   1890 @typecheck(path=sequenceof(str),

/home/hail/hail.zip/hail/matrixtable.py in __init__(self, mir)
    551         self._mir = mir
    552         self._jmt = Env.hail().variant.MatrixTable(
--> 553             Env.hc()._jhc, Env.hc()._backend._to_java_ir(self._mir))
    554 
    555         self._globals = None

/home/hail/hail.zip/hail/backend/backend.py in _to_java_ir(self, ir)
     30             code = r(ir)
     31             # FIXME parse should be static
---> 32             ir._jir = ir.parse(code, ir_map=r.jirs)
     33         return ir._jir
     34 

/home/hail/hail.zip/hail/ir/base_ir.py in parse(self, code, ref_map, ir_map)
     94 
     95     def parse(self, code, ref_map={}, ir_map={}):
---> 96         return Env.hail().expr.ir.IRParser.parse_matrix_ir(code, ref_map, ir_map)

/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

/home/hail/hail.zip/hail/utils/java.py in deco(*args, **kwargs)
    222             raise FatalError('%s\n\nJava stack trace:\n%s\n'
    223                              'Hail version: %s\n'
--> 224                              'Error summary: %s' % (deepest, full, hail.__version__, deepest)) from None
    225         except pyspark.sql.utils.CapturedException as e:
    226             raise FatalError('%s\n\nJava stack trace:\n%s\n'

FatalError: GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "730607661215-compute@developer.gserviceaccount.com does not have storage.objects.get access to fc-b88f9de6-4375-41df-a845-17bd780f87ab/8d1e6866-6413-4a0a-9e40-85e1e070b037/test/91281ee9-b993-413f-8005-aae6d3e68826/call-head/usaupn.vcf.",
    "reason" : "forbidden"
  } ],
  "message" : "730607661215-compute@developer.gserviceaccount.com does not have storage.objects.get access to fc-b88f9de6-4375-41df-a845-17bd780f87ab/8d1e6866-6413-4a0a-9e40-85e1e070b037/test/91281ee9-b993-413f-8005-aae6d3e68826/call-head/usaupn.vcf."
}

Java stack trace:
org.json4s.package$MappingException: unknown error
	at org.json4s.Extraction$.extract(Extraction.scala:46)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:995)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:932)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1062)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1077)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$instantiate(Extraction.scala:490)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:515)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$mkWithTypeHint(Extraction.scala:507)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:514)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$.extract(Extraction.scala:42)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:995)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:932)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1062)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1077)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

java.io.IOException: Error accessing: bucket: fc-b88f9de6-4375-41df-a845-17bd780f87ab, object: 8d1e6866-6413-4a0a-9e40-85e1e070b037/test/91281ee9-b993-413f-8005-aae6d3e68826/call-head/usaupn.vcf
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.wrapException(GoogleCloudStorageImpl.java:1892)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1919)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1804)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1181)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1537)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
	at org.apache.hadoop.fs.Globber.doGlob(Globber.java:269)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:148)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1705)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globStatus(GoogleHadoopFileSystemBase.java:1706)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globStatus(GoogleHadoopFileSystemBase.java:1630)
	at is.hail.utils.richUtils.RichHadoopConfiguration$.glob$extension(RichHadoopConfiguration.scala:129)
	at is.hail.utils.richUtils.RichHadoopConfiguration$$anonfun$globAll$extension$1.apply(RichHadoopConfiguration.scala:108)
	at is.hail.utils.richUtils.RichHadoopConfiguration$$anonfun$globAll$extension$1.apply(RichHadoopConfiguration.scala:107)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
	at is.hail.utils.richUtils.RichHadoopConfiguration$.globAll$extension(RichHadoopConfiguration.scala:113)
	at is.hail.io.vcf.MatrixVCFReader.<init>(LoadVCF.scala:981)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$instantiate(Extraction.scala:490)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:515)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$mkWithTypeHint(Extraction.scala:507)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:514)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$.extract(Extraction.scala:42)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:995)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:932)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1062)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1077)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)

com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "730607661215-compute@developer.gserviceaccount.com does not have storage.objects.get access to fc-b88f9de6-4375-41df-a845-17bd780f87ab/8d1e6866-6413-4a0a-9e40-85e1e070b037/test/91281ee9-b993-413f-8005-aae6d3e68826/call-head/usaupn.vcf.",
    "reason" : "forbidden"
  } ],
  "message" : "730607661215-compute@developer.gserviceaccount.com does not have storage.objects.get access to fc-b88f9de6-4375-41df-a845-17bd780f87ab/8d1e6866-6413-4a0a-9e40-85e1e070b037/test/91281ee9-b993-413f-8005-aae6d3e68826/call-head/usaupn.vcf."
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1056)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1913)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1804)
	at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1181)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1537)
	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
	at org.apache.hadoop.fs.Globber.doGlob(Globber.java:269)
	at org.apache.hadoop.fs.Globber.glob(Globber.java:148)
	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1705)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globStatus(GoogleHadoopFileSystemBase.java:1706)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.globStatus(GoogleHadoopFileSystemBase.java:1630)
	at is.hail.utils.richUtils.RichHadoopConfiguration$.glob$extension(RichHadoopConfiguration.scala:129)
	at is.hail.utils.richUtils.RichHadoopConfiguration$$anonfun$globAll$extension$1.apply(RichHadoopConfiguration.scala:108)
	at is.hail.utils.richUtils.RichHadoopConfiguration$$anonfun$globAll$extension$1.apply(RichHadoopConfiguration.scala:107)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
	at is.hail.utils.richUtils.RichHadoopConfiguration$.globAll$extension(RichHadoopConfiguration.scala:113)
	at is.hail.io.vcf.MatrixVCFReader.<init>(LoadVCF.scala:981)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$instantiate(Extraction.scala:490)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:515)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$ClassInstanceBuilder.org$json4s$Extraction$ClassInstanceBuilder$$mkWithTypeHint(Extraction.scala:507)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:514)
	at org.json4s.Extraction$ClassInstanceBuilder$$anonfun$result$6.apply(Extraction.scala:512)
	at org.json4s.Extraction$.org$json4s$Extraction$$customOrElse(Extraction.scala:524)
	at org.json4s.Extraction$ClassInstanceBuilder.result(Extraction.scala:512)
	at org.json4s.Extraction$.extract(Extraction.scala:351)
	at org.json4s.Extraction$.extract(Extraction.scala:42)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
	at org.json4s.jackson.Serialization$.read(Serialization.scala:50)
	at is.hail.expr.ir.IRParser$.matrix_ir_1(Parser.scala:995)
	at is.hail.expr.ir.IRParser$.matrix_ir(Parser.scala:932)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$$anonfun$parse_matrix_ir$2.apply(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$.parse(Parser.scala:1062)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1078)
	at is.hail.expr.ir.IRParser$.parse_matrix_ir(Parser.scala:1077)
	at is.hail.expr.ir.IRParser.parse_matrix_ir(Parser.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)


Hail version: 0.2.7-e08cc2a17c4a
Error summary: GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "730607661215-compute@developer.gserviceaccount.com does not have storage.objects.get access to fc-b88f9de6-4375-41df-a845-17bd780f87ab/8d1e6866-6413-4a0a-9e40-85e1e070b037/test/91281ee9-b993-413f-8005-aae6d3e68826/call-head/usaupn.vcf.",
    "reason" : "forbidden"
  } ],
  "message" : "730607661215-compute@developer.gserviceaccount.com does not have storage.objects.get access to fc-b88f9de6-4375-41df-a845-17bd780f87ab/8d1e6866-6413-4a0a-9e40-85e1e070b037/test/91281ee9-b993-413f-8005-aae6d3e68826/call-head/usaupn.vcf."
}

#2

Hi @ihelbig,

Use it this way.

hl.import_vcf('gs://mybucket/usa.vcf', force_bgz=True, min_partitions=10000).write('gs://mybucket/usaupn.mt', overwrite=True)


#3

The problem is likely related to Google Dataproc service accounts. Even if your Google account has access, you need to make sure that your Dataproc service account does.

See some info here:
https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/service-accounts
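
The 403 message in the traceback above already names the identity Hail is acting as and the object it was denied. As a minimal sketch (the helper name parse_gcs_403 and the regular expression are my own, written against the wording of this particular message rather than any documented format), the pieces can be pulled out like this:

```python
import re

# Pattern written against the wording of the 403 message in this thread.
# The format is not guaranteed by Google, so treat it as an assumption.
_DENIED = re.compile(
    r"(?P<account>[\w.-]+@[\w.-]+) does not have "
    r"(?P<permission>[\w.]+) access to "
    r"(?P<bucket>[^/\s]+)/(?P<object>\S+)"
)


def parse_gcs_403(message):
    """Return the account, permission, bucket and object named in the message.

    Returns None when the message does not match the expected wording.
    """
    match = _DENIED.search(message)
    if match is None:
        return None
    info = match.groupdict()
    # Drop trailing punctuation: the sentence-ending full stop, plus a quote
    # or comma if a raw line of the JSON error was passed in.
    info["object"] = info["object"].rstrip('".,')
    return info
```

For the message in this thread, the account has the form of the Compute Engine default service account (a project number followed by -compute@developer.gserviceaccount.com), not a user account, so that is the identity that would need read access to the bucket.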


#4

Thank you! The force_bgz and min_partitions options didn’t solve the issue, which was due to permissions (see my response to Tim’s suggestion below). Either way, I solved it by briefly copying the object into the current project, and it worked afterwards.


#5

Thank you so much for your help and feedback. I solved the issue by copying the object into my current project and loading it from there. Luckily, copying between buckets is fast, and I could remove the copy after creating the MatrixTable.

With regard to the Dataproc service accounts, I could load the file when running this within Cloud Shell from the master node, so this was probably not the reason.

I used a Jupyter notebook on top of Hail, so I don’t know whether that contributed to the problem. Either way, loading directly from the project solved this.
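
For anyone following the same workaround, here is a rough sketch of the copy-then-clean-up steps. It only builds the gsutil command lines (gsutil cp and gsutil rm) and does not run them. The function name and the staging bucket name are hypothetical, and the copy has to be run by an identity that can read the source object.

```python
import posixpath


def staging_commands(source_url, staging_bucket):
    """Build gsutil commands to stage an object in another bucket.

    Returns (copy_cmd, staged_url, cleanup_cmd). Nothing is executed here.
    """
    # Keep the original file name, e.g. "usa.vcf" from "gs://mybucket/usa.vcf".
    name = posixpath.basename(source_url)
    staged_url = f"gs://{staging_bucket}/{name}"
    copy_cmd = ["gsutil", "cp", source_url, staged_url]
    cleanup_cmd = ["gsutil", "rm", staged_url]
    return copy_cmd, staged_url, cleanup_cmd
```

The order would be: run copy_cmd (for instance with subprocess.run(copy_cmd, check=True)), point hl.import_vcf at staged_url, write the MatrixTable, then run cleanup_cmd.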


#6

The key issue for import_vcf is the service account on the worker nodes. It is also possible that the Cloud Shell or SSH session loads your user credentials for use with gsutil, which would explain why gsutil can read the file while import_vcf cannot. You can verify who you are acting as with gcloud auth list.
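
A small sketch of that check from Python, for example in a notebook cell. It assumes the gcloud CLI is on the PATH; the function name and the injectable run parameter are my own additions so the logic can be tried without gcloud installed. It reports only the identity that gcloud and gsutil use in that session, not the one the worker nodes use.

```python
import subprocess


def active_gcloud_account(run=subprocess.run):
    """Return the account gcloud marks as active, or None if there is none."""
    result = run(
        ["gcloud", "auth", "list",
         "--filter=status:ACTIVE", "--format=value(account)"],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    account = result.stdout.strip()
    return account or None
```

If this returns your personal e-mail address while the 403 error names a ...-compute@developer.gserviceaccount.com account, the two tools are running as different identities.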