Ukbb_common documentation seems stale

Is the documentation here (Hail Format | Pan UKBB) open to improvement?

Some of the Python code is installable via pip, but is it the right code? Should that doc direct users to install via pip? The repos themselves have limited README/installation docs.

We don’t operate or control the PanUKB website, but @konradjk does :wink:

There is always room for improvement! This is an active project where code is being developed as we go. The pip installation is the right code, though it may lag the GitHub repo a bit at times. Do you have a specific question about the code?
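
For reference, here is a quick way to confirm which copy of the code you are importing (a minimal sketch; it assumes only the pip-installed package and the get_ld_matrix_path helper used later in this thread):

import ukbb_pan_ancestry as upa

print(upa.__file__)                    # filesystem location of the imported code
print(upa.get_ld_matrix_path('AFR'))   # one of the path helpers used below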

Thanks for getting back to me. I don’t have any questions yet; I just want to be sure I’m using the endorsed code base. The docs don’t mention pip availability.

Here’s an issue: this is on a Terra Spark single-node cluster. Are special permissions needed for LD matrix access?

>>> hl.init(spark_conf={
...     'spark.hadoop.fs.gs.requester.pays.mode': 'CUSTOM',
...     'spark.hadoop.fs.gs.requester.pays.buckets': 'ukb-diverse-pops-public',
...     'spark.hadoop.fs.gs.requester.pays.project.id': 'landmarkanvil2'
... })
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.1.3
SparkUI available at http://saturn-f2057dd6-e9fe-4c9f-8b7f-ea96803bafc7-m.c.terra-c9c997fd.internal:45693
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.105-acd89e80c345
LOGGING: writing to /home/jupyter/hail-20221224-0359-0.2.105-acd89e80c345.log
>>> import hail.linalg as hli
>>> import ukbb_pan_ancestry as upa
>>> xx = hli.BlockMatrix.read(upa.get_ld_matrix_path('AFR'))
>>> xx.shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/linalg/blockmatrix.py", line 578, in shape
    return tensor_shape_to_matrix_shape(self._bmir)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/ir/blockmatrix_ir.py", line 421, in tensor_shape_to_matrix_shape
    shape = bmir.typ.shape
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/ir/base_ir.py", line 525, in typ
    self.compute_type(deep_typecheck=False)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/ir/base_ir.py", line 516, in compute_type
    computed = self._compute_type(deep_typecheck)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/ir/blockmatrix_ir.py", line 27, in _compute_type
    return Env.backend().blockmatrix_type(self)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 320, in blockmatrix_type
    jir = self._to_java_blockmatrix_ir(bmir)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 289, in _to_java_blockmatrix_ir
    return self._to_java_ir(ir, self._parse_blockmatrix_ir)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 276, in _to_java_ir
    ir._jir = parse(r(finalize_randomness(ir)), ir_map=r.jirs)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 257, in _parse_blockmatrix_ir
    return self._jbackend.parse_blockmatrix_ir(code, ir_map)
  File "/home/jupyter/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 31, in deco
    raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
hail.utils.java.FatalError: GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/ukb-diverse-pops/o/ld%2FAFR%2FUKBB.AFR.ldadj.bm%2Fmetadata.json?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "pet-101835398722999344661@terra-c9c997fd.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
    "reason" : "forbidden"
  } ],
  "message" : "pet-101835398722999344661@terra-c9c997fd.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist)."
}

And another problem:

Python 3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:49:35)
[GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ukbb_pan_ancestry import *
>>> import hail as hl
>>> hl.init(spark_conf={'spark.hadoop.fs.gs.requester.pays.mode': 'AUTO',
...                     'spark.hadoop.fs.gs.requester.pays.project.id': 'landmarkanvil2'})
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.1.3
SparkUI available at http://saturn-f2057dd6-e9fe-4c9f-8b7f-ea96803bafc7-m.c.terra-c9c997fd.internal:43519
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.105-acd89e80c345
LOGGING: writing to /home/jupyter/BiocHail/vignettes/hail-20221224-1357-0.2.105-acd89e80c345.log
>>> mt = load_final_sumstats_mt()
>>> mt.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    'trait_type': str
    'phenocode': str
    'pheno_sex': str
    'coding': str
    'modifier': str
    'pheno_data': struct {
        n_cases: int32,
        n_controls: int32,
        heritability: float64,
        saige_version: str,
        inv_normalized: bool,
        pop: str,
        lambda_gc: float64,
        n_variants: int64,
        n_sig_variants: int64
    }
    'description': str
    'description_more': str
    'coding_description': str
    'category': str
    'n_cases_full_cohort_both_sexes': int64
    'n_cases_full_cohort_females': int64
    'n_cases_full_cohort_males': int64
----------------------------------------
Row fields:
    'locus': locus<GRCh37>
    'alleles': array<str>
    'rsid': str
    'varid': str
    'vep': struct {
        assembly_name: str,
        allele_string: str,
        ancestral: str,
        colocated_variants: array<struct {
            aa_allele: str,
            aa_maf: float64,
            afr_allele: str,
            afr_maf: float64,
            allele_string: str,
            amr_allele: str,
            amr_maf: float64,
            clin_sig: array<str>,
            end: int32,
            eas_allele: str,
            eas_maf: float64,
            ea_allele: str,
            ea_maf: float64,
            eur_allele: str,
            eur_maf: float64,
            exac_adj_allele: str,
            exac_adj_maf: float64,
            exac_allele: str,
            exac_afr_allele: str,
            exac_afr_maf: float64,
            exac_amr_allele: str,
            exac_amr_maf: float64,
            exac_eas_allele: str,
            exac_eas_maf: float64,
            exac_fin_allele: str,
            exac_fin_maf: float64,
            exac_maf: float64,
            exac_nfe_allele: str,
            exac_nfe_maf: float64,
            exac_oth_allele: str,
            exac_oth_maf: float64,
            exac_sas_allele: str,
            exac_sas_maf: float64,
            id: str,
            minor_allele: str,
            minor_allele_freq: float64,
            phenotype_or_disease: int32,
            pubmed: array<int32>,
            sas_allele: str,
            sas_maf: float64,
            somatic: int32,
            start: int32,
            strand: int32
        }>,
        context: str,
        end: int32,
        id: str,
        input: str,
        intergenic_consequences: array<struct {
            allele_num: int32,
            consequence_terms: array<str>,
            impact: str,
            minimised: int32,
            variant_allele: str
        }>,
        most_severe_consequence: str,
        motif_feature_consequences: array<struct {
            allele_num: int32,
            consequence_terms: array<str>,
            high_inf_pos: str,
            impact: str,
            minimised: int32,
            motif_feature_id: str,
            motif_name: str,
            motif_pos: int32,
            motif_score_change: float64,
            strand: int32,
            variant_allele: str
        }>,
        regulatory_feature_consequences: array<struct {
            allele_num: int32,
            biotype: str,
            consequence_terms: array<str>,
            impact: str,
            minimised: int32,
            regulatory_feature_id: str,
            variant_allele: str
        }>,
        seq_region_name: str,
        start: int32,
        strand: int32,
        transcript_consequences: array<struct {
            allele_num: int32,
            amino_acids: str,
            biotype: str,
            canonical: int32,
            ccds: str,
            cdna_start: int32,
            cdna_end: int32,
            cds_end: int32,
            cds_start: int32,
            codons: str,
            consequence_terms: array<str>,
            distance: int32,
            domains: array<struct {
                db: str,
                name: str
            }>,
            exon: str,
            gene_id: str,
            gene_pheno: int32,
            gene_symbol: str,
            gene_symbol_source: str,
            hgnc_id: str,
            hgvsc: str,
            hgvsp: str,
            hgvs_offset: int32,
            impact: str,
            intron: str,
            lof: str,
            lof_flags: str,
            lof_filter: str,
            lof_info: str,
            minimised: int32,
            polyphen_prediction: str,
            polyphen_score: float64,
            protein_end: int32,
            protein_start: int32,
            protein_id: str,
            sift_prediction: str,
            sift_score: float64,
            strand: int32,
            swissprot: str,
            transcript_id: str,
            trembl: str,
            uniparc: str,
            variant_allele: str
        }>,
        variant_class: str
    }
    'freq': array<struct {
        pop: str,
        ac: float64,
        af: float64,
        an: int64,
        gnomad_exomes_ac: int32,
        gnomad_exomes_af: float64,
        gnomad_exomes_an: int32,
        gnomad_genomes_ac: int32,
        gnomad_genomes_af: float64,
        gnomad_genomes_an: int32
    }>
    'pass_gnomad_exomes': bool
    'pass_gnomad_genomes': bool
    'n_passing_populations': int32
    'high_quality': bool
    'nearest_genes': array<struct {
        gene_id: str,
        gene_name: str,
        within_gene: bool
    }>
    'info': float64
----------------------------------------
Entry fields:
    'summary_stats': struct {
        AF_Allele2: float64,
        imputationInfo: float64,
        BETA: float64,
        SE: float64,
        `p.value.NA`: float64,
        `AF.Cases`: float64,
        `AF.Controls`: float64,
        Pvalue: float64,
        low_confidence: bool
    }
----------------------------------------
Column key: ['trait_type', 'phenocode', 'pheno_sex', 'coding', 'modifier']
Row key: ['locus', 'alleles']
----------------------------------------
>>> phenotype_ht = mt.cols()
2022-12-24 13:58:45.606 Hail: WARN: cols(): Resulting column table is sorted by 'col_key'.
    To preserve matrix table column order, first unkey columns with 'key_cols_by()'
>>> phenotype_ht.show(truncate=40, width=85)
2022-12-24 13:59:00.802 Hail: INFO: Coerced sorted dataset
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-1064>", line 2, in show
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/table.py", line 1702, in show
    return handler(self._show(n, width, truncate, types))
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/table.py", line 1489, in __str__
    return self._ascii_str()
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/table.py", line 1515, in _ascii_str
    rows, has_more, dtype = self.data()
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/table.py", line 1499, in data
    rows, has_more = t._take_n(self.n)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/table.py", line 1646, in _take_n
    rows = self.take(n + 1)
  File "<decorator-gen-1076>", line 2, in take
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/table.py", line 2319, in take
    return self.head(n).collect(_localize)
  File "<decorator-gen-1070>", line 2, in collect
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/typecheck/check.py", line 577, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/table.py", line 2118, in collect
    return Env.backend().execute(e._ir, timed=_timed)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 104, in execute
    self._handle_fatal_error_from_backend(e, ir)
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/backend.py", line 181, in _handle_fatal_error_from_backend
    raise err
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 98, in execute
    result_tuple = self._jbackend.executeEncode(jir, stream_codec, timed)
  File "/home/jupyter/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/home/jupyter/.local/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 31, in deco
    raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
hail.utils.java.FatalError: AssertionError: assertion failed: ptype mismatch:
  upcast: +PCStruct{trait_type:PCString,phenocode:PCString,pheno_sex:PCString,coding:PCString,modifier:PCString,pheno_data:PCArray[PCStruct{}],description:PCString,description_more:PCString,coding_description:PCString,category:PCString,n_cases_full_cohort_both_sexes:PInt64,n_cases_full_cohort_females:PInt64,n_cases_full_cohort_males:PInt64,col_array:PCTuple[1:PCStruct{n_cases:+PInt32,n_controls:PInt32,heritability:PFloat64,saige_version:PCString,inv_normalized:PBoolean,pop:PCString,lambda_gc:PFloat64,n_variants:PInt64,n_sig_variants:PInt64}]}
  computed: +PCStruct{trait_type:PCString,phenocode:PCString,pheno_sex:PCString,coding:PCString,modifier:PCString,pheno_data:PCArray[PCStruct{}],description:PCString,description_more:PCString,coding_description:PCString,category:PCString,n_cases_full_cohort_both_sexes:PInt64,n_cases_full_cohort_females:PInt64,n_cases_full_cohort_males:PInt64,col_array:PCTuple[0:PCStruct{n_cases:+PInt32,n_controls:PInt32,heritability:PFloat64,saige_version:PCString,inv_normalized:PBoolean,pop:PCString,lambda_gc:PFloat64,n_variants:PInt64,n_sig_variants:PInt64}]}

Java stack trace:
java.lang.AssertionError: assertion failed: ptype mismatch:
  upcast: +PCStruct{trait_type:PCString,phenocode:PCString,pheno_sex:PCString,coding:PCString,modifier:PCString,pheno_data:PCArray[PCStruct{}],description:PCString,description_more:PCString,coding_description:PCString,category:PCString,n_cases_full_cohort_both_sexes:PInt64,n_cases_full_cohort_females:PInt64,n_cases_full_cohort_males:PInt64,col_array:PCTuple[1:PCStruct{n_cases:+PInt32,n_controls:PInt32,heritability:PFloat64,saige_version:PCString,inv_normalized:PBoolean,pop:PCString,lambda_gc:PFloat64,n_variants:PInt64,n_sig_variants:PInt64}]}
  computed: +PCStruct{trait_type:PCString,phenocode:PCString,pheno_sex:PCString,coding:PCString,modifier:PCString,pheno_data:PCArray[PCStruct{}],description:PCString,description_more:PCString,coding_description:PCString,category:PCString,n_cases_full_cohort_both_sexes:PInt64,n_cases_full_cohort_females:PInt64,n_cases_full_cohort_males:PInt64,col_array:PCTuple[0:PCStruct{n_cases:+PInt32,n_controls:PInt32,heritability:PFloat64,saige_version:PCString,inv_normalized:PBoolean,pop:PCString,lambda_gc:PFloat64,n_variants:PInt64,n_sig_variants:PInt64}]}
        at scala.Predef$.assert(Predef.scala:223)
        at is.hail.expr.ir.PartitionRVDReader.emitStream(TableIR.scala:566)
        at is.hail.expr.ir.streams.EmitStream$.produce(EmitStream.scala:2779)
        at is.hail.expr.ir.streams.EmitStream$.produce$1(EmitStream.scala:148)
        at is.hail.expr.ir.streams.EmitStream$.produce(EmitStream.scala:990)
        at is.hail.expr.ir.streams.EmitStream$.produce$1(EmitStream.scala:148)
        at is.hail.expr.ir.streams.EmitStream$.produce(EmitStream.scala:798)
        at is.hail.expr.ir.Emit.emitStream$2(Emit.scala:805)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:1334)
        at is.hail.expr.ir.Emit.$anonfun$emitSplitMethod$1(Emit.scala:591)
        at is.hail.expr.ir.Emit.$anonfun$emitSplitMethod$1$adapted(Emit.scala:589)
        at is.hail.expr.ir.EmitCodeBuilder$.scoped(EmitCodeBuilder.scala:18)
        at is.hail.expr.ir.EmitCodeBuilder$.scopedVoid(EmitCodeBuilder.scala:28)
        at is.hail.expr.ir.EmitMethodBuilder.voidWithBuilder(EmitClassBuilder.scala:1007)
        at is.hail.expr.ir.Emit.emitSplitMethod(Emit.scala:589)
        at is.hail.expr.ir.Emit.emitInSeparateMethod(Emit.scala:606)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:793)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:786)
        at is.hail.expr.ir.Emit.$anonfun$emitI$241(Emit.scala:2386)
        at is.hail.expr.ir.EmitCode$.fromI(Emit.scala:445)
        at is.hail.expr.ir.Emit.$anonfun$emitI$240(Emit.scala:2386)
        at is.hail.expr.ir.EmitCodeBuilder$.scoped(EmitCodeBuilder.scala:18)
        at is.hail.expr.ir.EmitCodeBuilder$.scopedCode(EmitCodeBuilder.scala:23)
        at is.hail.expr.ir.EmitMethodBuilder.emitWithBuilder(EmitClassBuilder.scala:1005)
        at is.hail.expr.ir.WrappedEmitMethodBuilder.emitWithBuilder(EmitClassBuilder.scala:1058)
        at is.hail.expr.ir.WrappedEmitMethodBuilder.emitWithBuilder$(EmitClassBuilder.scala:1058)
        at is.hail.expr.ir.EmitFunctionBuilder.emitWithBuilder(EmitClassBuilder.scala:1074)
        at is.hail.expr.ir.Emit.$anonfun$emitI$238(Emit.scala:2361)
        at is.hail.expr.ir.IEmitCodeGen.map(Emit.scala:336)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:2341)
        at is.hail.expr.ir.Emit.emitI$3(Emit.scala:2555)
        at is.hail.expr.ir.Emit.$anonfun$emit$22(Emit.scala:2638)
        at is.hail.expr.ir.EmitCode$.fromI(Emit.scala:445)
        at is.hail.expr.ir.Emit.emit(Emit.scala:2637)
        at is.hail.expr.ir.Emit.emit$2(Emit.scala:2552)
        at is.hail.expr.ir.Emit.$anonfun$emit$6(Emit.scala:2578)
        at is.hail.expr.ir.EmitCode$.fromI(Emit.scala:445)
        at is.hail.expr.ir.Emit.emit(Emit.scala:2577)
        at is.hail.expr.ir.Emit.emitFallback$1(Emit.scala:811)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:2476)
        at is.hail.expr.ir.Emit.emitI$2(Emit.scala:799)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:2303)
        at is.hail.expr.ir.Emit.$anonfun$emitSplitMethod$1(Emit.scala:591)
        at is.hail.expr.ir.Emit.$anonfun$emitSplitMethod$1$adapted(Emit.scala:589)
        at is.hail.expr.ir.EmitCodeBuilder$.scoped(EmitCodeBuilder.scala:18)
        at is.hail.expr.ir.EmitCodeBuilder$.scopedVoid(EmitCodeBuilder.scala:28)
        at is.hail.expr.ir.EmitMethodBuilder.voidWithBuilder(EmitClassBuilder.scala:1007)
        at is.hail.expr.ir.Emit.emitSplitMethod(Emit.scala:589)
        at is.hail.expr.ir.Emit.emitInSeparateMethod(Emit.scala:606)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:793)
        at is.hail.expr.ir.streams.EmitStream$.is$hail$expr$ir$streams$EmitStream$$emit$1(EmitStream.scala:143)
        at is.hail.expr.ir.streams.EmitStream$.$anonfun$produce$3(EmitStream.scala:202)
        at is.hail.expr.ir.EmitCode$.fromI(Emit.scala:445)
        at is.hail.expr.ir.streams.EmitStream$.produce(EmitStream.scala:202)
        at is.hail.expr.ir.streams.EmitStream$.produce$1(EmitStream.scala:148)
        at is.hail.expr.ir.streams.EmitStream$.$anonfun$produce$4(EmitStream.scala:203)
        at is.hail.expr.ir.EmitCodeBuilder.withScopedMaybeStreamValue(EmitCodeBuilder.scala:182)
        at is.hail.expr.ir.streams.EmitStream$.produce(EmitStream.scala:202)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:2093)
        at is.hail.expr.ir.Emit.$anonfun$emitSplitMethod$1(Emit.scala:591)
        at is.hail.expr.ir.Emit.$anonfun$emitSplitMethod$1$adapted(Emit.scala:589)
        at is.hail.expr.ir.EmitCodeBuilder$.scoped(EmitCodeBuilder.scala:18)
        at is.hail.expr.ir.EmitCodeBuilder$.scopedVoid(EmitCodeBuilder.scala:28)
        at is.hail.expr.ir.EmitMethodBuilder.voidWithBuilder(EmitClassBuilder.scala:1007)
        at is.hail.expr.ir.Emit.emitSplitMethod(Emit.scala:589)
        at is.hail.expr.ir.Emit.emitInSeparateMethod(Emit.scala:606)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:793)
        at is.hail.expr.ir.Emit.emitInNewBuilder$1(Emit.scala:802)
        at is.hail.expr.ir.Emit.$anonfun$emitI$29(Emit.scala:950)
        at is.hail.expr.ir.EmitCode$.fromI(Emit.scala:445)
        at is.hail.expr.ir.Emit.$anonfun$emitI$28(Emit.scala:950)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at is.hail.expr.ir.Emit.emitI(Emit.scala:949)
        at is.hail.expr.ir.Emit$.$anonfun$apply$5(Emit.scala:78)
        at is.hail.expr.ir.EmitCodeBuilder$.scoped(EmitCodeBuilder.scala:18)
        at is.hail.expr.ir.EmitCodeBuilder$.scopedCode(EmitCodeBuilder.scala:23)
        at is.hail.expr.ir.EmitMethodBuilder.emitWithBuilder(EmitClassBuilder.scala:1005)
        at is.hail.expr.ir.WrappedEmitMethodBuilder.emitWithBuilder(EmitClassBuilder.scala:1058)
        at is.hail.expr.ir.WrappedEmitMethodBuilder.emitWithBuilder$(EmitClassBuilder.scala:1058)
        at is.hail.expr.ir.EmitFunctionBuilder.emitWithBuilder(EmitClassBuilder.scala:1074)
        at is.hail.expr.ir.Emit$.apply(Emit.scala:75)
        at is.hail.expr.ir.Compile$.apply(Compile.scala:78)
        at is.hail.expr.ir.CompileAndEvaluate$.$anonfun$_apply$4(CompileAndEvaluate.scala:61)
        at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
        at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:61)
        at is.hail.expr.ir.CompileAndEvaluate$.$anonfun$apply$1(CompileAndEvaluate.scala:19)
        at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
        at is.hail.expr.ir.CompileAndEvaluate$.apply(CompileAndEvaluate.scala:19)
        at is.hail.expr.ir.lowering.LowerTableIR$.applyTable(LowerTableIR.scala:1093)
        at is.hail.expr.ir.lowering.LowerTableIR$.lower$2(LowerTableIR.scala:731)
        at is.hail.expr.ir.lowering.LowerTableIR$.applyTable(LowerTableIR.scala:1216)
        at is.hail.expr.ir.lowering.LowerTableIR$.lower$1(LowerTableIR.scala:493)
        at is.hail.expr.ir.lowering.LowerTableIR$.apply(LowerTableIR.scala:572)
        at is.hail.expr.ir.lowering.LowerToCDA$.lower(LowerToCDA.scala:73)
        at is.hail.expr.ir.lowering.LowerToCDA$.apply(LowerToCDA.scala:18)
        at is.hail.expr.ir.lowering.LowerToDistributedArrayPass.transform(LoweringPass.scala:77)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.evaluate$1(LowerOrInterpretNonCompilable.scala:27)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:67)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.rewrite$1(LowerOrInterpretNonCompilable.scala:53)
        at is.hail.expr.ir.LowerOrInterpretNonCompilable$.apply(LowerOrInterpretNonCompilable.scala:72)
        at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.transform(LoweringPass.scala:69)
        at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$3(LoweringPass.scala:16)
        at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
        at is.hail.expr.ir.lowering.LoweringPass.$anonfun$apply$1(LoweringPass.scala:16)
        at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
        at is.hail.expr.ir.lowering.LoweringPass.apply(LoweringPass.scala:14)
        at is.hail.expr.ir.lowering.LoweringPass.apply$(LoweringPass.scala:13)
        at is.hail.expr.ir.lowering.LowerOrInterpretNonCompilablePass$.apply(LoweringPass.scala:64)
        at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1(LoweringPipeline.scala:15)
        at is.hail.expr.ir.lowering.LoweringPipeline.$anonfun$apply$1$adapted(LoweringPipeline.scala:13)
        at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
        at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
        at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:13)
        at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:47)
        at is.hail.backend.spark.SparkBackend._execute(SparkBackend.scala:450)
        at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$2(SparkBackend.scala:486)
        at is.hail.backend.ExecuteContext$.$anonfun$scoped$3(ExecuteContext.scala:70)
        at is.hail.utils.package$.using(package.scala:635)
        at is.hail.backend.ExecuteContext$.$anonfun$scoped$2(ExecuteContext.scala:70)
        at is.hail.utils.package$.using(package.scala:635)
        at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:17)
        at is.hail.backend.ExecuteContext$.scoped(ExecuteContext.scala:59)
        at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:339)
        at is.hail.backend.spark.SparkBackend.$anonfun$executeEncode$1(SparkBackend.scala:483)
        at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
        at is.hail.backend.spark.SparkBackend.executeEncode(SparkBackend.scala:482)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)



Hail version: 0.2.105-acd89e80c345
Error summary: AssertionError: assertion failed: ptype mismatch:
  upcast: +PCStruct{trait_type:PCString,phenocode:PCString,pheno_sex:PCString,coding:PCString,modifier:PCString,pheno_data:PCArray[PCStruct{}],description:PCString,description_more:PCString,coding_description:PCString,category:PCString,n_cases_full_cohort_both_sexes:PInt64,n_cases_full_cohort_females:PInt64,n_cases_full_cohort_males:PInt64,col_array:PCTuple[1:PCStruct{n_cases:+PInt32,n_controls:PInt32,heritability:PFloat64,saige_version:PCString,inv_normalized:PBoolean,pop:PCString,lambda_gc:PFloat64,n_variants:PInt64,n_sig_variants:PInt64}]}
  computed: +PCStruct{trait_type:PCString,phenocode:PCString,pheno_sex:PCString,coding:PCString,modifier:PCString,pheno_data:PCArray[PCStruct{}],description:PCString,description_more:PCString,coding_description:PCString,category:PCString,n_cases_full_cohort_both_sexes:PInt64,n_cases_full_cohort_females:PInt64,n_cases_full_cohort_males:PInt64,col_array:PCTuple[0:PCStruct{n_cases:+PInt32,n_controls:PInt32,heritability:PFloat64,saige_version:PCString,inv_normalized:PBoolean,pop:PCString,lambda_gc:PFloat64,n_variants:PInt64,n_sig_variants:PInt64}]}

This happens when I try to run the second code block at https://pan-dev.ukbb.broadinstitute.org/docs/hail-format/index.html#columns-phenotypes.

Trying to keep this thread alive: should

>>> import hail.linalg as hli
>>> import ukbb_pan_ancestry as upa
>>> xx = hli.BlockMatrix.read(upa.get_ld_matrix_path('AFR'))
>>> xx.shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

produce a 403 error?

This one is a Hail bug:

Java stack trace:
java.lang.AssertionError: assertion failed: ptype mismatch:
  upcast: +PCStruct{trait_type:PCString,phenocode:PCString,pheno_sex:PCString,coding:PCString,modifier:PCString,pheno_data:PCArray[PCStruct{}],description:PCString,description_more:PCString,coding_description:PCString,category:PCString,n_cases_full_cohort_both_sexes:PInt64,n_cases_full_cohort_females:PInt64,n_cases_full_cohort_males:PInt64,col_array:PCTuple[1:PCStruct{n_cases:+PInt32,n_controls:PInt32,heritability:PFloat64,saige_version:PCString,inv_normalized:PBoolean,pop:PCString,lambda_gc:PFloat64,n_variants:PInt64,n_sig_variants:PInt64}]}
  computed: +PCStruct{trait_type:PCString,phenocode:PCString,pheno_sex:PCString,coding:PCString,modifier:PCString,pheno_data:PCArray[PCStruct{}],description:PCString,description_more:PCString,coding_description:PCString,category:PCString,n_cases_full_cohort_both_sexes:PInt64,n_cases_full_cohort_females:PInt64,n_cases_full_cohort_males:PInt64,col_array:PCTuple[0:PCStruct{n_cases:+PInt32,n_controls:PInt32,heritability:PFloat64,saige_version:PCString,inv_normalized:PBoolean,pop:PCString,lambda_gc:PFloat64,n_variants:PInt64,n_sig_variants:PInt64}]}

I’ll point the team at it.

As for the permissions issues, @konradjk, do you intend the paths returned by these methods to point to public datasets?

@VinceCarey, this should work:

import hail as hl

hl.linalg.BlockMatrix.read(
    's3://pan-ukb-us-east-1/ld_release/UKBB.AFR.ldadj.bm'
)

Assuming you already have the S3 connector configured (either by running on EMR or by installing it on your laptop).
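
If you are not on EMR, here is an untested sketch for reading the AWS copy from a laptop. It assumes hadoop-aws is on your Spark classpath and that the bucket allows anonymous reads; the s3a scheme and the credentials-provider class below are standard hadoop-aws settings, not anything specific to this dataset:

import hail as hl

hl.init(spark_conf={
    # read the bucket without AWS credentials
    'spark.hadoop.fs.s3a.aws.credentials.provider':
        'org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider'
})
bm = hl.linalg.BlockMatrix.read(
    's3a://pan-ukb-us-east-1/ld_release/UKBB.AFR.ldadj.bm')
print(bm.shape)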

We have a lead on the bug. Expect updates here.

@VinceCarey

This will be fixed in the next release of Hail, 0.2.108. The PR that fixed it is “[compiler] Fix subsetTo ptype method” by tpoterba (hail-is/hail#12584).
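
Once 0.2.108 is released, a quick way to confirm you have the fix:

import hail as hl

print(hl.version())  # should print 0.2.108 or later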

I am having trouble understanding why all this

>>> from ukbb_pan_ancestry import *
>>> import hail as hl
>>> hl.init(spark_conf={'spark.hadoop.fs.gs.requester.pays.mode': 'AUTO',
...                     'spark.hadoop.fs.gs.requester.pays.project.id': 'landmarkanvil2'})

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Running on Apache Spark version 3.1.3
SparkUI available at http://saturn-989f94f8-a72d-4f37-99fe-832f619f4650-m.c.terra-91608afb.internal:44267
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.108-fc03e9d5dc08
LOGGING: writing to /home/jupyter/hail-20230121-1329-0.2.108-fc03e9d5dc08.log
>>>
>>> mt = load_final_sumstats_mt()
>>> mt.describe()
---------------------------------

works, but the following does not; it fails with a permissions error on the LD matrix metadata:

>>> from hail.linalg import BlockMatrix
>>> bm = BlockMatrix.read(get_ld_matrix_path(pop='AFR'))
>>> bm
<hail.linalg.blockmatrix.BlockMatrix object at 0x7f701b53ac70>
>>> bm.shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/linalg/blockmatrix.py", line 578, in shape
    return tensor_shape_to_matrix_shape(self._bmir)
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/ir/blockmatrix_ir.py", line 427, in tensor_shape_to_matrix_shape
    shape = bmir.typ.shape
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/ir/base_ir.py", line 532, in typ
    self.compute_type(deep_typecheck=False)
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/ir/base_ir.py", line 523, in compute_type
    computed = self._compute_type(deep_typecheck)
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/ir/blockmatrix_ir.py", line 27, in _compute_type
    return Env.backend().blockmatrix_type(self)
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 307, in blockmatrix_type
    jir = self._to_java_blockmatrix_ir(bmir)
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 289, in _to_java_blockmatrix_ir
    return self._to_java_ir(ir, self._parse_blockmatrix_ir)
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 276, in _to_java_ir
    ir._jir = parse(r(finalize_randomness(ir)), ir_map=r.jirs)
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/backend/spark_backend.py", line 257, in _parse_blockmatrix_ir
    return self._jbackend.parse_blockmatrix_ir(code, ir_map)
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "/home/jupyter/.cache/R/basilisk/1.10.2/BiocHail/0.99.6/bsklenv/lib/python3.8/site-packages/hail/backend/py4j_backend.py", line 31, in deco
    raise fatal_error_from_java_error_triplet(deepest, full, error_id) from None
hail.utils.java.FatalError: GoogleJsonResponseException: 403 Forbidden
GET https://storage.googleapis.com/storage/v1/b/ukb-diverse-pops/o/ld%2FAFR%2FUKBB.AFR.ldadj.bm%2Fmetadata.json?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "pet-101835398722999344661@terra-91608afb.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).",
    "reason" : "forbidden"
  } ],
  "message" : "pet-101835398722999344661@terra-91608afb.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist)."
}

Java stack trace:
java.io.IOException: Error accessing gs://ukb-diverse-pops/ld/AFR/UKBB.AFR.ldadj.bm/metadata.json

This one is really for @konradjk; thanks @danking for dealing with the Hail bug.

Yep, this is pointing to a private bucket. If you do get_ld_matrix_path(pop='AFR').replace('gs://ukb-diverse-pops', 'gs://ukb-diverse-pops-public'), it will work for now. Note that this is a requester-pays bucket and these files are large, so I strongly recommend doing all compute in us-central1. We will get the function fixed, and we are working on a better hosting solution (the data is also hosted on AWS if you need to download it for free).
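
Putting the workaround together with the requester-pays configuration from earlier in this thread (a sketch; 'my-billing-project' is a placeholder for your own Google billing project):

import hail as hl
from hail.linalg import BlockMatrix
from ukbb_pan_ancestry import get_ld_matrix_path

hl.init(spark_conf={
    'spark.hadoop.fs.gs.requester.pays.mode': 'CUSTOM',
    'spark.hadoop.fs.gs.requester.pays.buckets': 'ukb-diverse-pops-public',
    'spark.hadoop.fs.gs.requester.pays.project.id': 'my-billing-project'  # placeholder
})

# redirect the private-bucket path to the public mirror, per the workaround above
path = get_ld_matrix_path(pop='AFR').replace(
    'gs://ukb-diverse-pops', 'gs://ukb-diverse-pops-public')
bm = BlockMatrix.read(path)
print(bm.shape)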