What should be the format of the Nirvana annotation structure?

The docs for the VEP annotation schema state:

vep_json_schema (string): The type of the VEP JSON schema (as produced by the VEP when invoked with the --json option). Note: This is the old-style ‘parseable’ Hail type syntax. This will change.

And it looks like the schema is something like this:

Struct{assembly_name:String,allele_string:String,ancestral:String,colocated_variants:Array[Struct{aa_allele:String,aa_maf:Float64....

When I look at the code for Nirvana, however, it seems to use a DIFFERENT format, which looks like this:

  val nirvanaSignature = TStruct(
    "chromosome" -> TString,
    "refAllele" -> TString,
    "position" -> TInt32,
    "altAlleles" -> TArray(TString),
    "cytogeneticBand" -> TString
    ...

The code currently in “Nirvana.scala” is indeed outdated with respect to this JSON schema, but it does work (once you stream the Nirvana output). So my question is: which format can (should?) I use to define the Nirvana signature?

I already have a signature in the “parseable” format, so converting it into the style currently used in Nirvana.scala would be a lot of work. Is there an easy converter?

Thanks!

Thon

The parseable format should be fine. You can call IRParser.parseStructType(string) to parse that VEP-like schema string into a Scala TStruct.
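A minimal sketch of that approach, assuming Hail's Scala API is on the classpath. The import paths are an assumption that depends on the Hail version (older releases kept the types under is.hail.expr.types.virtual). The schema string is a hypothetical, shortened example built only from the fields in the Nirvana.scala excerpt above; in practice you would pass your full parseable signature. The sketch also builds the same type in the TStruct(...) style to check that both forms describe the same struct.

```scala
// Assumption: Hail is on the classpath; package paths may differ by version.
import is.hail.expr.ir.IRParser
import is.hail.types.virtual._

object NirvanaSignatureExample {
  // Hypothetical, shortened schema in the old-style "parseable" syntax,
  // using only the fields shown in the Nirvana.scala excerpt.
  val schemaString: String =
    "Struct{chromosome:String,refAllele:String,position:Int32," +
      "altAlleles:Array[String],cytogeneticBand:String}"

  // Parse the string directly into a TStruct instead of rewriting it by hand.
  val nirvanaSignature: TStruct = IRParser.parseStructType(schemaString)

  // The same type written in the TStruct(...) style used by Nirvana.scala.
  val handWritten: TStruct = TStruct(
    "chromosome" -> TString,
    "refAllele" -> TString,
    "position" -> TInt32,
    "altAlleles" -> TArray(TString),
    "cytogeneticBand" -> TString
  )

  def main(args: Array[String]): Unit = {
    // Both definitions should yield the same struct type.
    assert(nirvanaSignature == handWritten)
    println(nirvanaSignature.fieldNames.mkString(", "))
  }
}
```

Because the parser returns an ordinary TStruct, this should let you keep the signature you already have in parseable form and use the parsed value wherever Nirvana.scala expects its hand-written nirvanaSignature.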
