Import json annotations

Hi,

I’ve noticed some posts about importing JSON files as Hail tables (List of Various Beginner Questions, https://hail.is/docs/0.2/change_log.html#id14) and was wondering whether there is any documentation on how to do this.

As an example, I have VEP annotations in JSON format from an external source and would like to add these annotations to a matrix table.

Thanks,
-Jonathan

This is currently undocumented, but I’ll add documentation for it for the next release. Thanks for bringing that back to our attention.

Basically, any field in a TSV file can contain a JSON object, as long as you declare that field's type (for example, an hl.tstruct) in the types argument. So if you have a file like:

id	json_field
8	{"foo": "bar", "x": 7}
4	{"foo": "b3", "x": 100}

You could import it with something like:

import hail as hl

types = {"id": hl.tint32, "json_field": hl.tstruct(foo=hl.tstr, x=hl.tint32)}
ht = hl.import_table('json_file.tsv', types=types)

I’m not aware of any automatic schema inference for JSON, though, so if you’re dealing with a giant VEP schema, writing out the types may be a little unwieldy.
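Since there's no inference built in, one workaround is to guess a type from a single sample record in plain Python and hand the resulting type string to hl.dtype. This is a minimal sketch, not a Hail feature: infer_hail_type is a hypothetical helper, ints are assumed to fit in int32, nulls become str, the first list element is treated as representative, and the backtick quoting for odd field names is my understanding of Hail's type-string syntax. The sample record is made up for illustration.

```python
import json
import re

# Plain identifiers can appear bare in a Hail type string; anything else
# (e.g. keys starting with a digit) is wrapped in backticks.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _field_name(name: str) -> str:
    return name if _IDENT.match(name) else f"`{name}`"


def infer_hail_type(value) -> str:
    """Guess a Hail type string for one sample JSON value (hypothetical helper)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int32"  # assumption: switch to int64 if values can exceed 2**31 - 1
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str) or value is None:
        return "str"  # assumption: nulls in the sample are typed as str
    if isinstance(value, list):
        # assumption: the first element is representative; empty lists become array<str>
        return f"array<{infer_hail_type(value[0]) if value else 'str'}>"
    if isinstance(value, dict):
        fields = ", ".join(f"{_field_name(k)}: {infer_hail_type(v)}" for k, v in value.items())
        return f"struct{{{fields}}}"
    raise TypeError(f"unsupported JSON value: {value!r}")


# Illustrative record, not real VEP output.
sample = json.loads('{"foo": "bar", "x": 7, "consequences": [{"gene": "ABC", "score": 0.5}]}')
print(infer_hail_type(sample))
# struct{foo: str, x: int32, consequences: array<struct{gene: str, score: float64}>}
```

You would then pass something like types = {"json_field": hl.dtype(infer_hail_type(sample))} to import_table. Keep in mind that one record may not contain every field that appears elsewhere in the file, so the guessed schema is a starting point to review by hand.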

Hi @johnc1231,

Thanks for the info. I was able to get the import to work on a minimal example. Writing out the types for the full annotation will be a bear, but it is good to know this is possible.

For my use case, the JSON file doesn’t have a header row (no id or json_field columns); each line is just a JSON object:

{"foo": "bar", "x": 7, "input": "some long string"}
{"foo": "b3", "x": 100, "input": "some other long string"}

To import the JSON file:

import hail as hl

# With header=False, the single column is named f0
types = {'f0': hl.tstruct(foo=hl.tstr, x=hl.tint32, input=hl.tstr)}
ht = hl.import_table('json_file.tsv', types=types, header=False)

# Import sets f0 as a top-level field. Make the contents of f0 top level instead.
ht2 = ht.select(**ht.f0)

# Drop some columns I don't need
ht3 = ht2.drop('input')
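With header=False, each whole line becomes the single field f0 (split on the default tab delimiter) and is then parsed as JSON, so a malformed line or a stray tab will likely make the import fail. Before importing a large external file, a plain-Python pre-flight check can point at the offending lines. find_bad_lines is a hypothetical helper, and the sample file below is made up, including a deliberately broken second line.

```python
import json
import os
import tempfile


def find_bad_lines(path):
    """Return (line_number, reason) pairs for lines that may not import cleanly.

    Hypothetical check: every line should be exactly one JSON object and must
    not contain a tab, since tab is import_table's default delimiter.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if "\t" in line:
                problems.append((n, "contains a tab, so it would be split into several fields"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((n, f"invalid JSON: {e.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((n, "not a JSON object"))
    return problems


# Example file: the second line uses a comma where a colon belongs and misses a quote.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write('{"foo": "bar", "x": 7, "input": "some long string"}\n')
    tmp.write('{"foo": "b3", "x": 100, "input", "some other long string}\n')
bad = find_bad_lines(tmp.name)
os.remove(tmp.name)
print(bad)
```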

Some questions:

  • Is there another way to set the columns as “top-level”?
  • Is this the best way to drop unwanted columns?

Thanks

The way I’d normally get columns to be “top level” is Table.flatten: https://hail.is/docs/0.2/hail.Table.html?highlight=flatten#hail.Table.flatten
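One thing to be aware of: Table.flatten joins nested names with dots, so the fields come out as f0.foo, f0.x and so on (referenced as ht['f0.foo']), whereas ht.select(**ht.f0) gives bare names like foo. To preview the flattened names for a big nested record without starting Hail, here is a small plain-Python sketch; flattened_field_names is a hypothetical helper that only models the naming, expanding nested dicts (structs) and leaving lists (arrays) as single fields.

```python
def flattened_field_names(record, prefix=""):
    """Preview the dot-joined names Table.flatten would give a nested struct.

    Hypothetical helper on a plain dict sample: dicts are expanded recursively,
    everything else (including lists) is kept as one field.
    """
    names = []
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            names.extend(flattened_field_names(value, name))
        else:
            names.append(name)
    return names


row = {"f0": {"foo": "bar", "x": 7, "input": "some long string"}}
print(flattened_field_names(row))  # ['f0.foo', 'f0.x', 'f0.input']
```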

Yes, that’s a fine way to drop unwanted fields. The alternative is to use select, which lets you specify the fields you want to keep: https://hail.is/docs/0.2/hail.Table.html#hail.Table.select