Using a non-GT column from a VCF

Hi, I need to read genotypes from a column other than GT in a VCF file. Does anyone know how to do it in Hail? Many thanks for your help!

Hi Tetyana! Sorry for the silence on your other post.

If I understand correctly, I think all you should need to do is

hl.import_vcf(path_to_vcf, call_fields=[genotype_field])

It would even work without the call_fields argument, but it wouldn’t know to import that field as call-typed data, and would just represent the calls as strings (I think).

Hopefully that works for you. If not, please share here what happened.

Hi Patrick, thank you for your response, and apologies for the delayed follow-up. I had issues installing Hail on-prem, but I think I’ve figured out the problem. I tried the command you suggested and got the following error: NameError: name 'GTA' is not defined
GTA is the field from which I’d like to read the genotypes, so the command looked like this:
data=hl.import_vcf(path_to_vcf, call_fields=[GTA])
The VCF file was not compressed (does it need to be compressed?). vcftools read the GTA field with no problems.

Ah, the field name needs to be a string. Try

data=hl.import_vcf(path_to_vcf, call_fields=['GTA'])
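To illustrate why the unquoted version fails, here is a small sketch in plain Python (no Hail needed). The variable names `message` and `call_fields` are only for illustration; the point is that a bare GTA is looked up as a Python variable, while 'GTA' is a string that can be passed along as a field name.

```python
# A bare name is treated as a variable lookup, which fails if no such
# variable exists. This reproduces the error from the post above.
try:
    call_fields = [GTA]  # noqa: F821 (deliberately undefined)
except NameError as err:
    message = str(err)

# Quoting the name makes it a string, which is what call_fields expects.
call_fields = ["GTA"]

print(message)
print(call_fields)
```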

Thank you for the quick response! I was able to read in the VCF with the field name given as the string 'GTA', but I was not able to do anything with it afterwards. My goal is to read in a field that is not GT and then export the data in either PLINK or VCF format so that the non-GT field becomes GT (in other words, when PLINK or another program reads the exported file, the data that was in the GTA field is read as the GT field). When I tried to export what was read in using the following command:

hl.export_vcf(data, 'test.vcf')

I got the following error:

Error summary: NoClassDefFoundError: Could not initialize class com.github.luben.zstd.ZstdCompressCtx

I’m not sure whether this is because the Hail dependencies are not right or whether something else is going on. I’m happy to share the log file from the run.

Not sure what’s going on there. Can you share the log file, as well as the full stack trace that printed with the error? Also, where was this run (e.g. local mac/windows machine, cloud)?
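Separately from the error: once the export itself works, one way to get the GTA calls written out as GT might look like the sketch below. It is untested against your data, and the function name `import_field_as_gt` and its arguments are made up for illustration. It assumes the field was imported as call-typed via call_fields, and it does nothing about the NoClassDefFoundError, which would still need to be sorted out first.

```python
def import_field_as_gt(vcf_path, out_path, field="GTA"):
    """Import a VCF, copy the call-typed `field` into GT, and export a new VCF.

    Illustrative sketch only; `field` is assumed to be a FORMAT field other
    than GT that holds genotype calls.
    """
    import hail as hl  # imported here so the function can be defined without Hail loaded

    # Import `field` as call-typed data (the field name must be a string).
    mt = hl.import_vcf(vcf_path, call_fields=[field])

    # Create (or overwrite) the GT entry field with the calls from `field`.
    mt = mt.annotate_entries(GT=mt[field])

    # Drop the original field so only GT carries these calls in the output.
    mt = mt.drop(field)

    # Write the result; downstream tools should then see the calls under GT.
    hl.export_vcf(mt, out_path)
    return mt
```

You would call it as import_field_as_gt(path_to_vcf, 'test.vcf'). For PLINK output, hl.export_plink(mt, 'test', ind_id=mt.s) on the returned matrix table should use GT as the call by default, though I have not checked that on a file like yours.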

I’m allowed only one embedded media item per post, so here is one screenshot of Hail loading (there were some warnings); a screenshot of everything that was printed after the export_vcf command will follow. Everything was run on a Broad server. Before running Hail, I loaded Anaconda3 and Java 11, then used ipython to run Hail.

hail-20240218-2027-0.2.127-bb535cd096c5.log (87.3 KB)