Import multiple plink files


#1

Is there a way to import PLINK files that are split by chromosome? I see that GEN and BGEN imports support wildcard matching. Is there a similar feature for PLINK and/or VCF files?

Thanks,
-Jonathan


#2

There's no way to do this at the moment, I think. We can add a vds union function, which should solve this problem.
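
For anyone wondering what such a union would amount to, here is a rough plain-Python toy model (not Hail code, and not the planned API). It assumes each per-chromosome dataset is a `(samples, rows)` pair; the `union_by_chromosome` name and this layout are hypothetical and only for illustration.

```python
def union_by_chromosome(datasets):
    """Concatenate per-chromosome datasets that share one sample list.

    Each dataset is a (samples, rows) pair, where rows maps a variant
    string to a list of genotypes, one per sample (toy layout).
    """
    if not datasets:
        raise ValueError("need at least one dataset")
    samples = datasets[0][0]
    merged = {}
    for ds_samples, rows in datasets:
        # A row-wise union only makes sense if the columns (samples) agree.
        if ds_samples != samples:
            raise ValueError("sample lists differ between datasets")
        overlap = merged.keys() & rows.keys()
        if overlap:
            raise ValueError(f"duplicate variants: {sorted(overlap)}")
        merged.update(rows)
    return samples, merged


chr1 = (["s1", "s2"], {"1:100:A:T": [0, 1]})
chr2 = (["s1", "s2"], {"2:500:G:C": [2, 0]})
samples, rows = union_by_chromosome([chr1, chr2])
```

The point of the sketch is that the samples stay fixed while the variant rows are stacked, which is the opposite of a join across cohorts.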


#3

A union of VariantDataSets would be a useful feature for concatenating chromosomes. Is there any work being done on this yet?

John


#4

Definitely in the plans for 0.2! If we do it sooner, maybe we can even backport it to 0.1 as well.


#5

It would be great to have the possibility to make a union of multiple vds files which can have different samples, representing different cohorts, not only for concatenating chromosomes. Is that in the plans for 0.2 also?

Cheers,

Stephane


#6

We have VariantDataset.join, which joins datasets with different samples on common variants.


#7

That is what I was looking for, thanks. One more question, though: what happens when the variants are not on the same strand, or when there is an allele discrepancy?


#8

We don’t currently have anything to correct strand flips or ref/alt swaps. The variant 1:100:A:T won’t join with 1:100:T:A.
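
To illustrate why those two records do not pair up, here is a minimal plain-Python sketch (not Hail code) that models a join keyed on the full (contig, position, ref, alt) tuple. The `Variant` type, `parse_variant` helper, and dictionary layout are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    contig: str
    pos: int
    ref: str
    alt: str


def parse_variant(s: str) -> Variant:
    """Parse a 'contig:pos:ref:alt' string (hypothetical helper)."""
    contig, pos, ref, alt = s.split(":")
    return Variant(contig, int(pos), ref, alt)


def inner_join(left: dict, right: dict) -> dict:
    """Keep only variants whose full key appears in both inputs."""
    return {v: (left[v], right[v]) for v in left if v in right}


cohort_a = {parse_variant("1:100:A:T"): "a1", parse_variant("1:200:G:C"): "a2"}
cohort_b = {parse_variant("1:100:T:A"): "b1", parse_variant("1:200:G:C"): "b2"}

# Only 1:200:G:C survives: the keys at position 100 differ in ref/alt order.
joined = inner_join(cohort_a, cohort_b)
```

Because the alleles are part of the key, a ref/alt swap or a strand flip makes the two records look like different variants.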


#9

Is this functionality in the works for the next version?


#10

We’re expanding our reference functionality for 0.2 but don’t currently have any concrete plans to use fasta files (or similar) to actually do realignment.


#11

Even without using FASTA, a simple option that tries flipping strands (A/C to T/G, for example) to check for a match would be useful. Another option could instruct join to treat A/C and C/A as a simple difference in minor allele between cohorts/populations, which would also increase functionality. Just an idea :slight_smile:
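
As a rough sketch of the check I have in mind (plain Python, not an existing Hail option; the `classify` function and its labels are made up for illustration), one could compare the allele pairs of two cohorts like this:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(allele: str) -> str:
    """Reverse-complement an allele; for a single base this is just the complement."""
    return "".join(COMPLEMENT[b] for b in reversed(allele))


def classify(ref_a: str, alt_a: str, ref_b: str, alt_b: str) -> str:
    """Describe how cohort B's alleles relate to cohort A's at one site."""
    a = (ref_a, alt_a)
    if a == (ref_b, alt_b):
        return "match"
    # Swap is tested before flip, so A/T vs T/A is reported as a swap
    # even though a strand flip would explain it equally well.
    if a == (alt_b, ref_b):
        return "swap"
    flipped = (flip(ref_b), flip(alt_b))
    if a == flipped:
        return "flip"
    if a == (flipped[1], flipped[0]):
        return "flip_swap"
    return "mismatch"


examples = {
    "flip": classify("A", "C", "T", "G"),
    "swap": classify("A", "C", "C", "A"),
    "flip_swap": classify("A", "C", "G", "T"),
    "mismatch": classify("A", "C", "A", "G"),
}
```

A join could then use the label to decide whether to recode genotypes, flip the alleles, or drop the site.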


#12

All good points. Some of this stuff is tough to do automatically (it may be the wrong thing for some datasets) but we should certainly have general functionality that extends join to do a unification step. This is a good motivating example!


#13

I agree this would not be suitable for all datasets, no question there. But a way to detect/flip strands, like in Plink, and disregard allele frequency when merging, as could be the case when merging populations in which the AF is different (multi-ethnic HapMap 3 panel for example), or variants with a MAF very close to 50%, would alleviate the need to go back to Plink to merge datasets.
I’m sure actually implementing it may not be as straightforward as it sounds, so maybe for v0.3? :wink:
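
On the MAF point: for A/T and C/G SNPs a strand flip is indistinguishable from a ref/alt swap by alleles alone, which is why allele frequency is often used as a tiebreaker and why that breaks down near 50% or across populations. A minimal plain-Python sketch for flagging such sites follows; the `is_strand_ambiguous` name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(ref: str, alt: str) -> bool:
    """True for single-base SNPs whose alleles are complements (A/T, C/G)."""
    return len(ref) == 1 and len(alt) == 1 and COMPLEMENT.get(ref) == alt


sites = [("A", "T"), ("C", "G"), ("A", "C"), ("AT", "A")]
# Sites that cannot be oriented from the alleles alone.
ambiguous = [s for s in sites if is_strand_ambiguous(*s)]
```

Sites flagged this way would need extra information (or to be dropped) before merging.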