Arwa
November 24, 2020, 5:18pm
1
Hello.
I have 4 VCF files for 4 patients. I need to convert these files into a genotype matrix so that I can run machine learning algorithms on it later.
So I want to build a matrix in which the rows correspond to patients and the columns correspond to SNPs.
How can I do it?
Thanks
danking
November 24, 2020, 7:50pm
2
Hail does not support this operation (transposition) because VCFs are usually very large, which makes transposition very expensive.
If your data is small enough to fit on one machine, I recommend using numpy instead:
variant_by_sample = np.array(mt.GT.n_alt_alleles().collect()).reshape(mt.count())
sample_by_variant = variant_by_sample.t
Arwa
November 25, 2020, 9:06pm
3
Hi danking.
Thank you for your help.
But what is t in the second line?
sample_by_variant = variant_by_sample.t
This is the error that occurs:
AttributeError: 'numpy.ndarray' object has no attribute 't'
So, how can I solve this error?
Thanks again.
danking
November 25, 2020, 10:35pm
4
I meant that you need to transpose the NumPy matrix; apparently that's a capital T in NumPy. You should read the NumPy Quickstart.
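For reference, here is a minimal sketch of the corrected transposition using plain NumPy. The list flat_counts and the sizes n_variants and n_samples are made-up stand-ins for what mt.GT.n_alt_alleles().collect() and mt.count() would return; the sketch assumes the collected values are ordered variant by variant and that no genotype calls are missing.

```python
import numpy as np

# Hypothetical stand-in for mt.GT.n_alt_alleles().collect():
# alt-allele counts flattened variant by variant (3 variants x 4 samples).
flat_counts = [0, 1, 2, 0,
               1, 1, 0, 0,
               2, 0, 0, 1]
n_variants, n_samples = 3, 4  # stand-in for the tuple returned by mt.count()

# Rows are variants, columns are samples.
variant_by_sample = np.array(flat_counts).reshape(n_variants, n_samples)

# .T (capital T) is the NumPy transpose attribute: rows become samples.
sample_by_variant = variant_by_sample.T

print(sample_by_variant.shape)
```

Each row of sample_by_variant is then one patient and each column one SNP, which is the layout most machine learning libraries expect.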