Genotype matrix

I have 4 VCF files for 4 patients. I need to convert these files into a genotype matrix so that I can run machine learning algorithms on it later.
So, I want to build a matrix in which
the rows correspond to patients and the columns correspond to SNPs.

How can I do it?

Hail does not support this operation (transposition) because VCFs are usually very large, and transposing a very large matrix is very expensive.

If your data is small enough to fit on one machine, I recommend using NumPy instead:

variant_by_sample = np.array(mt.GT.n_alt_alleles().collect()).reshape(mt.count())
sample_by_variant = variant_by_sample.t

Hi danking.
Thank you for your help.
But what is t in the second line?

sample_by_variant = variant_by_sample.t

This is the error that occurs:

AttributeError: 'numpy.ndarray' object has no attribute 't'

So, how can I solve this error?
Thanks again.

I meant that you need to transpose the NumPy matrix; in NumPy the transpose attribute is a capital T, i.e. variant_by_sample.T. You should read the NumPy Quickstart.
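
For anyone following along, here is a minimal NumPy-only sketch of the reshape-then-transpose step. It does not call Hail: flat_calls, n_variants and n_samples are hypothetical toy stand-ins for the collected alternate-allele counts and for the (variants, samples) pair returned by mt.count(), and it assumes the collected values come back in variant-major order with no missing calls.

```python
import numpy as np

# Hypothetical toy data: alternate-allele counts for 3 variants x 4 samples,
# flattened variant by variant (an assumed stand-in for the collected values).
flat_calls = [0, 1, 2, 0,
              1, 1, 0, 0,
              2, 0, 0, 1]
n_variants, n_samples = 3, 4  # assumed stand-in for mt.count()

# Rows are variants, columns are samples.
variant_by_sample = np.array(flat_calls).reshape(n_variants, n_samples)

# Capital T transposes: rows become samples (patients), columns become variants.
sample_by_variant = variant_by_sample.T

print(sample_by_variant.shape)
```

Each row of sample_by_variant is then one patient and each column one SNP, which is the layout most machine learning libraries expect.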