Hi,
I read in a CSV file that has the following structure:
```
In [3]: t.describe()
----------------------------------------
Global fields:
None
----------------------------------------
Row fields:
'#CHROM': str
'POSITION': str
'ID': str
'DNA_MUT': str
'PROT_MUT': str
'SNV': str
'INDELS': str
'ID_REF': str
'ID_ALT': str
'STRAND': str
----------------------------------------
Key: []
----------------------------------------
```
I want to take '#CHROM' and 'POSITION', generate an ID from them by concatenation (e.g. CHROM + '_' + POSITION), and group by that ID, merging the remaining fields into arrays for each unique #CHROM + POSITION combination. How can I achieve that? The final result I am looking for is the following:
```
ID                        data
CHROM + '_' + POSITION    [[...], [...], ...]
```