I have used the Sorter stage to sort the records on col1 and col2. In the next stage, a Transformer, I compare the previous record's values with the current record's values and accumulate the col3 values of duplicate records (a record is a duplicate when it has the same col1 and col2 values as another record). I then remove the duplicates and pass on only the last record of each group, which carries all the col3 values.
source:
col1 col2 col3
10 aaaa 123
10 aaaa 345
10 wqert 126
10 aaaa 789
output:
col1 col2 col3
10 aaaa 123,345,789
10 wqert 126
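The Transformer logic described above (compare each record's col1/col2 with the previous record's, accumulate col3, and emit one row per group) can be sketched outside DataStage as a minimal Python illustration. This is not DataStage code: the function name `concat_duplicates` and the tuple record layout are hypothetical, and the sketch assumes the input already arrives sorted on col1 and col2, as it would after the Sorter stage. Only the current group's col3 values are held in memory at any time.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (col1, col2, col3)
Record = Tuple[int, str, str]


def concat_duplicates(sorted_rows: Iterable[Record]) -> Iterator[Record]:
    """Collapse consecutive rows sharing (col1, col2) into one row whose
    col3 is the comma-joined list of the group's col3 values.

    Assumes the input is already sorted on (col1, col2).
    """
    prev_key = None
    values: List[str] = []
    for col1, col2, col3 in sorted_rows:
        key = (col1, col2)
        if prev_key is not None and key != prev_key:
            # Key changed: the previous row was the last one of its group.
            yield (prev_key[0], prev_key[1], ",".join(values))
            values = []
        values.append(col3)
        prev_key = key
    if prev_key is not None:
        # Flush the final group.
        yield (prev_key[0], prev_key[1], ",".join(values))


# The source rows from the example above.
source = [
    (10, "aaaa", "123"),
    (10, "aaaa", "345"),
    (10, "wqert", "126"),
    (10, "aaaa", "789"),
]
# Stand-in for the Sorter stage: sort on (col1, col2) only.
result = list(concat_duplicates(sorted(source, key=lambda r: (r[0], r[1]))))
print(result)
```

Python's `sorted` is stable, so within a group the col3 values keep their input order (123, 345, 789), which matches the expected output above.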
Now, the problem is that this job receives millions of input records, and during the sort/remove-duplicates step it fails because there is not enough space on the server. Is there any way to achieve this functionality with a ready-made stage in DataStage? Any alternative solution is also welcome!
Please let me know.
Thanks in advance!