Sort on required fields and reroute the others
Posted: Mon Apr 27, 2009 9:06 am
Hello All
I have data like this:
col1, col2, col3, Amt
1,2,3,123
1,3,4,234
1,2,3,12
2,4,5,234
5,6,7,123
1,2,3,13
In my job I group on col1, col2, col3 and sum Amt.
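For reference, here is a minimal Python sketch of the baseline described above: grouping every row on (col1, col2, col3) and summing Amt, as an Aggregator over the whole input would. It is not DataStage code; `ROWS` and `group_and_sum` are hypothetical names used only for illustration, and the rows are the sample data from this post.

```python
from collections import defaultdict

# Sample rows from the post: (col1, col2, col3, Amt)
ROWS = [
    (1, 2, 3, 123),
    (1, 3, 4, 234),
    (1, 2, 3, 12),
    (2, 4, 5, 234),
    (5, 6, 7, 123),
    (1, 2, 3, 13),
]


def group_and_sum(rows):
    """Sum Amt per (col1, col2, col3) key over ALL rows (the baseline job)."""
    totals = defaultdict(int)
    for col1, col2, col3, amt in rows:
        totals[(col1, col2, col3)] += amt
    return dict(totals)


print(group_and_sum(ROWS))
# {(1, 2, 3): 148, (1, 3, 4): 234, (2, 4, 5): 234, (5, 6, 7): 123}
```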
I feed this data to a Sort stage, and its output looks like this:
1,2,3,12
1,2,3,13
1,2,3,123
1,3,4,234
2,4,5,234
5,6,7,123
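The sorted listing above can be reproduced with the short Python sketch below. Note that a stable sort on the key alone would keep Amt in input order within a key (123, 12, 13); the listing shows 12, 13, 123, so this sketch assumes Amt is also used as a numeric sort key. `sort_rows` is a hypothetical helper, not part of any DataStage API.

```python
# Sample rows from the post: (col1, col2, col3, Amt)
ROWS = [
    (1, 2, 3, 123),
    (1, 3, 4, 234),
    (1, 2, 3, 12),
    (2, 4, 5, 234),
    (5, 6, 7, 123),
    (1, 2, 3, 13),
]


def sort_rows(rows):
    """Sort on col1, col2, col3, then Amt, all compared numerically.

    Tuples compare field by field, so sorting the whole tuple gives
    exactly that key order.
    """
    return sorted(rows)


for row in sort_rows(ROWS):
    print(",".join(str(v) for v in row))
```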
At this stage I don't want to group all the records. I want to find the duplicate records (on key col1, col2, col3)
(
1,2,3,12
1,2,3,13
1,2,3,123
)
and send them to an Aggregator stage for grouping.
I want to reroute the other records, where only one record exists for key col1, col2, col3
(
1,3,4,234
2,4,5,234
5,6,7,123
) to a data set. Can I do this?
Out of millions of records, only a few thousand share a key (col1, col2, col3) with another record. I want to group only those and reroute the rest.
I want to save time by grouping only the records that need it.
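The routing described above can be modeled in one pass over the sorted rows: read one key group at a time, send groups with more than one row to the aggregation path, and send single-row groups straight to the output. The Python below is only a sketch of that logic under the assumption that the input is already sorted on (col1, col2, col3); it is not DataStage code, and `split_by_key_count` and `sum_by_key` are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the post: (col1, col2, col3, Amt)
ROWS = [
    (1, 2, 3, 123),
    (1, 3, 4, 234),
    (1, 2, 3, 12),
    (2, 4, 5, 234),
    (5, 6, 7, 123),
    (1, 2, 3, 13),
]

key_of = itemgetter(0, 1, 2)  # the (col1, col2, col3) key


def split_by_key_count(sorted_rows):
    """Route rows whose key occurs more than once to `duplicates`,
    and rows whose key occurs exactly once to `singles`.

    Assumes `sorted_rows` is already sorted on the key, so groupby
    sees each key as one contiguous run.
    """
    duplicates, singles = [], []
    for _, group in groupby(sorted_rows, key=key_of):
        members = list(group)  # only one key group is buffered at a time
        if len(members) > 1:
            duplicates.extend(members)
        else:
            singles.extend(members)
    return duplicates, singles


def sum_by_key(rows):
    """Sum Amt per key; applied only to the duplicate rows."""
    totals = {}
    for row in rows:
        k = key_of(row)
        totals[k] = totals.get(k, 0) + row[3]
    return totals


dups, singles = split_by_key_count(sorted(ROWS))
print("to aggregator:", dups)
print("to data set:", singles)
print("aggregated:", sum_by_key(dups))
```

Because a single-row group sums to its own Amt, combining the aggregated duplicates with the untouched singles gives the same totals as grouping every row, while the summing work only touches the small duplicate subset.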
Is this possible?
Responses are appreciated.
Thanks in Advance