
sort on required fields and reroute others

Posted: Mon Apr 27, 2009 9:06 am
by reachmexyz
Hello All

I have data like this:

col1, col2, col3, Amt

1,2,3,123
1,3,4,234
1,2,3,12
2,4,5,234
5,6,7,123
1,2,3,13

In my job I will group on col1, col2, col3 and sum Amt.

I will feed this data to a Sort stage, and the output from the sort will look like this:

1,2,3,12
1,2,3,13
1,2,3,123
1,3,4,234
2,4,5,234
5,6,7,123

At this stage I don't want to group all the records. I want to find the records that share a duplicate key (col1, col2, col3)
(
1,2,3,12
1,2,3,13
1,2,3,123
)
and send them to an Aggregator stage for grouping.
I want to reroute the other records, where only one record exists for the key (col1, col2, col3)
(
1,3,4,234
2,4,5,234
5,6,7,123
) to a data set. Can I do this?
Out of millions of records, only thousands share a key (col1, col2, col3) with another record. I want to group only those and reroute the rest.
I want to save time by grouping only the records that need it.
Is this possible?

Responses are appreciated.
Thanks in advance.

Re: sort on required fields and reroute others

Posted: Mon Apr 27, 2009 4:25 pm
by ray.wurlod
reachmexyz wrote: Is this possible?
Yes.

Posted: Tue Apr 28, 2009 3:53 am
by bart12872
Sort by col1, col2, col3.
Aggregate by col1, col2, col3 with a row count.
Filter: two output links, count > 1 and count = 1, to separate the data.
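
To make the count-and-filter idea concrete, here is a rough sketch in plain Python rather than DataStage. It only illustrates the logic, not how the stages are configured. The sample rows come from the first post; the names rows, key_counts, multi_keys and single_keys are made up for the example.

```python
from collections import Counter

# Sample rows from the original post: (col1, col2, col3, Amt)
rows = [
    (1, 2, 3, 123),
    (1, 3, 4, 234),
    (1, 2, 3, 12),
    (2, 4, 5, 234),
    (5, 6, 7, 123),
    (1, 2, 3, 13),
]

# "Aggregate with a row count": number of rows per (col1, col2, col3) key
key_counts = Counter(row[:3] for row in rows)

# "Filter with two output links": keys with count > 1 and keys with count = 1
multi_keys = sorted(k for k, n in key_counts.items() if n > 1)
single_keys = sorted(k for k, n in key_counts.items() if n == 1)

print("count > 1:", multi_keys)
print("count = 1:", single_keys)
```

At this point each output carries only keys and counts, not the Amt values, so the detail rows still have to be brought back in.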

Posted: Tue Apr 28, 2009 3:57 am
by bart12872
I forgot:
First, duplicate the data with a Copy stage, then do an inner join (on col1, col2, col3) between the Filter output and the duplicated flow.





---> Copy ---> Sort ---> Aggregate ---> Filter ---+
      |                                           v
      +---------------------------------------> Join --->

Something like that. Adapt it as needed.
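
For anyone who wants to check the expected result of the whole flow, the following is a minimal Python sketch of the same logic, again outside DataStage. The set lookup stands in for the inner join between the filtered keys and the copied detail rows. split_and_sum is a hypothetical helper name, and the rows are the sample data from the first post.

```python
from collections import Counter, defaultdict


def split_and_sum(rows):
    """Split rows by key multiplicity; sum Amt only for keys seen more than once."""
    # Copy -> Sort -> Aggregate (count) -> Filter (count > 1)
    key_counts = Counter(row[:3] for row in rows)
    multi_keys = {k for k, n in key_counts.items() if n > 1}

    sums = defaultdict(int)
    singles = []
    for row in rows:
        key, amt = row[:3], row[3]
        if key in multi_keys:
            # Inner join with the count > 1 keys: these rows get grouped
            sums[key] += amt
        else:
            # Keys that occur once are rerouted untouched
            singles.append(row)

    grouped = [key + (total,) for key, total in sorted(sums.items())]
    return grouped, sorted(singles)


rows = [
    (1, 2, 3, 123),
    (1, 3, 4, 234),
    (1, 2, 3, 12),
    (2, 4, 5, 234),
    (5, 6, 7, 123),
    (1, 2, 3, 13),
]

grouped, singles = split_and_sum(rows)
print(grouped)
print(singles)
```

Note that this design still counts every row once before any summing happens, so whether it actually saves time over aggregating everything depends on the data volumes and is worth measuring.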