sort on required fields and reroute others


Moderators: chulett, rschirm, roy

reachmexyz
Premium Member
Posts: 296
Joined: Sun Nov 16, 2008 7:41 pm


Post by reachmexyz »

Hello All

I have data like this:

col1, col2, col3, Amt

1,2,3,123
1,3,4,234
1,2,3,12
2,4,5,234
5,6,7,123
1,2,3,13

In my job I group on col1, col2, col3 and sum Amt.

I feed this data to a Sort stage, and the output from the sort looks like this:

1,2,3,12
1,2,3,13
1,2,3,123
1,3,4,234
2,4,5,234
5,6,7,123

At this point I don't want to group all the records. I want to find only the records that share a key (col1, col2, col3) with at least one other record
(
1,2,3,12
1,2,3,13
1,2,3,123
)
and send them to an Aggregator stage for grouping.
I want to reroute the remaining records, where only one record exists for the key (col1, col2, col3)
(
1,3,4,234
2,4,5,234
5,6,7,123
) to a data set. Can I do this?
Out of millions of records, only thousands share a key (col1, col2, col3) with another record, so I want to group only those and reroute the rest.
I want to save time by grouping only the records that need it.
Is this possible?

Responses are appreciated.
Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: sort on required fields and reroute others

Post by ray.wurlod »

reachmexyz wrote:Is this possible?
Yes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bart12872
Participant
Posts: 82
Joined: Fri Jan 19, 2007 5:38 pm

Post by bart12872 »

Sort by col1, col2, col3.
Aggregate by col1, col2, col3 with a row count.
Filter: two output links, count > 1 and count = 1, to separate the data.
bart12872
Participant
Posts: 82
Joined: Fri Jan 19, 2007 5:38 pm

Post by bart12872 »

I forgot:
First, duplicate the data with a Copy stage, then do an inner join between the Filter output and the duplicated flow.





---> Copy ---> Sort ---> Aggregate ---> Filter ---|
      |                                           |
      |-----------------------------------------> Join --->

Something like that. Adapt as needed.
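
To make the logic of the flow above concrete, here is a minimal sketch in plain Python rather than DataStage. It is only an illustration of the idea, not a job design: the function name `split_and_aggregate` and the tuple layout (col1, col2, col3, Amt) are assumptions made for the example. The counting pass stands in for the Sort/Aggregate branch, and the lookup of each row's key count stands in for the Filter and Join.

```python
from collections import Counter, defaultdict


def split_and_aggregate(rows):
    """Split rows into summed multi-row keys and untouched single-row keys.

    rows: iterable of (col1, col2, col3, amt) tuples (hypothetical layout).
    """
    rows = list(rows)  # the "Copy": the data is read twice

    # Branch 1: count rows per key (Aggregate with a row count).
    counts = Counter(row[:3] for row in rows)

    # Branch 2: look up each row's key count (Filter + Join),
    # summing only keys that occur more than once.
    sums = defaultdict(int)
    singles = []
    for row in rows:
        key, amt = row[:3], row[3]
        if counts[key] > 1:
            sums[key] += amt
        else:
            singles.append(row)

    grouped = [key + (total,) for key, total in sorted(sums.items())]
    return grouped, singles


rows = [
    (1, 2, 3, 123),
    (1, 3, 4, 234),
    (1, 2, 3, 12),
    (2, 4, 5, 234),
    (5, 6, 7, 123),
    (1, 2, 3, 13),
]
grouped, singles = split_and_aggregate(rows)
print(grouped)  # [(1, 2, 3, 148)]
print(singles)  # [(1, 3, 4, 234), (2, 4, 5, 234), (5, 6, 7, 123)]
```

Note that the counting pass still touches every row, so whether this saves time over simply aggregating everything depends on the data and the job; it is worth measuring both.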