How to avoid duplicates in merge stage

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sivap
Participant
Posts: 14
Joined: Tue Oct 26, 2004 8:13 am

How to avoid duplicates in merge stage

Post by sivap »

Hi,
I am merging two files using the Merge stage, with Location and Position as the keys, and I am getting duplicate records in the output. How can I eliminate the duplicates? In the output I need to get S000151, Tom and S000152, Kris, but with my present logic I am getting S000152, Tom and S000152, Kris.
If anyone knows the logic, please help me.
Thanks in advance.
File 1:
Name, CRMID, Position, Location
Tom, VI, CSR, 923
Kris, SP, CSR, 923
Dana, DW, TSR, 882

File 2:
Position ID, Position, Location
S000151, CSR, 923
S000152, CSR, 923
S000153, TSR, 882

Output File:
Position ID, Name, CRMID
S000151, Tom, VI
S000152, Kris, SP
S000153, Dana, DW
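The pairing asked for here is not a plain key join: the two CSR/923 rows must match one-to-one in order (the first left row to the first right row, the second to the second). One way to express that, sketched in Python rather than DataStage, is to add a running rank within each duplicate key group and join on key plus rank. This is an illustrative sketch, not the poster's actual job design.

```python
from collections import defaultdict

file1 = [  # Name, CRMID, Position, Location
    ("Tom", "VI", "CSR", "923"),
    ("Kris", "SP", "CSR", "923"),
    ("Dana", "DW", "TSR", "882"),
]
file2 = [  # Position ID, Position, Location
    ("S000151", "CSR", "923"),
    ("S000152", "CSR", "923"),
    ("S000153", "TSR", "882"),
]

def rank_within_key(rows, key):
    """Append a running count per key so duplicate keys become unique."""
    seen = defaultdict(int)
    ranked = {}
    for row in rows:
        k = key(row)
        ranked[k + (seen[k],)] = row
        seen[k] += 1
    return ranked

left = rank_within_key(file1, lambda r: (r[2], r[3]))
right = rank_within_key(file2, lambda r: (r[1], r[2]))

# Inner join on (Position, Location, rank): each duplicate pairs exactly once.
output = [
    (right[k][0], v[0], v[1])  # Position ID, Name, CRMID
    for k, v in left.items() if k in right
]
print(output)
```

With the sample data this yields exactly the output file shown above: S000151/Tom, S000152/Kris, S000153/Dana. Note the pairing depends on row order in both files, so both inputs must be sorted consistently first.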
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That is a really messy requirement, whether in SQL or DataStage. If you can think of a way to do it in SQL, we would have less trouble finding a DataStage solution.

I'm confident there is no solution available using the Merge stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sun rays
Charter Member
Posts: 57
Joined: Wed Jun 08, 2005 3:35 pm
Location: Denver, CO

Re: How to avoid duplicates in merge stage

Post by sun rays »

I guess the first thing to be done before using a Merge stage is to remove duplicates. In your case Location and Position are the keys, and you have duplicate values for them, so you cannot use the Merge stage for this.
sivap
Participant
Posts: 14
Joined: Tue Oct 26, 2004 8:13 am

Re: How to avoid duplicates in merge stage

Post by sivap »

sun rays wrote: I guess the first thing to be done before using a Merge stage is to remove duplicates. In your case Location and Position are the keys, and you have duplicate values for them, so you cannot use the Merge stage for this.
I am using the Merge stage to join the two files with an inner join, not to remove duplicates. I am getting duplicates in the output of the Merge stage, and I want to avoid them.
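The duplicates are inherent to an inner join on non-unique keys: every left row matches every right row with the same key, so two rows on each side sharing ("CSR", "923") produce 2 × 2 = 4 output rows. A minimal Python sketch of that behaviour (hypothetical data names, not DataStage code):

```python
# Two left rows and two right rows share the join key ("CSR", "923").
left = [("Tom", "CSR", "923"), ("Kris", "CSR", "923")]
right = [("S000151", "CSR", "923"), ("S000152", "CSR", "923")]

# A plain inner join pairs every matching combination.
joined = [
    (pid, name)
    for (name, pos, loc) in left
    for (pid, p2, l2) in right
    if (pos, loc) == (p2, l2)
]
print(len(joined))  # 4 rows: both names pair with both Position IDs
```

This is why no setting on the join itself can produce the desired two rows; the keys must be made unique (or extended with a rank) before joining.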
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Re: How to avoid duplicates in merge stage

Post by PhilHibbs »

sivap wrote:Hi
I am merging two files using merge stage...
That's a bad start. Don't use the Merge stage. Load one of the files into a hashed file and use it as a lookup. Loading into the hashed file will remove duplicates; if you care which duplicate wins, sort the data so that the desired record comes last within each group of the same key and is therefore loaded last, overwriting the previous values.
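The hashed-file behaviour described above can be sketched with a Python dict standing in for the DataStage hashed file (illustrative only, not actual job code): writes are keyed, so a later row with the same key silently overwrites an earlier one, which is exactly what collapses the duplicates.

```python
# Rows keyed on (Position, Location); both rows share the same key.
rows = [  # Position ID, Position, Location
    ("S000151", "CSR", "923"),
    ("S000152", "CSR", "923"),
]

hashed = {}
for pid, pos, loc in rows:
    # Like a hashed-file write: the last record loaded for a key wins,
    # so sort the input to put the record you want to keep last.
    hashed[(pos, loc)] = pid

# The lookup now returns exactly one row per key.
print(hashed[("CSR", "923")])  # "S000152" - the last-loaded record
```

Note this keeps only one record per key, which resolves the duplicate output; it does not, by itself, give the one-to-one pairing in the original question.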
Phil Hibbs | Capgemini
Technical Consultant
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

One of the following two statements must be true.

1.) The data is incorrect, as you have duplicates in one input but expect unique values in the output.
2.) Your assumption of uniqueness is incorrect.

Find which one is true and design accordingly.
sivap
Participant
Posts: 14
Joined: Tue Oct 26, 2004 8:13 am

Post by sivap »

ray.wurlod wrote:That is a really messy requirement, whether in SQL or DataStage. If you can think of a way to do it in SQL, we would have less trouble finding a DataStage solution.

I'm confident there is no solution available using the Merge stage.
I got the solution using a stored procedure.
Thank you.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Care to share your solution, in case someone else has the same requirement?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply