How to avoid duplicates in merge stage

sivap · Post by **sivap** » Sun Sep 11, 2005 8:16 pm

Hi,
I am merging two files with the Merge stage, using Location and Position as keys, and I am getting duplicate records in the output. How can I eliminate the duplicates? The output should contain S000151, Tom and S000152, Kris,
but with my present logic I am getting S000152, Tom and S000152, Kris.
If anyone knows the logic, please help me.
Thanks in advance.
File 1:
Name, CRMID, Position, Location
Tom, VI, CSR, 923
Kris, SP, CSR, 923
Dana, DW, TSR, 882

File 2:
Position ID, Position, Location
S000151, CSR, 923
S000152, CSR, 923
S000153, TSR, 882

Output file:
Position ID, Name, CRMID
S000151, Tom, VI
S000152, Kris, SP
S000153, Dana, DW

ray.wurlod · Post by **ray.wurlod** » Sun Sep 11, 2005 9:25 pm

That is a really messy requirement, whether in SQL or DataStage. If you can think of a way to do it in SQL, we would have less trouble finding a DataStage solution.

I'm confident there is no solution available using the Merge stage.
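For illustration only, here is a minimal Python sketch of one way the requested output could be produced outside the Merge stage. It assumes that the intended rule is "pair the n-th row for a key in File 1 with the n-th row for the same key in File 2, in file order". That rule is inferred from the expected output in the first post and is not stated anywhere in the thread. The function names `number_within_key` and `pair_by_occurrence` are hypothetical.

```python
from collections import defaultdict
from itertools import count

# Sample rows copied from the original post.
file1 = [  # Name, CRMID, Position, Location
    ("Tom", "VI", "CSR", "923"),
    ("Kris", "SP", "CSR", "923"),
    ("Dana", "DW", "TSR", "882"),
]
file2 = [  # Position ID, Position, Location
    ("S000151", "CSR", "923"),
    ("S000152", "CSR", "923"),
    ("S000153", "TSR", "882"),
]


def number_within_key(rows, key_of):
    """Tag each row with (key, running sequence number within that key)."""
    counters = defaultdict(count)  # one independent 0, 1, 2, ... counter per key
    tagged = []
    for row in rows:
        key = key_of(row)
        tagged.append(((key, next(counters[key])), row))
    return tagged


def pair_by_occurrence(people, positions):
    """Join on (Position, Location) plus the occurrence number within the key."""
    numbered_positions = dict(
        number_within_key(positions, lambda r: (r[1], r[2]))
    )
    output = []
    for seq_key, (name, crmid, _position, _location) in number_within_key(
        people, lambda r: (r[2], r[3])
    ):
        match = numbered_positions.get(seq_key)
        if match is not None:  # inner-join semantics: drop unmatched rows
            output.append((match[0], name, crmid))
    return output


result = pair_by_occurrence(file1, file2)
for row in result:
    print(", ".join(row))
```

Because the occurrence number becomes part of the key, each (Position, Location, sequence) combination is unique on both sides, so the join is one-to-one. The pairing depends entirely on row order in the two files, which is part of what makes the requirement fragile.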

sun rays · Post by **sun rays** » Mon Sep 12, 2005 5:14 pm

I guess the first thing to do before using a Merge stage is to remove duplicates. In your case Location and Position are the keys, and you have duplicate values, so you cannot use the Merge stage for this.

sivap · Post by **sivap** » Mon Sep 12, 2005 9:15 pm

sun rays wrote: I guess the first thing to do before using a Merge stage is to remove duplicates. In your case Location and Position are the keys, and you have duplicate values, so you cannot use the Merge stage for this.

I am using the Merge stage to join two files with an inner join, not to remove duplicates. I am getting duplicates in the output of the Merge stage, and I want to avoid those duplicates.

PhilHibbs · Post by **PhilHibbs** » Tue Sep 13, 2005 3:01 am

sivap wrote:Hi
I am merging two files using merge stage...

That's a bad start. Don't use the Merge stage. Load one of them into a hash, and use it as a lookup. Loading into the hash will remove duplicates, but if you care which duplicate key you use, sort the data so that the desired record comes last in a group of the same key and hence is loaded last into the hash overwriting previous values.
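The "last record loaded wins" behaviour described above can be sketched in plain Python, with a dictionary standing in for the hashed file. This is only an analogy, not DataStage code, and the helper names `build_lookup` and `apply_lookup` are hypothetical. The sample rows are the ones from the first post.

```python
# Sample rows copied from the original post.
positions = [  # Position ID, Position, Location
    ("S000151", "CSR", "923"),
    ("S000152", "CSR", "923"),
    ("S000153", "TSR", "882"),
]
people = [  # Name, CRMID, Position, Location
    ("Tom", "VI", "CSR", "923"),
    ("Kris", "SP", "CSR", "923"),
    ("Dana", "DW", "TSR", "882"),
]


def build_lookup(rows):
    """Load rows keyed on (Position, Location); a later row overwrites an earlier one."""
    lookup = {}
    for position_id, position, location in rows:
        lookup[(position, location)] = position_id
    return lookup


def apply_lookup(rows, lookup):
    """Look up each person's key and keep only the rows that find a match."""
    return [
        (lookup[(position, location)], name, crmid)
        for name, crmid, position, location in rows
        if (position, location) in lookup
    ]


lookup = build_lookup(positions)
joined = apply_lookup(people, lookup)

# Sorting before loading controls which duplicate survives: here the rows are
# loaded in descending Position ID order, so the lowest ID is loaded last.
lookup_lowest_id = build_lookup(sorted(positions, reverse=True))
```

With the sample data, `lookup` holds S000152 for the key (CSR, 923), so both Tom and Kris receive S000152. A lookup keyed only on Position and Location therefore returns one Position ID per key and does not by itself produce the one-to-one pairing asked for in the first post.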

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Tue Sep 13, 2005 7:30 am

One of the following two statements must be valid.

1.) The data is incorrect - as you have duplicates in one and you expect unique values in the output
2.) Your assumption of uniqueness is incorrect.

Find out which one is true and design accordingly.
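A quick way to test the uniqueness assumption is to count how often each key occurs in each file. The following is a small Python sketch using the sample rows from the first post; the helper name `duplicate_keys` is hypothetical.

```python
from collections import Counter

# Sample rows copied from the original post.
file1 = [  # Name, CRMID, Position, Location
    ("Tom", "VI", "CSR", "923"),
    ("Kris", "SP", "CSR", "923"),
    ("Dana", "DW", "TSR", "882"),
]
file2 = [  # Position ID, Position, Location
    ("S000151", "CSR", "923"),
    ("S000152", "CSR", "923"),
    ("S000153", "TSR", "882"),
]


def duplicate_keys(rows, key_of):
    """Return every key that occurs more than once, with its row count."""
    counts = Counter(key_of(row) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


dups_file1 = duplicate_keys(file1, lambda r: (r[2], r[3]))
dups_file2 = duplicate_keys(file2, lambda r: (r[1], r[2]))
print(dups_file1)
print(dups_file2)
```

For the sample data the key (CSR, 923) occurs twice in both files, so Position and Location do not uniquely identify a row on either side.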

sivap · Post by **sivap** » Tue Sep 13, 2005 6:07 pm

ray.wurlod wrote:That is a really messy requirement, whether in SQL or DataStage. If you can think of a way to do it in SQL, we would have less trouble finding a DataStage solution.

I'm confident there is no solution available using the Merge stage.

I got the solution using a stored procedure.
Thank you.

ray.wurlod · Post by **ray.wurlod** » Tue Sep 13, 2005 7:59 pm

Care to share your solution, in case someone else has the same requirement?
