How to remove duplicates in Datastage server edition 8

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sujaoschin
Premium Member
Premium Member
Posts: 102
Joined: Tue Jan 31, 2006 4:13 am

How to remove duplicates in Datastage server edition 8

Post by sujaoschin »

How to remove the duplicate records in Datastage server edition jobs? Which stage I need to use? Is there any stage like 'remove duplicates' like in parallel extender?
Sujatha K
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Write them to a hashed file using the relevant keys. Last one in wins. The Hashed File stage implements destructive overwrite of keys.

Or use a Transformer stage with stage variables to detect change (or lack of) in sorted input.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
arshi
Participant
Posts: 50
Joined: Wed Apr 18, 2007 5:12 am

Post by arshi »

Hi,
In Serveredition 8 no special stage for remove duplicates.
To remove the duplicates on column by using the stage variables by,
SvCurr=SvPre
link1.col1=SvCurr
If SvPre=SvCurr then @FALST else @TRUE .

Put this result in one more stage variable.
and Place this stage variable in the constraint.

Before this you need to sort the data bases on the column by using Sort stage.

Regards,
Arshi
david_sol_llaven
Participant
Posts: 5
Joined: Thu May 22, 2008 11:50 am

Re: How to remove duplicates in Datastage server edition 8

Post by david_sol_llaven »

sujaoschin wrote:How to remove the duplicate records in Datastage server edition jobs? Which stage I need to use? Is there any stage like 'remove duplicates' like in parallel extender?
As previously written, write to a Hash File. Or you can use an Aggregator Stage.

HTH
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Yu can use RANK /ROW_NUMBER functions if you are extracting from Oracle database, refer Oracle SQL Guide- freely downloadable from OTN. Similar database functions may exist for others.
sujaoschin
Premium Member
Premium Member
Posts: 102
Joined: Tue Jan 31, 2006 4:13 am

Post by sujaoschin »

I have used the Hash file and removed the duplicates.
Thank you very much
Sujatha K
Post Reply