How to remove Duplicates from source File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
manab86
Participant
Posts: 1
Joined: Mon Dec 15, 2008 6:12 am

How to remove Duplicates from source File

Post by manab86 »

I want to detect whether there are any duplicates in the source file. If there are, the job should not load data into the table; instead it should write the duplicates to a flat file and mail that file to the relevant recipient.

Please advise me on both parallel and server job designs.

Can anyone please help with the answer? It's urgent.
vasa_dxx
Participant
Posts: 39
Joined: Sun Sep 28, 2008 2:59 am
Contact:

Post by vasa_dxx »

Please be clear about the requirement: should the job write only the duplicate records, or all records, when duplicates are encountered?

In PX, the job can be designed as below:

1. If you want to filter out only the duplicate records:
seq_file_stage ---->> agg (count of records per key) ---->> filter ---->> targets

2. If you want all the records to go to a flat file:
seq_file_stage ---->> agg (count of records per key) ---->> sort (desc) ---->> filter ---->> targets
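The aggregate-by-key-then-filter logic of the designs above can be sketched outside DataStage. Here is a minimal Python sketch (the record layout and the `id` key field are hypothetical, just for illustration):

```python
from collections import Counter

def split_duplicates(records, key):
    """Mimic agg(count of records per key) -> filter:
    return (unique_rows, duplicate_rows) by key occurrence count."""
    counts = Counter(r[key] for r in records)
    uniques = [r for r in records if counts[r[key]] == 1]
    dupes = [r for r in records if counts[r[key]] > 1]
    return uniques, dupes

# illustrative sample data
rows = [
    {"id": 1, "val": "a"},
    {"id": 2, "val": "b"},
    {"id": 1, "val": "c"},  # duplicate key
]
uniques, dupes = split_duplicates(rows, "id")
```

Here `dupes` would be what the job writes to the flat file for mailing, and loading proceeds only when `dupes` is empty.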
Two wrongs don't make a right. But three lefts do.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

We don't do "urgent" here. DSXchange is an all-volunteer site whose members post as and when they can, nor are they obliged to do so. If you want urgent help sign up with your official support provider for premium service and learn just how expensive "urgent" can be.

You don't need DataStage to DETECT duplicates in a file. Perform a unique sort on the file and compare the result with the original - if they are the same size then there were no duplicates.
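The check Ray describes (unique-sort the file and compare sizes with the original) can be reproduced with a short Python sketch; the file name and contents below are throwaway examples:

```python
import os
import tempfile

def has_duplicates(path):
    """Equivalent of a unique sort compared with the original:
    the file has duplicates iff the number of distinct lines
    differs from the total number of lines."""
    with open(path) as f:
        lines = f.read().splitlines()
    return len(lines) != len(set(lines))

# quick demonstration with a throwaway file
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as tmp:
    tmp.write("1,a\n2,b\n1,a\n")   # "1,a" appears twice
    name = tmp.name

result = has_duplicates(name)
os.unlink(name)
```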
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply