How to remove Duplicates from source File

Posted: Tue Dec 16, 2008 7:17 am
by manab86
I want to detect whether there are any duplicates in the source file. If there are, the job should not load the data into the table; instead it should write the duplicates to a flat file and mail that file to a particular recipient.

Please advise for both parallel and server jobs.

Can anyone please help with an answer? It's urgent.

Posted: Tue Dec 16, 2008 8:27 am
by vasa_dxx
Please be clear about the requirements: when duplicates are encountered, should the job write only the duplicates, or all records?
In PX, the job can be designed as shown below.
1. If you want to filter out only the duplicate records:
seq_file_stage ---->>agg(no. of rec with a key)---->>filter ---->> targets

2. If you want all the records written to a flat file:
seq_file_stage ---->>agg(no. of rec with a key)---->>sort(desc)--->>filter--->>targets
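
The count-per-key-then-filter idea in both designs can be illustrated outside DataStage. The following is only a rough Python sketch of that logic, not a DataStage job: the function name split_by_key_count, the sample rows, and the choice of the first field as the key are all assumptions made for the illustration.

```python
from collections import Counter


def split_by_key_count(records, key):
    """Split records into (unique, duplicated) by how often each key occurs.

    Hypothetical helper: mirrors an aggregator that counts records per key,
    followed by a filter on that count.
    """
    records = list(records)
    # "Aggregator" step: number of records for each key value.
    counts = Counter(key(r) for r in records)
    # "Filter" step: count == 1 means the key is unique, count > 1 means duplicated.
    unique = [r for r in records if counts[key(r)] == 1]
    duplicated = [r for r in records if counts[key(r)] > 1]
    return unique, duplicated


# Made-up sample data; the first field is assumed to be the key.
rows = [("101", "alice"), ("102", "bob"), ("101", "alice2"), ("103", "carol")]
unique_rows, duplicate_rows = split_by_key_count(rows, key=lambda r: r[0])

# Load the target only when no duplicates were found; otherwise the
# duplicate rows would go to the reject file instead.
load_allowed = not duplicate_rows
```

Here every record sharing a duplicated key is treated as a duplicate; whether the first occurrence should instead be kept depends on the requirement the question above asks about.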

Posted: Tue Dec 16, 2008 12:21 pm
by ray.wurlod
We don't do "urgent" here. DSXchange is an all-volunteer site whose members post as and when they can, and they are not obliged to do so. If you want urgent help, sign up with your official support provider for premium service and learn just how expensive "urgent" can be.

You don't need DataStage to DETECT duplicates in a file. Perform a unique sort on the file and compare the result with the original - if they are the same size then there were no duplicates.
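
As a rough illustration of that unique-sort comparison, here is a minimal Python sketch. It assumes a duplicate means an identical whole line (not a repeated key column) and that the file fits in memory; the function names are hypothetical.

```python
def has_duplicates(lines):
    """Return True if any line occurs more than once.

    Compares the number of lines in a unique sort of the input with the
    number of lines in the original: equal counts mean no duplicates.
    """
    lines = list(lines)
    unique_sorted = sorted(set(lines))
    return len(unique_sorted) != len(lines)


def file_has_duplicates(path):
    """Apply the same check to a text file, ignoring trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return has_duplicates(line.rstrip("\n") for line in handle)
```

A job sequence could run a check like this first and start the load only when it reports no duplicates.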