How to remove Duplicates from source File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
manab86
Participant
Posts: 1
Joined: Mon Dec 15, 2008 6:12 am

How to remove Duplicates from source File

Post by manab86 »

I want to detect whether there are any duplicates in the source file. If there are, the job should not load data into the table; instead it should write the duplicates to a flat file and mail that file to the relevant recipient.

Please advise me on both parallel and server job designs.

Can anyone please help with the answer? It's urgent.
vasa_dxx
Participant
Posts: 39
Joined: Sun Sep 28, 2008 2:59 am
Contact:

Post by vasa_dxx »

Please be clear about the requirement: should the job write only the duplicate records, or all records, when duplicates are encountered?

In PX, the job can be designed as below:

1. If you want to filter out only the duplicate records:
seq_file_stage ---->> agg (count of records per key) ---->> filter ---->> targets

2. If you want all the records to go to a flat file:
seq_file_stage ---->> agg (count of records per key) ---->> sort (desc) ---->> filter ---->> targets
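The aggregate-by-key-then-filter logic of the designs above can be sketched outside DataStage. Here is a minimal Python sketch (the record layout and the `id` key field are hypothetical, just for illustration):

```python
from collections import Counter

def split_duplicates(records, key):
    """Mimic agg(count of records per key) -> filter:
    return (unique_rows, duplicate_rows) by key occurrence count."""
    counts = Counter(r[key] for r in records)
    uniques = [r for r in records if counts[r[key]] == 1]
    dupes = [r for r in records if counts[r[key]] > 1]
    return uniques, dupes

# illustrative sample data
rows = [
    {"id": 1, "val": "a"},
    {"id": 2, "val": "b"},
    {"id": 1, "val": "c"},  # duplicate key
]
uniques, dupes = split_duplicates(rows, "id")
```

Here `dupes` would be what the job writes to the flat file for mailing, and loading proceeds only when `dupes` is empty.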
Two wrongs don't make a right. But three lefts do.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

We don't do "urgent" here. DSXchange is an all-volunteer site whose members post as and when they can, nor are they obliged to do so. If you want urgent help sign up with your official support provider for premium service and learn just how expensive "urgent" can be.

You don't need DataStage to DETECT duplicates in a file. Perform a unique sort on the file and compare the result with the original - if they are the same size then there were no duplicates.
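The check Ray describes (unique-sort the file and compare sizes with the original) can be reproduced with a short Python sketch; the file name and contents below are throwaway examples:

```python
import os
import tempfile

def has_duplicates(path):
    """Equivalent of a unique sort compared with the original:
    the file has duplicates iff the number of distinct lines
    differs from the total number of lines."""
    with open(path) as f:
        lines = f.read().splitlines()
    return len(lines) != len(set(lines))

# quick demonstration with a throwaway file
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as tmp:
    tmp.write("1,a\n2,b\n1,a\n")   # "1,a" appears twice
    name = tmp.name

result = has_duplicates(name)
os.unlink(name)
```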
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply