
To reject all my duplicates

Posted: Mon Jan 22, 2007 7:10 am
by suresh.narasimha
Hi everybody,

I have a Sequential File ===> XFM1 ===> XFM2 ...

I have a reject file attached to XFM1, and I'm picking up the first occurrence of my duplicates; the rest are rejected.

Now my requirement is that I want to pick the first occurrence and reject all of my duplicates.

How can I do that?

Suppose I have data like this:
Col1 Col2
10 200
10 300
10 400

Now my output should have:

Col1 Col2
10 200

and my reject file should have:

Col1 Col2
10 200
10 300
10 400

Thanks in advance,
Suresh N

To reject all my duplicates

Posted: Mon Jan 22, 2007 7:22 am
by suresh.narasimha
Sorry, a small correction.

I have a Sequential File ===> AGG1 ===> XFM2 ...

I have a reject file attached to AGG1, not XFM1, and I'm picking up the first occurrence of my duplicates; the rest are rejected. The requirement and the sample data are the same as in my first post: pick the first occurrence and reject all of my duplicates.

How can I do that?


Thanks,
Suresh N

Posted: Mon Jan 22, 2007 7:32 am
by DSguru2B
The first part is easy: pass it through the Aggregator, grouping on Col1, and provide 'First' as the derivation for Col2.
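For illustration, here is a minimal sketch of that first-per-group logic in plain Python rather than DataStage; the rows and column names are assumed from the sample data above:

Code: Select all

# Keep only the first row seen for each key (Col1) -- the same effect as
# an Aggregator grouping on Col1 with 'First' as the derivation for Col2.
rows = [(10, 200), (10, 300), (10, 400)]  # (Col1, Col2) from the sample

seen = set()
first_per_key = []
for col1, col2 in rows:
    if col1 not in seen:          # first occurrence of this key
        seen.add(col1)
        first_per_key.append((col1, col2))

print(first_per_key)  # [(10, 200)]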
As for your second requirement, I have a follow-up question:
Do you want 10,200 (from your sample data) to be in your reject file as well?

To reject all my duplicates

Posted: Mon Jan 22, 2007 10:41 pm
by suresh.narasimha
Yes Guru, you are correct. I need 10,200 from the sample data in the reject file as well.

Regards,
Suresh N

Posted: Mon Jan 22, 2007 11:17 pm
by narasimha
Would you call it a reject file if you want 10,200 in it as well?
In that case your source and reject file will have the same data all the time.
Did I miss something here?

To reject all my duplicates

Posted: Tue Jan 23, 2007 1:42 am
by suresh.narasimha
Hi Narasimha,

You are correct, in fact, but this is the requirement we need to meet.

Please give me some idea of how to start.

Regards,
Suresh N

Posted: Tue Jan 23, 2007 2:15 am
by elavenil
Suresh,

If that is the requirement then, as Narasimha highlighted, there is no difference between the source and the reject file. The first row can be sent to the output from the Aggregator, and the source file itself can serve as the reject records.

Regards
Elavenil

Posted: Tue Jan 23, 2007 2:23 am
by ray.wurlod
I would use a Transformer stage to identify and remove duplicates from one output, and direct all input rows to another output (the "rejects"). This approach requires sorted input.
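A minimal sketch of that two-output idea (again in plain Python rather than a Transformer stage, with the sample rows assumed): one output keeps the first row per key from the sorted input, while every input row is copied to the rejects.

Code: Select all

# Sorted input: the first row per key goes to the main output,
# and every row is also directed to the "rejects" output.
rows = [(10, 200), (10, 300), (10, 400)]  # sorted on Col1

main_output, rejects = [], []
prev_key = None
for col1, col2 in rows:
    if col1 != prev_key:              # key change => first occurrence
        main_output.append((col1, col2))
    rejects.append((col1, col2))      # all input rows land here
    prev_key = col1

print(main_output)  # [(10, 200)]
print(rejects)      # [(10, 200), (10, 300), (10, 400)]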

Posted: Tue Jan 23, 2007 3:00 pm
by DSguru2B
Do this.
Sort the incoming data on your key. Define two stage variables in the Transformer, say condFlag and prevVal; these will detect duplicates and flag them. Both will be initialized to 0. Their derivations will be as follows:

Code: Select all

condFlag  | if (prevVal <> src.key) then 'X' else 'Y'
prevVal   | src.key
Stage variables are evaluated top to bottom, so condFlag compares the current key against the previous row's key before prevVal is updated. Have two links coming out of the Transformer, say Trg and buildHash. Trg will go to your flat file or database; buildHash will go to a hashed file keyed on your first column (the key).
Constraint for Trg: condFlag = 'X'
Constraint for buildHash: condFlag = 'Y'


In the same job, or maybe a second job, feed in the same source file and do a lookup against this hashed file, keyed on your first column (the key). Provide the constraint NOT(reflink.NOTFOUND), where reflink is your reference link name. The output of this second pass will be your reject file, holding all the records whose keys are duplicated. A sketch of the whole flow is below.
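To tie it together, here is a minimal end-to-end sketch of the two-pass logic in plain Python; the hashed file is stood in for by a set of duplicated keys, and the rows are assumed from the sample:

Code: Select all

# Pass 1: sorted input. The first occurrence per key goes to Trg;
# later occurrences register the key as duplicated (the "hashed file").
rows = [(10, 200), (10, 300), (10, 400)]  # sorted on Col1

trg, dup_keys = [], set()
prev_val = None                                   # stage variable prevVal
for col1, col2 in rows:
    cond_flag = 'X' if col1 != prev_val else 'Y'  # stage variable condFlag
    if cond_flag == 'X':
        trg.append((col1, col2))                  # constraint: condFlag = 'X'
    else:
        dup_keys.add(col1)                        # condFlag = 'Y' -> buildHash
    prev_val = col1

# Pass 2: re-read the source and keep every row whose key is found in
# the "hashed file" -- the NOT(reflink.NOTFOUND) constraint.
rejects = [(c1, c2) for c1, c2 in rows if c1 in dup_keys]

print(trg)      # [(10, 200)]
print(rejects)  # [(10, 200), (10, 300), (10, 400)]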