To reject all my duplicates

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

suresh.narasimha
Premium Member
Posts: 81
Joined: Mon Nov 21, 2005 4:17 am
Location: Sydney, Australia

To reject all my duplicates

Post by suresh.narasimha »

Hi Everybody,

I have a Sequential File ===> XFM1 ====>XFM2 .....

I have a reject file on XFM1 and I'm picking up the first occurrence of each duplicate; the rest are rejected.

Now my requirement is to pick the first occurrence and reject all of the duplicates.

How can I do that?

Suppose I have data like this:
Col1 Col2
10 200
10 300
10 400

Now my output should have

Col1 Col2
10 200

and my reject file should have

Col1 Col2
10 200
10 300
10 400

Thanks In Advance,
Suresh N
SURESH NARASIMHA
suresh.narasimha
Premium Member
Posts: 81
Joined: Mon Nov 21, 2005 4:17 am
Location: Sydney, Australia

To reject all my duplicates

Post by suresh.narasimha »

Sorry, a small correction.

I have a Sequential File ===>AGG1 ====>XFM2 .....

I have a reject file on AGG1 and I'm picking up the first occurrence of each duplicate; the rest are rejected.

Now my requirement is to pick the first occurrence and reject all of the duplicates.

How can I do that?

Suppose I have data like this:
Col1 Col2
10 200
10 300
10 400

Now my output should have

Col1 Col2
10 200

and my reject file should have

Col1 Col2
10 200
10 300
10 400

Thanks ,
Suresh N
SURESH NARASIMHA
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The first part is easy: pass it through the Aggregator, grouping on Col1, and provide 'First' as the derivation for Col2.
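To illustrate what that grouping does, here is a minimal sketch in plain Python (not DataStage code; the extra key 20 row is made-up sample data):

Code:

# Emulates "group on Col1, take First(Col2)", the way the Aggregator would.
rows = [(10, 200), (10, 300), (10, 400), (20, 500)]

first_per_key = {}
for col1, col2 in rows:
    # setdefault keeps only the first Col2 seen for each Col1.
    first_per_key.setdefault(col1, col2)

print(sorted(first_per_key.items()))   # [(10, 200), (20, 500)]
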
As for your second requirement, I have a follow-up question:
Do you want 10,200 (in your sample data) to be in your reject file as well?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
suresh.narasimha
Premium Member
Posts: 81
Joined: Mon Nov 21, 2005 4:17 am
Location: Sydney, Australia

To reject all my duplicates

Post by suresh.narasimha »

Yes Guru, you are correct. I need 10,200 from the sample data in the reject file.

Regards,
Suresh N
SURESH NARASIMHA
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Would you call it a reject file if you want 10,200 in it as well?
In that case your source and your reject file will have the same data every time.
Did I miss something here?
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
suresh.narasimha
Premium Member
Posts: 81
Joined: Mon Nov 21, 2005 4:17 am
Location: Sydney, Australia

To reject all my duplicates

Post by suresh.narasimha »

Hi Narasimha,

You are correct, in fact. But this is the requirement we need to meet.

Please give me an idea to get started.

Regards,
Suresh N
SURESH NARASIMHA
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Suresh,

If that is the requirement then, as Narasimha highlighted, there is no difference between the source and the reject file. The first row can be sent to the output from the Aggregator, and the source file itself can be presented as the reject records.

Regards
Elavenil
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I would use a Transformer stage to identify and remove duplicates from one output, and direct all input rows to another output (the "rejects"). This approach requires sorted input.
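As a rough illustration of that idea in plain Python (not a real Transformer; it assumes the input is already sorted on Col1):

Code:

# Sorted input: keep the first row per key on one output, send every row to the other.
rows = [(10, 200), (10, 300), (10, 400), (20, 500)]   # already sorted on Col1

deduped, rejects = [], []
prev_key = None
for col1, col2 in rows:
    if col1 != prev_key:               # first occurrence of this key
        deduped.append((col1, col2))
    rejects.append((col1, col2))       # every input row goes to the "rejects" output
    prev_key = col1

print(deduped)   # [(10, 200), (20, 500)]
print(rejects)   # all four input rows
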
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Do this.
Sort the incoming data on your key. Define two stage variables in the Transformer, say condFlag and prevVal. These will basically detect duplicates and flag them. Both will be initialized to 0. Their derivations will be as follows:

Code:

condFlag  | if (prevVal <> src.key) then 'X' else 'Y'
prevVal   | src.key
Have two links coming out of the Transformer, say Trg and buildHash. Trg will go to your flat file or database; buildHash will go to a hashed file keyed on your first column (the key).
Constraint for Trg: condFlag = 'X'
Constraint for buildHash: condFlag = 'Y'


In the same job, or maybe a second job, feed the same source file through again and do a lookup against this hashed file, keyed on your first column. Provide the constraint NOT(reflink.NOTFOUND), where reflink is your reference link name. The output of this second pass will give you your reject file, which will have all the records whose key has duplicates.
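A rough Python stand-in for this two-pass design (not DataStage code; the set dup_keys plays the role of the hashed file, and the key 20 row is made-up sample data):

Code:

# Pass 1: mirrors the Transformer with condFlag/prevVal plus the buildHash link.
rows = [(10, 200), (10, 300), (10, 400), (20, 500)]   # sorted on the key column

target, dup_keys = [], set()
prev_key = None
for col1, col2 in rows:
    if col1 != prev_key:            # condFlag = 'X': first occurrence -> Trg
        target.append((col1, col2))
    else:                           # condFlag = 'Y': duplicate -> buildHash
        dup_keys.add(col1)          # the hashed file only needs the key
    prev_key = col1

# Pass 2: mirrors the lookup job; NOT(reflink.NOTFOUND) == "key exists in the hashed file".
rejects = [(c1, c2) for c1, c2 in rows if c1 in dup_keys]

print(target)    # [(10, 200), (20, 500)]
print(rejects)   # [(10, 200), (10, 300), (10, 400)]

Note that the first occurrence 10,200 lands in both the target and the rejects, while the key with no duplicates never reaches the reject file, which is what was asked for.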
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.