To reject all my duplicates
-
- Premium Member
- Posts: 81
- Joined: Mon Nov 21, 2005 4:17 am
- Location: Sydney, Australia
- Contact:
To reject all my duplicates
Hi everybody,
I have a job: Sequential File ===> XFM1 ===> XFM2 ...
I have a reject file on XFM1, and I'm picking up the first occurrence of my duplicates; the rest are rejected.
Now my requirement is to pick the first occurrence and reject all of my duplicates.
How can I do that?
Suppose my data looks like this:
Col1 Col2
10 200
10 300
10 400
Now my output should have
Col1 Col2
10 200
and my reject file should have
Col1 Col2
10 200
10 300
10 400
Thanks In Advance,
Suresh N
SURESH NARASIMHA
To reject all my duplicates
Sorry, a small correction.
I have a job: Sequential File ===> AGG1 ===> XFM2 ...
I have a reject file on AGG1, and I'm picking up the first occurrence of my duplicates; the rest are rejected.
Now my requirement is to pick the first occurrence and reject all of my duplicates.
How can I do that?
Suppose my data looks like this:
Col1 Col2
10 200
10 300
10 400
Now my output should have
Col1 Col2
10 200
and my reject file should have
Col1 Col2
10 200
10 300
10 400
Thanks ,
Suresh N
SURESH NARASIMHA
The first part is easy: pass it through an Aggregator, grouping on Col1, and provide 'First' as the derivation for Col2.
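As an illustration of that first part (a sketch only, not DataStage code; the row tuples stand in for the Col1/Col2 columns), grouping sorted rows on the key and keeping the first row per group behaves like an Aggregator with a 'First' derivation:

```python
from itertools import groupby

# Sample rows as (Col1, Col2); input must already be sorted on Col1,
# just as the Aggregator approach assumes grouped input
rows = [(10, 200), (10, 300), (10, 400)]

# Take the first row of each Col1 group, mimicking grouping on Col1
# with 'First' as the derivation for Col2
output = [next(grp) for _, grp in groupby(rows, key=lambda r: r[0])]
# output == [(10, 200)]
```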
As for your second requirement, I have a follow-up question:
Do you want 10,200 (from your sample data) to be in your reject file as well?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
To reject all my duplicates
Yes Guru, you are correct. I need 10,200 (from the sample data) in the reject file.
Regards,
Suresh N
SURESH NARASIMHA
To reject all my duplicates
Hi Narasimha,
You are correct, in fact. But this is the requirement we need to implement.
Please give me an idea to get started.
Regards,
Suresh N
SURESH NARASIMHA
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I would use a Transformer stage to identify and remove duplicates from one output, and direct all input rows to another output (the "rejects"). This approach requires sorted input.
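A hedged sketch of that Transformer idea (Python for illustration only; the function and parameter names are made up): one output keeps the first row per key, while the other receives every input row, mirroring two output links from the same stage. It assumes the input is sorted on the key, as stated above.

```python
def dedup_with_rejects(sorted_rows, key=lambda r: r[0]):
    """Route the first row of each key to 'output'; copy every
    input row to 'rejects'. Requires input sorted on the key."""
    output, rejects = [], []
    prev = object()  # sentinel: no previous key seen yet
    for row in sorted_rows:
        if key(row) != prev:
            output.append(row)   # first occurrence of this key
            prev = key(row)
        rejects.append(row)      # every row goes to the reject link
    return output, rejects
```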
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Do this.
Sort the incoming data on your key. Define two stage variables in the Transformer, say condFlag and prevVal; these will detect duplicate keys and flag them. Initialize both to 0. Their derivations will be as follows:
Code: Select all
condFlag | if (prevVal <> src.key) then 'X' else 'Y'
prevVal  | src.key
Have two links coming out of the Transformer, say Trg and buildHash. Trg goes to your flat file or database; buildHash goes to a hashed file keyed on your first column (the key).
Constraint for Trg: condFlag = 'X'
Constraint for buildHash: condFlag = 'Y'
In the same job, or in a second job, feed the same source file through a lookup against this hashed file, keyed on your first column. Use the constraint NOT(reflink.NOTFOUND), where reflink is your reference link name. The output of this second pass is your reject file: every record whose key has duplicates.
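The two-pass logic above can be sketched as follows (Python purely for illustration; the function name and row layout are invented for this example). A key count stands in for the hashed file of duplicated keys, and the `counts[k] > 1` check plays the role of the NOT(reflink.NOTFOUND) lookup constraint:

```python
from collections import Counter

def split_first_and_rejects(rows, keyfn=lambda r: r[0]):
    """Pass 1: count keys (stands in for the hashed file built
    from duplicate rows). Pass 2: keep the first occurrence per
    key in 'output'; send every row whose key occurs more than
    once, including its first occurrence, to 'rejects'."""
    counts = Counter(keyfn(r) for r in rows)   # "hashed file"
    seen, output, rejects = set(), [], []
    for r in rows:
        k = keyfn(r)
        if k not in seen:          # first occurrence -> Trg link
            seen.add(k)
            output.append(r)
        if counts[k] > 1:          # key found in hash -> reject
            rejects.append(r)
    return output, rejects
```

With the sample data plus one unique key, the output keeps one row per key while the reject file receives all three rows of the duplicated key 10, matching the requirement that 10,200 appears in both files.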
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.