Hi, I have a CSV file and need to check whether it contains any duplicate records. Any duplicate records that are present should be rejected and written to a sequential file.
Can anybody tell me how I can do this in DataStage 8.1?
Thanks, Raj.
How to find duplicates in a sequential file
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 19
- Joined: Fri May 14, 2010 3:54 pm
Rajesekhar Potteti
You will have to do the following:
1) Sort your input on the key fields.
2) Use a transformer with stage variables to compare each record with the value of the previous record (on whatever key fields you need to check).
3) Set up constraints on the transformer's output links so that records whose key fields all match the previous record go to the reject link, and only to the reject link.
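Outside DataStage, the logic of the three steps above can be sketched in a few lines of Python. This is only an illustration of the sort-then-compare-with-previous idea, not DataStage code: the function name `split_duplicates`, the sample data, and the `id` and `name` key fields are all made up for the example, and `prev_key` plays the role of the stage variable.

```python
import csv
import io

def split_duplicates(rows, key_fields):
    """Sort rows on the key fields, then compare each row's key with the
    previous row's key. The first row of each key group goes to `unique`,
    every later row with the same key goes to `duplicates`."""
    ordered = sorted(rows, key=lambda r: tuple(r[k] for k in key_fields))
    unique, duplicates = [], []
    prev_key = None  # stands in for the transformer's stage variable
    for row in ordered:
        key = tuple(row[k] for k in key_fields)
        if key == prev_key:
            duplicates.append(row)  # would go to the reject link
        else:
            unique.append(row)      # would go to the main output link
        prev_key = key
    return unique, duplicates

# Hypothetical sample data standing in for the CSV file.
SAMPLE = "id,name\n2,bob\n1,ann\n2,bob\n3,cy\n1,ann\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
unique, duplicates = split_duplicates(rows, ["id", "name"])
```

Note that the first record of each key group is kept as the good record, and only the repeats are rejected.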
-
- Participant
- Posts: 19
- Joined: Fri May 14, 2010 3:54 pm
You would need to use StageVariables to do what Andy suggested.
Another way is to use a Sort stage with the Create Key Change Column property set to True and the Allow Duplicates property set to True. After the Sort stage, add a Filter stage and filter on the key change column to route the duplicates (key change column = 0) and the non-duplicates (key change column = 1) to the links you want.
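As a rough illustration of what the key change column does, here is a small Python sketch (not DataStage code). The function `add_key_change`, the column name `keyChange`, and the sample rows are assumptions made for the example: the first row of each sorted key group is marked 1, the rest 0, and two simple filters then split the rows.

```python
from itertools import groupby

def add_key_change(rows, key_fields):
    """Sort rows on the key fields and add a keyChange column:
    1 for the first row of each key group, 0 for the rows that repeat it."""
    def key(row):
        return tuple(row[k] for k in key_fields)

    marked = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        for i, row in enumerate(group):
            marked.append({**row, "keyChange": 1 if i == 0 else 0})
    return marked

# Hypothetical sample rows.
rows = [{"id": "2"}, {"id": "1"}, {"id": "2"}, {"id": "3"}]
marked = add_key_change(rows, ["id"])

# The "filter" step: one condition per output link.
unique = [r for r in marked if r["keyChange"] == 1]
duplicates = [r for r in marked if r["keyChange"] == 0]
```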
Hope that helps.
Kris
Where's the "Any" key?-Homer Simpson
-
- Participant
- Posts: 63
- Joined: Mon Oct 20, 2008 12:01 am
- Location: Malaysia
-
- Participant
- Posts: 19
- Joined: Fri May 14, 2010 3:54 pm
Hi All,
Thanks for the suggestions you gave to resolve this.
I tried three approaches to check that they all give the same results, and they did.
1. Using Sort and Filter stages.
In the Sort stage I set Create Key Change Column to True.
In the Filter stage I used the KeyChange attribute to route unique and duplicate values to two different sequential files.
2. Using Sort and Transformer stages.
In the Sort stage I set Create Key Change Column to True.
In the Transformer stage I set constraints so that records with KeyChange = 0 go to one output link and records with KeyChange = 1 go to another.
3. Using Sort and Transformer stages.
In the Transformer stage I used stage variables to compare the current and previous records, and routed the records to two output links accordingly.
All three approaches worked very well.
Thanks all for your help!!
Rajesekhar P
Rajesekhar Potteti