How can I separate the duplicate rows from the rest and write them to a different file? This has to be done using only a Transformer stage; don't use partitioning, a Sort stage, or a Remove Duplicates stage.
Thanks in advance.
how to separate the duplicate and rest in other file
Moderators: chulett, rschirm, roy
DATASTAGE DEVELOPER
The correct answer is:
1) Sort the data sent to the Transformer on its input link (no separate Sort stage required) by all the required keys. Hash partition on at least the major key to ensure that any duplicates end up in the same partition. If you aren't allowed to partition (STUPID REQUIREMENT), set the stage to run in sequential mode instead of parallel mode.
2) Set up an integer stage variable called "svIsDuplicate" and initialize it to 0 (False).
3) Set up one stage variable per key to hold the previous row's key values, each initialized to "".
4) Stage variables are evaluated in order from top to bottom, so svIsDuplicate must sit above the saved-key variables. First determine whether the incoming row's keys all match the previous row's keys currently stored in the stage variables. If they all match, the row is a duplicate, so set svIsDuplicate to 1; otherwise set it to 0.
5) Then update the saved-key variables to hold the keys of the row you just read.
6) Create two output links with identical columns and add a constraint to each: one link receives the row when svIsDuplicate is 0 (not a duplicate) and the other receives it when svIsDuplicate is 1 (duplicate).
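The steps above can be sketched outside DataStage to check the logic. This is a minimal Python simulation, not Transformer syntax: the function name `split_duplicates`, the dictionary row format, and the `key_fields` parameter are all assumptions made for illustration. It also uses `None` rather than "" as the initial saved key, so a first row with empty keys is not mistaken for a duplicate.

```python
def split_duplicates(rows, key_fields):
    """Simulate the stage-variable logic: the first row of each key group
    goes to one output, every later row with the same keys to the other.

    rows       -- list of dicts (hypothetical stand-in for input link rows)
    key_fields -- names of the key columns (hypothetical parameter)
    """
    def key_of(row):
        return tuple(row[field] for field in key_fields)

    first_rows = []      # output link with constraint svIsDuplicate = 0
    duplicate_rows = []  # output link with constraint svIsDuplicate = 1

    sv_prev_keys = None  # saved keys of the previous row (step 3)

    # Step 1: sort on all keys so equal keys arrive on adjacent rows.
    for row in sorted(rows, key=key_of):
        # Step 4: compare with the saved keys BEFORE overwriting them.
        sv_is_duplicate = 1 if key_of(row) == sv_prev_keys else 0
        # Step 5: remember this row's keys for the next comparison.
        sv_prev_keys = key_of(row)
        # Step 6: route the row according to the flag.
        if sv_is_duplicate == 0:
            first_rows.append(row)
        else:
            duplicate_rows.append(row)

    return first_rows, duplicate_rows


if __name__ == "__main__":
    sample = [
        {"id": 2, "val": "a"},
        {"id": 1, "val": "b"},
        {"id": 2, "val": "c"},
        {"id": 3, "val": "e"},
    ]
    firsts, dups = split_duplicates(sample, ["id"])
    print("first occurrences:", firsts)
    print("duplicates:", dups)
```

Note that the comparison happens before the saved keys are overwritten, which mirrors the top-to-bottom evaluation order of the stage variables. Swapping those two statements would flag every row as a duplicate of itself.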