
how to separate the duplicates and the rest into different files

Posted: Wed Feb 02, 2011 12:27 am
by hemanth12
How can I separate the duplicate rows and the rest of the rows into different files? This must be done using only a Transformer stage: no partitioning, and no Sort or Remove Duplicates stage.

Thanks in advance.

Posted: Wed Feb 02, 2011 12:45 am
by veerabusani185512
If your source is a Sequential File stage, then in the Properties tab, under the Filter option, try using sort -u #FileNamePath#. This will pass only unique records out of the Sequential File stage.
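As a rough illustration of what that filter does, here is a Python sketch that approximates sort -u in the C locale: it sorts whole lines and keeps one copy of each. Note that this only drops the repeated copies; it does not write them to a second file, so on its own it covers only half of the original question. The sample records are made up.

```python
def sort_unique(lines):
    """Approximate `sort -u` (C locale): sort whole lines, keep one copy of each."""
    return sorted(set(lines))


# Hypothetical sample records, one per line of the source file.
records = ["101,Smith", "102,Jones", "101,Smith", "103,Lee"]
print(sort_unique(records))  # ['101,Smith', '102,Jones', '103,Lee']
```

The duplicate "101,Smith" simply disappears; nothing records that it was ever there.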

Posted: Wed Feb 02, 2011 3:30 am
by ray.wurlod
I don't answer interview questions. The correct answer DOES include the stage types you insist on excluding.

Posted: Wed Feb 02, 2011 9:27 am
by chulett
Argh... why in the heck was this set up as a poll? Please don't do that as that's not something I can undo. :?

Posted: Wed Feb 02, 2011 9:51 am
by asorrell
I, however, can! Poll deleted...

Posted: Wed Feb 02, 2011 10:04 am
by asorrell
The correct answer is:

1) Sort the data being sent to the Transformer on its input link (no separate Sort stage is required) by all the required keys. Hash partition on at least the major key to ensure that any duplicates end up in the same partition. If you aren't allowed to partition (STUPID REQUIREMENT), then set the stage to run in sequential mode instead of parallel mode.
2) Set up an integer stage variable called "svIsDuplicate" and initialize it to 0 (False).
3) Set up stage variables to hold each of your keys, initialized to "".
4) Stage variables are evaluated in order from top to bottom, so first determine whether the incoming row's keys all match the previous record's keys, which are currently stored in the stage variables. If they all match, the row is a duplicate, so set svIsDuplicate to 1; otherwise set it to 0.
5) Then update the saved key variables so they hold the keys of the record you just read.
6) Create two output links with identical columns, and add constraints so that one link receives rows when svIsDuplicate is 0 (not a duplicate) and the other receives rows when svIsDuplicate is 1 (duplicate).
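The steps above can be modelled outside DataStage with a short Python sketch. It is only an approximation under stated assumptions: each list of rows stands in for one partition, the Python sort stands in for the input-link sort, the hash_partition helper is a hypothetical stand-in for DataStage hash partitioning, and the column names cust_id and name are made up. It also initializes the saved keys to None rather than "", so that a first row whose keys are empty strings is not wrongly flagged as a duplicate.

```python
def split_duplicates(rows, key_cols):
    """Route one partition's rows to (unique_link, duplicate_link).

    Sorting here stands in for the sort defined on the Transformer's
    input link (step 1).
    """
    def keys_of(row):
        return tuple(row[c] for c in key_cols)

    unique_link, duplicate_link = [], []
    # Saved keys of the previous row (step 3). The post initializes these
    # to ""; None is used so a first row with empty keys is not flagged.
    sv_prev_keys = None
    for row in sorted(rows, key=keys_of):
        # Stage variables are evaluated top to bottom:
        # step 4: compare this row's keys with the saved ones...
        sv_is_duplicate = 1 if keys_of(row) == sv_prev_keys else 0
        # step 5: ...then overwrite the saved keys with this row's keys.
        sv_prev_keys = keys_of(row)
        # step 6: output-link constraints.
        if sv_is_duplicate == 0:
            unique_link.append(row)
        else:
            duplicate_link.append(row)
    return unique_link, duplicate_link


def hash_partition(rows, major_key, n_partitions):
    """Hypothetical stand-in for hash partitioning on the major key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[major_key]) % n_partitions].append(row)
    return partitions


# Hypothetical sample data; cust_id is the major key.
rows = [
    {"cust_id": "101", "name": "Smith"},
    {"cust_id": "102", "name": "Jones"},
    {"cust_id": "101", "name": "Smith"},
    {"cust_id": "103", "name": "Lee"},
    {"cust_id": "102", "name": "Jones"},
]

uniques, dups = [], []
for part in hash_partition(rows, "cust_id", 4):
    u, d = split_duplicates(part, ["cust_id", "name"])
    uniques += u
    dups += d

print(len(uniques), len(dups))  # 3 2
```

Because duplicates share the major key, they always hash to the same partition, so the unique and duplicate counts come out the same for any number of partitions (including 1, which corresponds to sequential mode); only the row order across partitions differs.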

Posted: Wed Feb 02, 2011 6:03 pm
by stuartjvnorton
asorrell wrote: I, however, can! Poll deleted...
You're not supposed to delete knucklehead polls: you're supposed to add amusing options to give this otherwise useless interview question thread a little value. :lol:

Posted: Wed Feb 02, 2011 7:12 pm
by ray.wurlod
Agreed
:lol:
:twisted: