How to separate the duplicates and the rest of the rows into different files

hemanth12 · Post by **hemanth12** » Wed Feb 02, 2011 12:27 am

How can I separate the duplicate rows from the rest of the rows and write them to a different file? This must be done using only a Transformer stage: no partitioning, and no Sort or Remove Duplicates stage.

Thanks in advance.

veerabusani185512 · Post by **veerabusani185512** » Wed Feb 02, 2011 12:45 am

If your source is a Sequential File stage, then on the Properties tab try the Filter option with sort -u #FileNamePath#. That will select only the unique records from the Sequential File stage.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 02, 2011 3:30 am

I don't answer interview questions. The correct answer DOES include the stage types you insist on excluding.

chulett · Post by **chulett** » Wed Feb 02, 2011 9:27 am

Argh... why in the heck was this set up as a poll? Please don't do that as that's not something I can undo.

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Wed Feb 02, 2011 9:51 am

I, however, can! Poll deleted...

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Wed Feb 02, 2011 10:04 am

The correct answer is:

1) Sort the data on the Transformer's input link (no separate Sort stage required) by all the required keys. Hash partition on at least the major key to ensure that any duplicates end up in the same partition. If you aren't allowed to partition (STUPID REQUIREMENT), then set the stage to run in sequential mode instead of parallel mode.
2) Set up an integer stage variable called "svIsDuplicate" and initialize it to 0 (False).
3) Set up stage variables to hold each of your keys, initialized to "".
4) Stage variables are processed in order from top to bottom, so first determine whether the incoming row's keys all match the keys from the previous record, which are still held in the stage variables at that point. If they all match, the row is a duplicate, so set svIsDuplicate to 1; otherwise set it to 0.
5) Then reset all the stage variables holding your keys to the keys of the record you just read.
6) Have two separate, identical output links, and add constraints so that one link outputs the row when svIsDuplicate is 0 (not a duplicate) and the other outputs it when svIsDuplicate is 1 (duplicate).
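
For anyone who wants to trace the logic outside DataStage, the steps above can be sketched roughly in Python. This is only an illustration, not Transformer derivation syntax: the function name, the dictionary-based rows, and the column names in the usage example are all hypothetical, and the sketch starts the saved key at None rather than "" so that a first row with empty keys is not mistaken for a duplicate.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, str]  # hypothetical row shape: column name -> value


def split_duplicates(
    rows: Iterable[Row], key_columns: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Send the first row of each key to one output and repeats to another."""
    # Step 1: sort on all key columns so duplicates arrive back to back.
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in key_columns))

    first_link: List[Row] = []      # rows where svIsDuplicate is 0
    duplicate_link: List[Row] = []  # rows where svIsDuplicate is 1

    # Step 3: saved keys from the previous row (None = no previous row yet).
    sv_prev_keys = None

    for row in ordered:
        current_keys = tuple(row[c] for c in key_columns)
        # Step 4: compare with the previous row's keys BEFORE overwriting them.
        sv_is_duplicate = 1 if current_keys == sv_prev_keys else 0
        # Step 5: remember this row's keys for the next comparison.
        sv_prev_keys = current_keys
        # Step 6: the two link constraints.
        if sv_is_duplicate == 1:
            duplicate_link.append(row)
        else:
            first_link.append(row)

    return first_link, duplicate_link


# Usage with made-up data: "id" is the only key column.
sample = [
    {"id": "b", "val": "1"},
    {"id": "a", "val": "2"},
    {"id": "b", "val": "3"},
    {"id": "a", "val": "4"},
    {"id": "c", "val": "5"},
]
firsts, dups = split_duplicates(sample, ["id"])
```

Note that, as in the stage variable approach, the first row seen for each key goes to the "not a duplicate" output and only the later repeats go to the duplicate output.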

stuartjvnorton · Post by **stuartjvnorton** » Wed Feb 02, 2011 6:03 pm

asorrell wrote: I, however, can! Poll deleted...

You're not supposed to delete knucklehead polls: you're supposed to add amusing options to give this otherwise useless interview question thread a little value.

ray.wurlod · Post by **ray.wurlod** » Wed Feb 02, 2011 7:12 pm

Agreed