
Concern about Remove Duplicates stage

Posted: Mon Feb 19, 2007 3:20 am
by ketanshah123
Hi all,
In my parallel job I am passing some sorted data to a Remove Duplicates stage. I have to capture the rejected data (i.e. the duplicate rows) from this Remove Duplicates stage for some further processing, but the Remove Duplicates stage does not support a reject link. So how can I do this?

Posted: Mon Feb 19, 2007 3:50 am
by ArndW
Since you have already sorted the data, you can use the server method of storing a row in stage variables and comparing it with the previous row's value. I'm not at a DS system now, but are you sure that no alternate output is allowed for a remove duplicates?

Posted: Mon Feb 19, 2007 3:56 am
by ketanshah123
Yes, I am sure that it does not allow more than one output link. It gives an error saying the source does not support a reject link.

Re: Concern about Remove Duplicates stage

Posted: Mon Feb 19, 2007 3:58 am
by balajisr
A reject link is not available in the Remove Duplicates stage. You can achieve the same result using a Transformer or a Sort stage.

Check the link below.
viewtopic.php?t=102875&highlight=remove+duplicates

Posted: Mon Feb 19, 2007 4:16 am
by kumar_s
It would be simpler to use a Sort stage with "Create Key Change Column" enabled, followed by a Filter stage that separates out the rows where the key change value is 0. But you have many other options, including handling the duplicates at the source itself, whether that is a database or an ASCII file.

Posted: Mon Feb 19, 2007 4:30 am
by ketanshah123
Thanks for the solution. Just one more concern: as per the solution, we have to do the following:

- Create a stage variable called NewID and set as current row ID.
- Evaluate OldID against NewID.
- Create a stage variable called OldID and set as current row ID.

How do I set the stage variable NewID to the current row ID?

Posted: Mon Feb 19, 2007 4:34 am
by kumar_s
The row ID is your key, i.e. the column (or columns) on which you identify duplicates.
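
For anyone reading later, the three stage-variable steps listed above can be sketched outside DataStage. This is only an illustration in Python, not Transformer syntax: the function name `split_duplicates`, the column name `id` and the sample rows are all made up for the example. It assumes the input is already sorted on the key, as in the original question.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def split_duplicates(rows: Iterable[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Split key-sorted rows into first occurrences and duplicates.

    Mirrors the stage-variable idea: remember the previous row's key
    and compare it with the current row's key.
    """
    unique_rows: List[Row] = []
    duplicate_rows: List[Row] = []
    old_id: Any = object()  # sentinel: there is no previous row yet

    for row in rows:
        new_id = row[key]                # NewID := key of the current row
        is_duplicate = new_id == old_id  # evaluate OldID against NewID
        old_id = new_id                  # OldID := current key, kept for the next row

        if is_duplicate:
            duplicate_rows.append(row)   # would go to the "duplicates" output link
        else:
            unique_rows.append(row)      # would go to the main output link

    return unique_rows, duplicate_rows


# Hypothetical sample data, already sorted on "id".
rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 3, "name": "e"},
]
unique_rows, duplicate_rows = split_duplicates(rows, "id")
```

The order of the three assignments matters: the comparison has to happen before OldID is overwritten, otherwise every row would look like a duplicate of itself.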

Posted: Mon Feb 19, 2007 5:49 am
by ketanshah123
Hi Kumar,
Just want to make sure about the solution you provided using the Sort stage:
does it assign 1 to the first occurrence of a key value and 0 to the duplicate key values?

Posted: Mon Feb 19, 2007 12:25 pm
by swades
Hi,

In the Filter stage, specify keychange=1 in the Where clause and map the required columns to the output link. In the options, set Output Rejects = True; that way the rejected rows (the duplicates) will go to the reject link.

Thanks
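
The Sort plus Filter combination can be pictured with a small Python sketch. It is a simulation of the idea rather than anything DataStage runs: `add_key_change` and `filter_with_rejects` are hypothetical helper names, and the `keychange` column name is taken from the Where clause above. Input is assumed to be sorted on the key.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def add_key_change(rows: Iterable[Row], key: str, column: str = "keychange") -> Iterator[Row]:
    """Yield copies of key-sorted rows with a flag: 1 for the first row of a key, 0 otherwise."""
    previous: Any = object()  # sentinel: no previous key yet
    for row in rows:
        flagged = dict(row)
        flagged[column] = 0 if row[key] == previous else 1
        previous = row[key]
        yield flagged


def filter_with_rejects(
    rows: Iterable[Row], where: Callable[[Row], bool]
) -> Tuple[List[Row], List[Row]]:
    """Rows matching the condition go to the output; all others go to the rejects."""
    output: List[Row] = []
    rejects: List[Row] = []
    for row in rows:
        (output if where(row) else rejects).append(row)
    return output, rejects


# Hypothetical sample data, already sorted on "id".
rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 3, "name": "e"},
]
flagged_rows = list(add_key_change(rows, "id"))
output_rows, reject_rows = filter_with_rejects(flagged_rows, lambda r: r["keychange"] == 1)
```

Here `output_rows` holds the first occurrence of each key and `reject_rows` holds the duplicates, which is the data the original question wanted to capture.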

Posted: Mon Feb 19, 2007 12:32 pm
by DSguru2B
You can provide the constraint in the Transformer itself and avoid the use of a Filter stage.

Posted: Mon Feb 19, 2007 5:10 pm
by kumar_s
DSguru2B wrote: You can provide the constraint in the transformer itself and avoid the use of Filter stage.
Any specific reason? If the Transformer can be replaced by a Filter, that should be better, shouldn't it?

Posted: Mon Feb 19, 2007 5:10 pm
by kumar_s
ketanshah123 wrote: Hi Kumar,
Just want to make sure about the solution you provided using the Sort stage:
does it assign 1 to the first occurrence of a key value and 0 to the duplicate key values?
Yes, you are right, and the post that follows yours gives the condition to check for duplicates as well.

Posted: Mon Feb 19, 2007 8:49 pm
by DSguru2B
kumar_s wrote: Any specific reason? If the Transformer can be replaced by a Filter, that should be better, shouldn't it?
There is no reason to add another stage. To use stage variables, a Transformer is unavoidable anyway, so just constrain the output in that same Transformer. That eliminates the need for a Filter stage.

Posted: Mon Feb 19, 2007 8:52 pm
by kumar_s
Oh, OK, you were talking about the other method. The Filter stage is used together with the Sort stage, and in that case there is no need for stage variables either.

Posted: Mon Feb 19, 2007 10:23 pm
by ketanshah123
Thanks, everyone. The problem is resolved now. :D