concern about REMOVE DUPLICATE STAGE..............
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 88
- Joined: Wed Apr 05, 2006 1:04 am
Hi all,
In my parallel job I am passing some sorted data to a Remove Duplicates stage. I have to capture the rejected data (i.e. the duplicate rows) from this Remove Duplicates stage for some further processing, but the Remove Duplicates stage does not support a reject link. So how can I do this?
Since you have already sorted the data, you can use the server-job method of storing a row's value in stage variables and comparing it with the previous row's value. I'm not at a DS system right now, but are you sure that no alternate output is allowed for a Remove Duplicates stage?
-
- Participant
- Posts: 88
- Joined: Wed Apr 05, 2006 1:04 am
Re: concern about REMOVE DUPLICATE STAGE..............
A reject link is not available in the Remove Duplicates stage. You can achieve the same result using a Transformer or a Sort stage.
Check the below link.
viewtopic.php?t=102875&highlight=remove+duplicates
A simpler option is to use a Sort stage with "Create Key Change Column" enabled, followed by a Filter stage that picks out the rows where the key change value is 0 (the duplicates). You also have many other options, including manipulating the data at its source, whether that is a database or an ASCII file.
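As a rough illustration of the logic only (Python, not DataStage syntax), the sketch below mimics what a key change column does on input that is already sorted by key: the first row of each key group gets 1, repeats get 0, and a filter step then splits the rows. The names `add_key_change`, `split_on_key_change`, `key_change`, and the sample rows are hypothetical and chosen just for this example.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def add_key_change(rows: Iterable[Row], key: str) -> List[Row]:
    """Tag key-sorted rows: 1 for the first row of a key group, 0 for repeats."""
    tagged: List[Row] = []
    previous = object()  # sentinel that never equals a real key value
    for row in rows:
        current = row[key]
        tagged.append({**row, "key_change": 1 if current != previous else 0})
        previous = current
    return tagged


def split_on_key_change(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Play the role of the filter: 1 goes to the main output, 0 to the duplicates."""
    rows = list(rows)
    uniques = [r for r in rows if r["key_change"] == 1]
    duplicates = [r for r in rows if r["key_change"] == 0]
    return uniques, duplicates


# Hypothetical sample data, already sorted by "id".
rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 3, "name": "e"},
]
uniques, duplicates = split_on_key_change(add_key_change(rows, "id"))
```

The sketch relies on the same precondition as the stage-based approach: the input must be sorted on the key, otherwise repeats that are not adjacent would be tagged as first occurrences.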
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 88
- Joined: Wed Apr 05, 2006 1:04 am
Thanks for the solution. Just one more concern: as per the solution, we have to do the following:
- Create a stage variable called NewID and set it to the current row ID.
- Evaluate OldID against NewID.
- Create a stage variable called OldID and set it to the current row ID.
How do I set the stage variable NewID to the current row ID?
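The three steps can be sketched as follows. This is a Python illustration of the evaluation order, not Transformer syntax. It assumes that "current row ID" simply means the key column value of the incoming row, and `flag_duplicates`, `new_id`, `old_id`, and the sample IDs are hypothetical names used only here.

```python
def flag_duplicates(sorted_ids):
    """Return (id, is_duplicate) pairs for IDs that arrive sorted."""
    old_id = object()  # sentinel: no previous row yet, never equals a real ID
    flags = []
    for row_id in sorted_ids:
        new_id = row_id                   # step 1: NewID takes the current row's ID
        is_duplicate = new_id == old_id   # step 2: compare against the previous row
        flags.append((new_id, is_duplicate))
        old_id = new_id                   # step 3: remember this ID for the next row
    return flags


# Hypothetical sample IDs, already sorted.
result = flag_duplicates([10, 10, 20, 30, 30, 30])
```

The order is the important part: OldID is updated only after the comparison, so at comparison time it still holds the previous row's value.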
-
- Participant
- Posts: 88
- Joined: Wed Apr 05, 2006 1:04 am
ketanshah123 wrote: Hi Kumar,
Just want to make sure about the solution you provided using the Sort stage:
It assigns 1 to the first occurrence of a key value and 0 to the duplicate key values?
Yeah, you are right, and the following post gave the condition to check for duplicates as well.
kumar_s wrote: Any specific reason? If the Transformer can be replaced by a Filter, that should be good, shouldn't it?
There is no reason to add another Transformer. To use stage variables, a Transformer is unavoidable anyway, so just put a constraint on the output in that same Transformer, which eliminates the need for a Filter stage.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.