Concern about the Remove Duplicates stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ketanshah123
Participant
Posts: 88
Joined: Wed Apr 05, 2006 1:04 am

Post by ketanshah123 »

Hi all,
In my parallel job I am passing sorted data to a Remove Duplicates stage. I need to capture the rejected rows (i.e. the duplicates) from this stage for further processing, but the Remove Duplicates stage does not support a reject link. How can I do this?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Since you have already sorted the data, you can use the server-job method of storing the key value in stage variables and comparing each row with the previous row's value. I'm not at a DS system right now, but are you sure that no alternate output is allowed for a Remove Duplicates stage?
ketanshah123
Participant
Posts: 88
Joined: Wed Apr 05, 2006 1:04 am

Post by ketanshah123 »

Yes, I am sure that it does not allow more than one output link. It gives an error saying the source does not support a reject link.
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

A reject link is not available in the Remove Duplicates stage. You can achieve the same result using a Transformer or a Sort stage.

Check the below link.
viewtopic.php?t=102875&highlight=remove+duplicates
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Using a Sort stage with "Create Key Change Column", followed by a Filter stage to separate out the rows where the key change value is 0, would be simpler. But you have many options, including handling the duplicates at the source itself, whether that is a database or an ASCII file.
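
A rough sketch of what a key change column amounts to, written in Python purely for illustration (the real stage is configured in the Designer, not coded). The function name `add_key_change` and the column name `keyChange` are assumptions made for this example, and the input is assumed to be already sorted on the key.

```python
def add_key_change(sorted_rows, key):
    """Return copies of the rows with a keyChange flag:
    1 for the first row of each key group, 0 for every repeat."""
    flagged = []
    first_row = True
    previous = None
    for row in sorted_rows:
        # The key "changes" on the very first row and whenever it differs
        # from the key of the row just before it.
        changed = first_row or row[key] != previous
        flagged.append({**row, "keyChange": 1 if changed else 0})
        previous = row[key]
        first_row = False
    return flagged


# Hypothetical sample data, sorted on "id".
rows = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "b"},
    {"id": 2, "name": "c"},
]
flagged = add_key_change(rows, "id")
```

Rows flagged 0 are the ones a Remove Duplicates stage would have dropped when retaining the first record of each group, so they are the candidates for the "reject" path.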
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ketanshah123
Participant
Posts: 88
Joined: Wed Apr 05, 2006 1:04 am

Post by ketanshah123 »

Thanks for the solution. Just one more concern: as per the solution, we have to do the following:

- Create a stage variable called NewID and set as current row ID.
- Evaluate OldID against NewID.
- Create a stage variable called OldID and set as current row ID.

How do I set the stage variable NewID to the current row ID?
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Row ID would be your key, i.e. the column(s) on which you identify duplicates.
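
To make the NewID/OldID ordering concrete, here is a minimal Python sketch of the same logic. It is only an analogy for the Transformer stage variables, not DataStage syntax; `split_duplicates` and the sample column names are hypothetical, and the rows are assumed to arrive sorted on the key.

```python
def split_duplicates(sorted_rows, key):
    """Split sorted rows into first occurrences and duplicates by comparing
    each row's key with the key remembered from the previous row."""
    uniques, duplicates = [], []
    old_id = None
    first_row = True
    for row in sorted_rows:
        new_id = row[key]  # NewID: simply the key of the current row
        # Compare BEFORE OldID is overwritten, otherwise every row matches itself.
        is_duplicate = (not first_row) and new_id == old_id
        old_id = new_id  # OldID: remembered for the next row
        first_row = False
        (duplicates if is_duplicate else uniques).append(row)
    return uniques, duplicates


# Hypothetical sample data, sorted on "cust_id".
sample = [
    {"cust_id": "A", "amount": 10},
    {"cust_id": "A", "amount": 20},
    {"cust_id": "B", "amount": 30},
    {"cust_id": "B", "amount": 40},
    {"cust_id": "B", "amount": 50},
]
uniques, duplicates = split_duplicates(sample, "cust_id")
```

The order of the three assignments is the whole point: the comparison sits between setting NewID and setting OldID, which mirrors the top-to-bottom order of the steps listed above.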
ketanshah123
Participant
Posts: 88
Joined: Wed Apr 05, 2006 1:04 am

Post by ketanshah123 »

Hi Kumar,
Just want to make sure about the solution you provided using the Sort stage:
does it assign 1 to the first occurrence of a key value and 0 to the duplicate key values?
swades
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

Hi,

In the Filter stage, specify keychange=1 in the Where clause and map the wanted columns to the output link. In the options, set Output Rejects = True; that way you will have the rejects on the reject link.

Thanks
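
The effect of a where clause plus a reject output can be sketched in Python as below. This is an illustration only: `filter_with_rejects` is a hypothetical helper, and the sample rows assume a key change column named `keyChange` has already been added upstream.

```python
def filter_with_rejects(rows, where):
    """Send rows that satisfy the where predicate to the output list and
    everything else to the rejects list, like a filter with rejects enabled."""
    output, rejects = [], []
    for row in rows:
        (output if where(row) else rejects).append(row)
    return output, rejects


# Hypothetical rows that already carry the key change flag.
flagged_rows = [
    {"id": 1, "keyChange": 1},
    {"id": 1, "keyChange": 0},
    {"id": 2, "keyChange": 1},
    {"id": 2, "keyChange": 0},
]
kept, rejected = filter_with_rejects(flagged_rows, lambda r: r["keyChange"] == 1)
```

Here `kept` holds one row per key and `rejected` holds the duplicates, which is the data the original question wanted to capture.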
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You can provide the constraint in the transformer itself and avoid the use of Filter stage.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

DSguru2B wrote: You can provide the constraint in the transformer itself and avoid the use of Filter stage.
Any specific reason? If a Transformer can be replaced by a Filter, that should be a good thing, shouldn't it?
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

ketanshah123 wrote: Hi Kumar,
Just want to make sure about the solution you provided using the Sort stage:
does it assign 1 to the first occurrence of a key value and 0 to the duplicate key values?
Yes, you are right, and the post following yours gave the condition to check for duplicates as well.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

kumar_s wrote: Any specific reason? If a Transformer can be replaced by a Filter, that should be a good thing, shouldn't it?
There is no reason to add another stage. To use stage variables, a Transformer is unavoidable anyway, so just constrain the outputs in that same Transformer. That eliminates the need for a Filter stage.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Oh, OK, that comment was about the other method. The Filter stage is used together with the Sort stage, and in that approach there is no need for stage variables at all.
ketanshah123
Participant
Posts: 88
Joined: Wed Apr 05, 2006 1:04 am

Post by ketanshah123 »

Thanks, everyone. The problem is resolved now. :D