Removing Duplicates using stage variables in parallel job

karthegx
Participant
Posts: 27
Joined: Wed Sep 06, 2006 1:48 am

Post by karthegx »

Hi,

I have a source file that contains duplicate records. In a Server job I used the derivation If Curr <> Prev Then 'Y' Else 'N' in a stage variable and sent the duplicate records to a sequential file as rejects, which is what I want.

I tried the same logic in a Parallel job, but it is not working.

Why doesn't this work in the Parallel Transformer?
kartheek
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Make sure the records are sorted and partitioned by the key field used for duplicate checking, so that matching rows are adjacent to each other and on the same node in the Transformer. Also assign the stage variables in the right order, because stage variables are evaluated from top to bottom:

svPrev = svCurr
svCurr = input.field
svDupeStatus = If svCurr = svPrev Then 'Y' Else 'N'
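
To see why both the sort and the variable order matter, here is a rough Python imitation of the three stage variables above. It is only a sketch, not DataStage code: the function name flag_duplicates, the sentinel _UNSET and the sample rows are made up for illustration.

```python
# Sentinel standing in for the stage variable's initial value (an assumption;
# in a real job you would pick an initial value that cannot match a real key).
_UNSET = object()

def flag_duplicates(rows, key):
    """Evaluate the three stage variables top to bottom for every row."""
    sv_prev = _UNSET
    sv_curr = _UNSET
    flagged = []
    for row in rows:
        sv_prev = sv_curr              # svPrev = svCurr (still holds the previous row's key)
        sv_curr = row[key]             # svCurr = input.field
        sv_dupe = "Y" if sv_curr == sv_prev else "N"   # svDupeStatus
        flagged.append((row, sv_dupe))
    return flagged

rows = [{"id": 2}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 1}]

# Sorted input: equal keys are adjacent, so every repeat is flagged 'Y'.
sorted_rows = sorted(rows, key=lambda r: r["id"])
sorted_flags = [flag for _, flag in flag_duplicates(sorted_rows, "id")]

# Unsorted input: equal keys are never adjacent here, so nothing is flagged.
unsorted_flags = [flag for _, flag in flag_duplicates(rows, "id")]

unique_ids = [r["id"] for r, flag in flag_duplicates(sorted_rows, "id") if flag == "N"]
```

If the first two assignments were swapped, sv_prev would always equal the current key and every row would be flagged as a duplicate, which is why the order of the stage variables matters.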
Joshy George
karthegx
Participant
Posts: 27
Joined: Wed Sep 06, 2006 1:48 am

Post by karthegx »

Hi George,

This logic works in a Server job, but when I try it in the Parallel Transformer it does not work. Why is that? Can you explain?

Thanks in Advance
kartheek
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

What is it doing that makes you say it is not working?

Try running it with a one-node configuration file and see if it works. If it does, then it is your sorting and/or partitioning that needs looking at, as has already been suggested.
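
The one-node test can be imitated outside DataStage. The Python sketch below is an illustration under simplifying assumptions, not how the engine is implemented: each "node" is a plain list, the helper names (count_flagged, round_robin, hash_partition) are invented, and the data is partitioned after sorting, which keeps each partition in key order.

```python
from zlib import crc32

def count_flagged(partition):
    """Count rows whose key equals the previous key seen on the same node."""
    prev = object()   # sentinel: matches no real key
    dupes = 0
    for key in partition:
        if key == prev:
            dupes += 1
        prev = key
    return dupes

def round_robin(keys, nodes):
    """Deal rows out to nodes in turn, ignoring the key."""
    parts = [[] for _ in range(nodes)]
    for i, key in enumerate(keys):
        parts[i % nodes].append(key)
    return parts

def hash_partition(keys, nodes):
    """Send every row with the same key to the same node."""
    parts = [[] for _ in range(nodes)]
    for key in keys:
        parts[crc32(str(key).encode()) % nodes].append(key)
    return parts

keys = sorted(["A", "B", "A", "C", "B", "C"])   # sorted: A A B B C C

one_node = sum(count_flagged(p) for p in round_robin(keys, 1))
two_nodes_round_robin = sum(count_flagged(p) for p in round_robin(keys, 2))
two_nodes_hash = sum(count_flagged(p) for p in hash_partition(keys, 2))
```

With one node all three repeats are caught. With two nodes and round-robin distribution the copies of each key land on different nodes and none are caught, while partitioning on the key catches all three again.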
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Only the fairy godmother can tell why it's not working. We can only guess, which is not good. JoshGeorge gave you a nice explanation, and thompsonp added to it. We can only do so much sitting on this side of the screen.
A better way to do this is to use the Sort stage. Search the forum for the 'how to' part.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Data must be partitioned on the comparison field as noted, and must also be sorted on this field so that comparison values are adjacent.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.