Remove Duplicate Stage

pgupte
Participant
Posts: 7
Joined: Mon Mar 29, 2004 11:32 am

Post by pgupte »

Hi all, I'm removing duplicates after sorting on both of my input fields, one of which is nullable.

e.g.
colA (not null)   colB (nullable)
10                N
10                Y
10                Null

When I sort on both colA and colB (keeping nulls last for colB) and remove duplicates based on colA, the output is 10 N and 10 Null, when I should be getting only one record, 10 N. Do the Sort or Remove Duplicates stages not work well with nullable fields? I'm on PX 7.1.

Thanks
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

We use the Remove Duplicates stage to eliminate duplicates. What we did was transform the null values first and then pass the result set to the RD stage, which eliminated the duplicates on the key column. So I would suggest you handle nulls before passing the rows to the RD stage.

HTWH.

Regards
Saravanan
pgupte
Participant
Posts: 7
Joined: Mon Mar 29, 2004 11:32 am

Post by pgupte »

I handled nulls before the RD stage and it still won't give me correct results...
:(
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What do you believe are the "correct results"?
What are you getting? Why do you claim these are wrong?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pgupte
Participant
Posts: 7
Joined: Mon Mar 29, 2004 11:32 am

Post by pgupte »

Hi Ray, sorry, it was a stupid mistake: I was hashing and sorting on both fields when I should have been hashing on field A and sorting on fields A and B.
Thanks
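
For anyone reading later, here is a minimal Python sketch of why the partitioning key matters. It is not DataStage code: `toy_hash` and `run` are hypothetical names, and the hash function is only a stand-in for the engine's hash partitioner. It simulates hash partitioning, a per-partition sort on A and B with nulls last, and a per-partition "keep first" de-duplication on A.

```python
def toy_hash(key):
    # Stand-in for a hash partitioner (assumption): any deterministic function will do.
    return sum(ord(ch) for ch in repr(key))


def run(rows, hash_keys, n_partitions=2):
    """Hash-partition rows, then sort and de-duplicate inside each partition."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        key = tuple(row[i] for i in hash_keys)
        partitions[toy_hash(key) % n_partitions].append(row)

    output = []
    for part in partitions:
        # Sort on A, then B, with nulls (None) last for B.
        part.sort(key=lambda r: (r[0], r[1] is None, r[1] or ""))
        seen = set()
        for row in part:
            # Keep only the first row for each value of A within this partition.
            if row[0] not in seen:
                seen.add(row[0])
                output.append(row)
    return output


rows = [(10, "N"), (10, "Y"), (10, None)]

# Hashing on A only: every A=10 row lands in one partition, so one row survives.
print(run(rows, hash_keys=(0,)))

# Hashing on A and B: rows with the same A can land in different partitions,
# and each partition keeps its own "first" row, so duplicates on A can survive.
print(run(rows, hash_keys=(0, 1)))
```

Each partition only sees its own rows, so de-duplication on A is only complete when all rows sharing a value of A are in the same partition. That is why the hash key should be A alone, while the sort can still use A and B.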
pgupte
Participant
Posts: 7
Joined: Mon Mar 29, 2004 11:32 am

Post by pgupte »

Saravanan, it had nothing to do with handling nulls. Thanks for your help.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Thanks for posting your "solution". Sometimes questions like those are valuable in "taking a step backwards" from your design. Another good thing is to have someone else look at it - it's amazing how the problem you couldn't see is blindingly obvious to them! And if they're professional they won't seek to embarrass you with it. :wink:
pgupte
Participant
Posts: 7
Joined: Mon Mar 29, 2004 11:32 am

Post by pgupte »

Ray, what you said is true. When I used a Sort stage before the RD stage it worked, but when I used the in-link sort on the RD stage it wouldn't. So I compared the OSH for both jobs, and that's when I saw it was a silly mistake...