Remove Duplicate Warning

dspxlearn
Premium Member
Posts: 291
Joined: Sat Sep 10, 2005 1:26 am

Post by dspxlearn »

Hi all,


When I try to remove duplicates from the input link of the Remove Duplicates stage, I get the following warning:

Remove_Duplicates_367: When checking operator: User inserted sort "Remove_Duplicates_367.DSLink358_Sort" does not fulfill the sort requirements of the downstream operator "Remove_Duplicates_367"

I am using Hash partitioning with the Sorting option set to 'Perform Sort'.


What might be the problem?
Thanks and Regards!!
dspxlearn
bgs
Participant
Posts: 22
Joined: Sat Feb 05, 2005 9:43 pm

Post by bgs »

You get this warning when you use different keys for sorting and for removing duplicates.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
Even the sort order matters!
The keys used in the sort and in the Remove Duplicates stage should match exactly.
Also keep the same partitioning in the Remove Duplicates stage.
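
To illustrate why the keys have to match, here is a small Python sketch. It is not DataStage code, only a model that assumes the stage compares each row with the one before it. The function name dedupe_adjacent and the sample columns cust and region are made up for the example.

```python
from itertools import groupby

def dedupe_adjacent(rows, key):
    """Keep the first row of each run of consecutive rows that share `key`."""
    return [next(group) for _, group in groupby(rows, key=key)]

# Hypothetical sample data: the duplicate key is "cust".
rows = [
    {"cust": "A", "region": 2},
    {"cust": "B", "region": 1},
    {"cust": "A", "region": 1},
    {"cust": "B", "region": 2},
]

def by_cust(row):
    return row["cust"]

def by_region(row):
    return row["region"]

# Sorted on a different key than the one used for duplicate removal:
# rows with the same "cust" are not adjacent, so some duplicates survive.
wrong = dedupe_adjacent(sorted(rows, key=by_region), key=by_cust)

# Sorted on the same key as the duplicate removal: every duplicate is adjacent.
right = dedupe_adjacent(sorted(rows, key=by_cust), key=by_cust)

print(len(wrong), len(right))
```

With the mismatched sort, more rows come out than there are distinct cust values, which is the situation the warning is pointing at.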

-kumar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The reason that partitioning must be the same is to guarantee that any duplicates occur on the same processing node.

The reason that sorting should occur is so that the least memory is consumed - once the sort key value changes, the stage can be certain that there will be no more matches against this value and can quickly discard any rows from the other input that share this key value.
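
Both points can be sketched in a few lines of Python. This is only a model under stated assumptions, not how the parallel engine is implemented: hash_partition, dedupe_sorted, the two-node setup and the use of CRC32 as the hash are all hypothetical choices for the example.

```python
import zlib
from itertools import groupby

def hash_partition(rows, key, nodes):
    """Send each row to a node chosen by hashing its key (CRC32 is an assumption)."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        idx = zlib.crc32(str(key(row)).encode()) % nodes
        parts[idx].append(row)
    return parts

def dedupe_sorted(rows, key):
    """Stream over sorted rows, holding only the current run in view.

    Once the key value changes, no later row can match the previous key,
    so nothing about earlier keys needs to be remembered.
    """
    for _, group in groupby(rows, key=key):
        yield next(group)

def first_field(row):
    return row[0]

# Hypothetical input: (key, payload) pairs with duplicate keys k1 and k2.
rows = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k3", "d"), ("k2", "e")]

parts = hash_partition(rows, first_field, nodes=2)

# Each node sorts and deduplicates only its own partition.
result = [
    row
    for part in parts
    for row in dedupe_sorted(sorted(part, key=first_field), first_field)
]

print(sorted(result))
```

Because equal keys always hash to the same node, each node can remove its own duplicates without consulting any other node. If the rows were partitioned on a different key, two copies of k1 could land on different nodes and both would survive.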
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.