Remove Duplicate using hash

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi All,

I have about 18 lakh (1.8 million) records with duplicate values in the GRP column, and only 9 distinct values among them. I am using the Remove Duplicates stage, and I have explicitly specified hash partitioning on the key column with no sort. It is working fine.

My confusion is this: according to the documentation, the Remove Duplicates stage needs input data that is both sorted and hash partitioned on the key column, but in my case it is only hashed.

I need your input on why this is working.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

When your job score was composed, tsort operators would have been inserted. Check the score. It may even be the case that a buffer operator was also inserted.
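
To illustrate why the inserted sort matters, here is a rough Python sketch (not DataStage code) of the general idea: hash partitioning puts all rows with the same key in the same partition, but a duplicate remover that only compares adjacent rows still needs each partition sorted on the key. The function names, the use of crc32 as the hash, and the adjacent-comparison model are assumptions made for the illustration, not a description of the engine's internals.

```python
from itertools import groupby
from zlib import crc32


def hash_partition(rows, key, n_parts):
    """Send each row to a partition chosen by hashing its key value."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        idx = crc32(str(row[key]).encode()) % n_parts
        parts[idx].append(row)
    return parts


def dedupe_adjacent(rows, key):
    """Keep the first row of each run of equal, adjacent key values."""
    return [next(group) for _, group in groupby(rows, key=lambda r: r[key])]


def remove_duplicates(rows, key, n_parts=4, sort_first=True):
    """Hash partition, optionally sort each partition, then dedupe."""
    out = []
    for part in hash_partition(rows, key, n_parts):
        if sort_first:
            # Stands in for the sort inserted ahead of the stage.
            part = sorted(part, key=lambda r: r[key])
        out.extend(dedupe_adjacent(part, key))
    return out


# Small stand-in for the GRP data described in the question.
rows = [{"GRP": g, "id": i} for i, g in enumerate(["A", "B", "A", "C", "B", "A"])]

with_sort = remove_duplicates(rows, "GRP", n_parts=4, sort_first=True)
without_sort = remove_duplicates(rows, "GRP", n_parts=1, sort_first=False)
print(len(with_sort), len(without_sort))
```

With the sort in place, each key survives exactly once. Without it, equal keys that are not next to each other in a partition are not recognized as duplicates, which is why the sort has to come from somewhere, whether you add it or it is inserted for you.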
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi Ray,

Yes, I checked the job score, and a tsort operator was automatically inserted.

Thanks for your reply.