Remove Duplicate using hash

prasson_ibm · Post by **prasson_ibm** » Wed Mar 28, 2012 2:21 am

Hi All,

I have 18 lakh (1.8 million) records with duplicate values in the GRP column, of which only 9 are distinct. I am using the Remove Duplicates stage, and I have explicitly specified hash partitioning on the key column with no sort. It is working fine.

My confusion is that, according to the documentation, the Remove Duplicates stage needs data that is both sorted and hash partitioned on the key column, but in my case it is only hash partitioned.

I need your input on how it is working.

Thanks

ray.wurlod · Post by **ray.wurlod** » Wed Mar 28, 2012 3:23 am

When your job score was composed, tsort operators would have been inserted. Check the score. It may even be the case that a buffer operator was also inserted.
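To illustrate why the sort matters at all, here is a rough Python sketch (not DataStage code, and not how the engine is actually implemented). It assumes a duplicate remover that only compares each row with the previous one in its partition, which is why the input has to be sorted as well as hash partitioned. The function names, the sample rows, and the use of `crc32` as the hash are all made up for the illustration.

```python
from zlib import crc32


def hash_partition(rows, key, n_partitions):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        idx = crc32(str(row[key]).encode()) % n_partitions
        parts[idx].append(row)
    return parts


def remove_adjacent_duplicates(rows, key):
    """Keep the first row of each run of equal keys.

    Only the previous key is remembered, so duplicates are dropped
    only when they arrive next to each other.
    """
    out = []
    prev = object()  # sentinel that never equals a real key value
    for row in rows:
        if row[key] != prev:
            out.append(row)
            prev = row[key]
    return out


def dedup_partitioned(rows, key, n_partitions, sort_first):
    """Hash partition, optionally sort each partition, then dedup."""
    result = []
    for part in hash_partition(rows, key, n_partitions):
        if sort_first:
            # Stands in for the sort that would be inserted before the stage.
            part = sorted(part, key=lambda r: r[key])
        result.extend(remove_adjacent_duplicates(part, key))
    return result


# Hypothetical sample: 3 distinct GRP values arriving interleaved.
groups = ["A", "B", "A", "C", "B", "A", "C", "B", "A"]
rows = [{"GRP": g, "id": i} for i, g in enumerate(groups)]

unsorted_out = dedup_partitioned(rows, "GRP", 2, sort_first=False)
sorted_out = dedup_partitioned(rows, "GRP", 2, sort_first=True)

print("hash only:      ", len(unsorted_out), "rows kept")
print("hash plus sort: ", len(sorted_out), "rows kept")
```

In this sketch, hashing alone leaves extra rows behind because equal keys are not adjacent within a partition, while hashing plus sorting leaves exactly one row per distinct key. That is consistent with the job producing correct output once a tsort has been inserted into the score.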

prasson_ibm · Post by **prasson_ibm** » Wed Mar 28, 2012 3:48 am

Hi Ray,

Yes, I checked the job score, and a tsort operator was automatically inserted.

Thanks for your reply.