Remove Duplicate using hash

Posted: Wed Mar 28, 2012 2:21 am
by prasson_ibm
Hi All,

I have 18 lakh (1.8 million) records with duplicate values in the GRP column, and only 9 distinct values among them. I am using the Remove Duplicates stage, and I have explicitly specified hash partitioning on the key column with no sort. It is working fine.

My confusion is this: according to the documentation, the Remove Duplicates stage needs data that is both sorted and hash partitioned on the key column, but in my case it is only hash partitioned.

I need your input on how it is working.

Thanks

Posted: Wed Mar 28, 2012 3:23 am
by ray.wurlod
When your job score was composed, tsort operators would have been inserted. Check the score. It may even be the case that a buffer operator was also inserted.
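
As a rough illustration of why the inserted tsort matters, here is a minimal Python sketch. It is not DataStage code: the function names are made up for this example, and it assumes the stage only compares each row with the previous row in the same partition, which is why sorted input is required.

```python
from itertools import groupby
from zlib import crc32


def hash_partition(rows, key, n_partitions):
    """Send rows with the same key value to the same partition (illustrative only)."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        idx = crc32(str(row[key]).encode()) % n_partitions
        parts[idx].append(row)
    return parts


def dedupe_adjacent(rows, key):
    """Keep the first row of each run of equal adjacent key values."""
    return [next(group) for _, group in groupby(rows, key=lambda r: r[key])]


def remove_duplicates(rows, key, n_partitions=4, sort_first=True):
    """Hash partition, optionally sort each partition, then drop adjacent duplicates."""
    out = []
    for part in hash_partition(rows, key, n_partitions):
        if sort_first:
            # Stands in for the automatically inserted sort.
            part = sorted(part, key=lambda r: r[key])
        out.extend(dedupe_adjacent(part, key))
    return out


# Hypothetical sample data: 9 rows, 3 distinct GRP values, no two equal values adjacent.
rows = [{"GRP": g, "id": i} for i, g in enumerate("ABACBACAB")]

with_sort = remove_duplicates(rows, "GRP", n_partitions=1)
without_sort = remove_duplicates(rows, "GRP", n_partitions=1, sort_first=False)
print(len(with_sort), len(without_sort))
```

With the sort in place, each key value survives once. Without it, duplicates that are not next to each other slip through, even though hashing has put them in the same partition. To see the score for your own job, set the APT_DUMP_SCORE environment variable and look for the tsort operators in the job log.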

Posted: Wed Mar 28, 2012 3:48 am
by prasson_ibm
Hi Ray,

Yes, I checked the job score, and a tsort operator was automatically inserted.

Thanks for your reply.