
remove duplicates

Posted: Mon Oct 03, 2011 4:50 pm
by harryhome
I have 5 partitions and am trying to get distinct records on a key column using the Remove Duplicates stage, but I get a different record count each time I run the job.

Re: remove duplicates

Posted: Mon Oct 03, 2011 5:33 pm
by SURA
SORT it!
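Here is a minimal Python sketch of why sorting matters. It assumes, as the Remove Duplicates stage's documentation describes, that the stage expects input sorted on the key and compares each record only with the one before it, so it removes only duplicates that are adjacent. The function name `remove_adjacent_duplicates` and the sample keys are hypothetical. This simulates the logic only and is not DataStage code.

```python
from itertools import groupby


def remove_adjacent_duplicates(keys):
    """Keep the first record of each run of equal adjacent keys.

    Like a dedup that only compares a record with its predecessor,
    this lets non-adjacent duplicates through.
    """
    return [key for key, _run in groupby(keys)]


records = ["B", "A", "B", "C", "A", "B"]

# Unsorted input: no two equal keys are next to each other, so nothing is removed.
unsorted_result = remove_adjacent_duplicates(records)
# Sorted input: equal keys become adjacent, so each key survives exactly once.
sorted_result = remove_adjacent_duplicates(sorted(records))

print(unsorted_result)  # ['B', 'A', 'B', 'C', 'A', 'B']
print(sorted_result)    # ['A', 'B', 'C']
```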

What do you mean by 5 partitions?

DS User

Posted: Mon Oct 03, 2011 5:39 pm
by ray.wurlod
Is this the same question as this one?

Posted: Mon Oct 03, 2011 7:12 pm
by prakashdasika
Do you mean 5 nodes? Use a link sort: hash-partition the data on the key and sort it on the key. Ascending or descending order is up to you.
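The advice above can be sketched in Python as a simulation of 5 partitions. It assumes that without key-based partitioning the rows are dealt out round-robin in arrival order, which is one plausible way the same key ends up in several partitions and each partition keeps its own copy. `zlib.crc32` stands in for DataStage's hash partitioner, and all function names are hypothetical. This illustrates the idea and is not how DataStage implements it.

```python
import random
import zlib
from itertools import groupby

NUM_PARTITIONS = 5  # assumed to match the 5 nodes in the question


def remove_adjacent_duplicates(keys):
    # Keep the first record of each run of equal adjacent keys.
    return [key for key, _run in groupby(keys)]


def round_robin_partition(records, n):
    # Deal records out in arrival order, ignoring the key.
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def hash_partition(records, n):
    # Stable hash on the key: every record with the same key lands in the same partition.
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[zlib.crc32(rec.encode()) % n].append(rec)
    return parts


def dedup_count(parts):
    # Sort each partition on the key, dedup it, and total the surviving records.
    return sum(len(remove_adjacent_duplicates(sorted(p))) for p in parts)


rng = random.Random(42)
records = [f"K{rng.randrange(20)}" for _ in range(200)]
distinct = len(set(records))

for run in range(3):
    rng.shuffle(records)  # arrival order differs between runs
    rr = dedup_count(round_robin_partition(records, NUM_PARTITIONS))
    hp = dedup_count(hash_partition(records, NUM_PARTITIONS))
    print(f"run {run}: round-robin={rr}, hash={hp}, distinct={distinct}")
```

With round-robin partitioning the count is inflated by every key that appears in more than one partition, and it typically changes as the arrival order changes. With hash partitioning plus a sort, the count always equals the number of distinct keys.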