Remove duplicate

Posted: Thu Jun 10, 2010 2:45 am
by Poovalingam
Hi all,
I have a Remove Duplicates stage whose input link comes from a Copy stage. I hash-partition on columns x, y, z in the Copy stage, and the Remove Duplicates stage uses Same partitioning with x, z, y as the key columns for removing duplicates.

Do the hash partition column order (x, y, z) and the Remove Duplicates key order (x, z, y) have to be exactly identical?

Thanks in advance,
Poova.

Posted: Thu Jun 10, 2010 3:15 am
by sureshreddy2009
Yes

Posted: Thu Jun 10, 2010 4:34 am
by Sainath.Srinivasan
I don't think so.

All that matters is that the same keys fall on the same partition.

In your case, that will happen regardless of the key order, since you partition on all of the keys and only the keys.
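
To illustrate the point, here is a minimal Python sketch, not DataStage code. It assumes a simple model in which hash partitioning sends each row to a partition based on its (x, y, z) values, and duplicate removal then runs independently per partition on the keys listed as x, z, y. The function names and sample rows are made up for illustration, and the sketch does not model any sort-order checks the real stage may perform.

```python
from collections import defaultdict


def hash_partition(rows, key_cols, n_partitions):
    """Assign each row to a partition by hashing the values of key_cols."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        partitions[hash(key) % n_partitions].append(row)
    return partitions


def remove_duplicates(rows, key_cols):
    """Keep the first row seen for each distinct combination of key_cols."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def dedup_partitioned(rows, partition_cols, dedup_cols, n_partitions=4):
    """Partition on partition_cols, then remove duplicates within each partition."""
    partitions = hash_partition(rows, partition_cols, n_partitions)
    result = []
    for p in sorted(partitions):
        result.extend(remove_duplicates(partitions[p], dedup_cols))
    return result


# Hypothetical sample data: two distinct (x, y, z) combinations, five rows.
rows = [
    {"x": 1, "y": 2, "z": 3, "v": "a"},
    {"x": 1, "y": 2, "z": 3, "v": "b"},
    {"x": 2, "y": 1, "z": 3, "v": "c"},
    {"x": 1, "y": 2, "z": 3, "v": "d"},
    {"x": 2, "y": 1, "z": 3, "v": "e"},
]

# Partition on x, y, z but remove duplicates on x, z, y.
out = dedup_partitioned(rows, ["x", "y", "z"], ["x", "z", "y"])
distinct_keys = {(r["x"], r["y"], r["z"]) for r in out}
```

Rows that agree on x, y and z always hash to the same partition, so the per-partition duplicate removal sees every copy of a key together, whatever order the key columns are listed in. In this model exactly one row per distinct key combination survives.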

Posted: Thu Jun 10, 2010 5:39 am
by ArndW
The order of key definition is important to DataStage: sorting on columns A, B, C and then doing a Remove Duplicates on keys C, B, A will generate a warning message at runtime.

Posted: Thu Jun 10, 2010 10:48 pm
by Poovalingam
Thanks all. To be on the safe side, I will list the keys in the Remove Duplicates stage in the same order as the hash keys.