Remove duplicate

Poovalingam · Post by **Poovalingam** » Thu Jun 10, 2010 2:45 am

Hi all,
I have a Remove Duplicates stage whose input link comes from a Copy stage. I hash partition on columns x, y, z in the Copy stage, and the Remove Duplicates stage uses Same partitioning with x, z, y as the key columns.

Do the hash partition column order (x, y, z) and the Remove Duplicates key order (x, z, y) have to be exactly identical?

Thanks in advance,
Poova.

sureshreddy2009 · Post by **sureshreddy2009** » Thu Jun 10, 2010 3:15 am

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Jun 10, 2010 4:34 am

I don't think so.

All that matters is that rows with the same key values fall on the same partition.

In your case, that will happen regardless of the key order, since both stages use all of the keys and only those keys.
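
To illustrate the point, here is a minimal Python sketch, not DataStage code. It assumes Python's built-in `hash` as a stand-in for the hash partitioner (the real hash function is not described in this thread), and the helper names `hash_partition` and `remove_duplicates` are hypothetical. It partitions on (x, y, z), removes duplicates inside each partition on (x, z, y), and compares the result with removing duplicates over the whole data set.

```python
from collections import defaultdict


def hash_partition(rows, keys, n_partitions):
    """Assign each row to a partition by hashing the values of `keys`, in that order.

    Python's built-in hash() is only a stand-in for the real partitioner.
    """
    partitions = defaultdict(list)
    for row in rows:
        key_values = tuple(row[k] for k in keys)
        partitions[hash(key_values) % n_partitions].append(row)
    return partitions


def remove_duplicates(rows, keys):
    """Keep the first row seen for each distinct combination of `keys`."""
    seen = set()
    kept = []
    for row in rows:
        key_values = tuple(row[k] for k in keys)
        if key_values not in seen:
            seen.add(key_values)
            kept.append(row)
    return kept


rows = [
    {"x": 1, "y": 2, "z": 3, "payload": "a"},
    {"x": 1, "y": 2, "z": 3, "payload": "b"},  # duplicate of "a" on x, y, z
    {"x": 2, "y": 1, "z": 3, "payload": "c"},
    {"x": 3, "y": 2, "z": 1, "payload": "d"},
    {"x": 2, "y": 1, "z": 3, "payload": "e"},  # duplicate of "c" on x, y, z
]

# Partition on (x, y, z), as in the Copy stage.
partitions = hash_partition(rows, keys=("x", "y", "z"), n_partitions=4)

# Remove duplicates inside each partition on (x, z, y), as with Same partitioning.
per_partition = []
for part_rows in partitions.values():
    per_partition.extend(remove_duplicates(part_rows, keys=("x", "z", "y")))

# Reference result: remove duplicates over the whole data set at once.
global_result = remove_duplicates(rows, keys=("x", "z", "y"))

partitioned_payloads = sorted(r["payload"] for r in per_partition)
global_payloads = sorted(r["payload"] for r in global_result)
print(partitioned_payloads == global_payloads)
```

Two rows that agree on x, y and z produce the same tuple in either order, so they always hash to the same partition and the per-partition result matches the global one. This only covers partitioning; it says nothing about how DataStage checks key order against an upstream sort.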

ArndW · Post by **ArndW** » Thu Jun 10, 2010 5:39 am

The order of key definition is important to DataStage: sorting on columns A, B, C and then doing a remove duplicates on keys C, B, A will generate a warning message at runtime.

Poovalingam · Post by **Poovalingam** » Thu Jun 10, 2010 10:48 pm

Thanks all. To be on the safe side, I will put the keys in the Remove Duplicates stage in the same order as the hash keys.