Remove duplicate

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Poovalingam
Participant
Posts: 111
Joined: Mon Nov 30, 2009 7:21 am
Location: Bangalore

Post by Poovalingam »

Hi all,
I have a Remove Duplicates stage whose input link comes from a Copy stage. I hash partition on columns x, y, z in the Copy stage, and the Remove Duplicates stage uses Same partitioning with x, z, y as the key columns for removing duplicates.

Do the hash partition column order (x, y, z) and the Remove Duplicates key order (x, z, y) have to be exactly identical?

Thanks in advance,
Poova.
sureshreddy2009
Participant
Posts: 62
Joined: Sat Mar 07, 2009 4:59 am
Location: Chicago

Post by sureshreddy2009 »

Yes
Suresh Reddy
ETL Developer
Research Operations

"It's important to know in which direction we are moving rather than where we are"
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

I don't think so.

All that matters is that the same keys fall on the same partition.

In your case, that will happen irrespective of the key order, as you include all and only the keys.
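
To illustrate the point, here is a minimal Python sketch of the idea. It is not DataStage code: the helper names `hash_partition` and `remove_duplicates`, the sample rows, and the partition count of 4 are all made up for illustration. It hash partitions on (x, y, z), removes duplicates per partition on (x, z, y), and compares the result with removing duplicates over the whole data set.

```python
from collections import defaultdict

def hash_partition(rows, key_columns, num_partitions):
    """Assign each row to a partition by hashing its key column values."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        partitions[hash(key) % num_partitions].append(row)
    return partitions

def remove_duplicates(rows, key_columns):
    """Keep the first row seen for each distinct key (illustrative helper)."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Hypothetical sample data.
rows = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 2, "z": 3},  # duplicate of the first row
    {"x": 1, "y": 3, "z": 2},  # same values in different columns: not a duplicate
    {"x": 4, "y": 5, "z": 6},
    {"x": 4, "y": 5, "z": 6},  # duplicate of the previous row
]

# Partition on (x, y, z), then dedupe each partition on (x, z, y).
partitions = hash_partition(rows, ["x", "y", "z"], num_partitions=4)
per_partition = [
    row
    for part in partitions.values()
    for row in remove_duplicates(part, ["x", "z", "y"])
]

# Reference result: dedupe the whole data set at once.
global_result = remove_duplicates(rows, ["x", "z", "y"])

print(len(per_partition), len(global_result))
```

Two rows that agree on x, z and y necessarily agree on x, y and z, so they hash to the same partition and the per-partition result matches the global one. This only holds because both stages use the same set of columns.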
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The order of key definition is important to DataStage: sorting on columns A,B,C and then doing a remove duplicates on keys C,B,A will generate a warning message at runtime.
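
For comparison, a small Python sketch of the sort case. It models only the logic of dropping adjacent duplicates from sorted input, not DataStage itself or its runtime warning, and the helper `dedupe_adjacent` and the sample rows are hypothetical.

```python
from itertools import groupby

def dedupe_adjacent(sorted_rows, key_columns):
    """Keep the first row of each run of adjacent rows sharing the same key."""
    def key_of(row):
        return tuple(row[c] for c in key_columns)
    return [next(group) for _, group in groupby(sorted_rows, key=key_of)]

# Hypothetical sample data.
abc_rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 2, "B": 1, "C": 1},
    {"A": 1, "B": 1, "C": 1},  # duplicate of the first row
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 1, "C": 1},  # duplicate of the second row
]

# Sort on A, B, C so that rows with identical keys become adjacent.
sorted_rows = sorted(abc_rows, key=lambda r: (r["A"], r["B"], r["C"]))

same_order = dedupe_adjacent(sorted_rows, ["A", "B", "C"])
reversed_order = dedupe_adjacent(sorted_rows, ["C", "B", "A"])

print(same_order == reversed_order)
```

In this toy model the surviving rows are the same for either key order, because the full set of key columns is identical. Matching the order in the job design is still the way to avoid the warning described above.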
Poovalingam
Participant
Posts: 111
Joined: Mon Nov 30, 2009 7:21 am
Location: Bangalore

Post by Poovalingam »

Thanks all. To be on the safe side, I will order the keys in Remove Duplicates the same as the hash key order.