Remove duplicate

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

eyabmo_rbc
Participant
Posts: 10
Joined: Tue Nov 20, 2007 7:15 am
Location: CANADA

Remove duplicate

Post by eyabmo_rbc »

It's funny how this component acts on data. I am passing data with a four-column key to this component, and guess what: it doesn't catch the duplicates!

Example: we have an input stream with keys col1, col2, col3, col4.

Inside the Remove Duplicates component, I sort the incoming stream on those keys, and I have defined those four keys as my uniqueness key.

Is this component not composite-key friendly?

thanks
E.M
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Composite keys are fine. Is your data partitioned, as well as sorted, on these key fields?

Not being partitioned on the keys would seem to manifest as "missing (some) duplicates" if the duplicates were on different partitions as a result, say, of Round Robin partitioning.
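
To illustrate the point, here is a minimal Python sketch (not DataStage code) that simulates duplicate removal running independently on each partition. The function names, the two-partition setup and the sample rows are all assumptions made up for the illustration; the first four fields of each row stand in for col1..col4.

```python
def round_robin(rows, n):
    """Deal rows out to n partitions in turn, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(key(row)) % n].append(row)
    return parts

def remove_duplicates(parts, key):
    """Sort each partition on the key and keep the first row per key.

    Each partition is processed on its own, so a duplicate sitting in
    another partition is never seen.
    """
    out = []
    for part in parts:
        last = object()  # sentinel that never equals a real key
        for row in sorted(part, key=key):
            k = key(row)
            if k != last:
                out.append(row)
                last = k
    return out

def key(row):
    # Composite key: the first four fields (hypothetical col1..col4).
    return row[:4]

rows = [
    ("a", 1, "x", 10, "first"),
    ("a", 1, "x", 10, "second"),   # duplicate key
    ("b", 2, "y", 20, "third"),
    ("b", 2, "y", 20, "fourth"),   # duplicate key
]

rr_result = remove_duplicates(round_robin(rows, 2), key)
hash_result = remove_duplicates(hash_partition(rows, 2, key), key)

print(len(rr_result))    # duplicates were split across partitions and survive
print(len(hash_result))  # duplicates met in one partition and were removed
```

With Round Robin the two rows for each key land on different partitions, so all four rows come out; with hash partitioning on the key only one row per key remains.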
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eyabmo_rbc
Participant
Posts: 10
Joined: Tue Nov 20, 2007 7:15 am
Location: CANADA

Post by eyabmo_rbc »

Hi;

Thanks for the response. Yes, I partitioned the data by hash and sorted the records on the same key. I guess I am seeing the data differently now.
It's correct now.
thanks

ray.wurlod wrote:Composite keys are fine. Are you data partitioned, as well as sorted, on these key fields?

Not being partitioned on the keys would seem to manifest as "missing (some) duplicates" if the duplicates ...
E.M
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Auto partitioning could lead to Round Robin partitioning in any stage, and that partitioning could then be propagated downstream. Thus the duplicate records could have been left on different nodes during the duplicate removal process.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Post by just4u_sharath »

kumar_s wrote:Auto partitioning could lead to Round Robin partitioning in any stage, and that partitioning could then be propagated downstream. Thus the duplicate records could have been left on different nodes during the duplicate removal process.
Does Auto partitioning always lead to Round Robin in any stage?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No.

(Auto) leads to Round Robin except:
  • on reference input to Lookup stage - Entire
  • on inputs to Join and Merge stages - Hash on join key(s)
  • on DB2/UDB Enterprise stages - DB2
  • on other parallel to parallel with same degree of parallelism - Same
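
The rules above can be written down as a small lookup, purely as a reading aid. This is a Python sketch, not a DataStage API: the function name, its parameters and the stage-type strings are hypothetical, and it encodes only the cases listed in this post.

```python
def auto_partitioning(stage, link="input", upstream_parallel_same_degree=False):
    """Return the partitioning that (Auto) resolves to, per the list above.

    stage  -- stage type receiving the link (hypothetical labels)
    link   -- "input" or "reference"
    upstream_parallel_same_degree -- True when the upstream stage is also
        parallel and runs with the same degree of parallelism
    """
    if stage == "Lookup" and link == "reference":
        return "Entire"
    if stage in ("Join", "Merge"):
        return "Hash on join key(s)"
    if stage == "DB2/UDB Enterprise":
        return "DB2"
    if upstream_parallel_same_degree:
        return "Same"
    return "Round Robin"

# A Remove Duplicates stage is not one of the special cases, so (Auto)
# gives Same or Round Robin, never a hash on the duplicate keys.
print(auto_partitioning("Remove Duplicates", upstream_parallel_same_degree=True))
print(auto_partitioning("Remove Duplicates"))
```

Note that Same only preserves whatever partitioning came from upstream, which is how a Round Robin distribution can be propagated into the duplicate removal.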
eyabmo_rbc
Participant
Posts: 10
Joined: Tue Nov 20, 2007 7:15 am
Location: CANADA

Post by eyabmo_rbc »

So, do you recommend hash partitioning the data on the key before we sort it and then remove the duplicates?

ray.wurlod wrote:No.

(Auto) leads to Round Robin except:
  • on reference input to Lookup stage - Entire

    on inputs to Join and Merge stages - Hash on join key(s)

    on DB2/UDB Enterprise stages - DB2

    on ot ...
E.M
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Whether I recommend it or not is irrelevant. It's what you have to do.