Partitioning Method

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

sid19
Participant
Posts: 64
Joined: Mon Jun 18, 2007 12:17 am
Location: kolkata

Post by sid19 »

I am using a Merge stage with one master link and four update links. Execution mode is parallel. The master link carries around 25 million records, and each update link carries around 5 million records.

My question is: what is the correct partitioning method for the input links so that we get a correct result without inconsistency (i.e. with no data loss)?

In general, if I use parallel execution mode in the Lookup, Join, and Merge stages (all of which have more than one input link), what partitioning method should the input links use so that the result data is correct and consistent?

Thank You,
Sid
Raghavendra
Participant
Posts: 147
Joined: Sat Apr 30, 2005 1:23 am
Location: Bangalore,India

Post by Raghavendra »

I would go with Hash partitioning. :)
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Hash partitioning on join/merge/lookup keys in each input link.
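
To illustrate why the key matters, here is a minimal Python sketch (not DataStage code) of what goes wrong when inputs are not partitioned on the key. The helper names, the `id` key column, the partition count of 4, and the use of `zlib.crc32` as the hash are all assumptions made for the illustration; the engine's own hash function is different. The point is only that hashing every link on the same key sends matching rows to the same partition, while round robin does not.

```python
import zlib


def hash_partition(rows, key, n_parts):
    """Assign each row to a partition from a stable hash of its key value."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n_parts].append(row)
    return parts


def round_robin_partition(rows, n_parts):
    """Deal rows out in arrival order, ignoring the key entirely."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def matches_within_partitions(master_parts, update_parts, key):
    """Count master rows whose key is present in the update rows of the SAME partition."""
    total = 0
    for m_part, u_part in zip(master_parts, update_parts):
        update_keys = {r[key] for r in u_part}
        total += sum(1 for r in m_part if r[key] in update_keys)
    return total


# Hypothetical data: 100 master rows, 50 update rows arriving in a different order.
master = [{"id": i, "name": f"m{i}"} for i in range(100)]
update = list(reversed([{"id": i, "extra": f"u{i}"} for i in range(0, 100, 2)]))

hashed_matches = matches_within_partitions(
    hash_partition(master, "id", 4), hash_partition(update, "id", 4), "id"
)
round_robin_matches = matches_within_partitions(
    round_robin_partition(master, 4), round_robin_partition(update, 4), "id"
)

print("hash partitioned matches:", hashed_matches)
print("round robin matches:", round_robin_matches)
```

With hash partitioning all 50 update rows meet their master row. With round robin some matching pairs land in different partitions and never meet, which is the kind of silent data loss the original question is worried about.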
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

All inputs identically key partitioned (hash probably, or modulus if integer), identically sorted, and update inputs de-duplicated.
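
As a rough sketch of that recipe outside DataStage, the Python below partitions both inputs by modulus on an integer key, sorts each partition on the key, de-duplicates the update input, and then merges partition by partition. Everything here is illustrative: the function names, the `id` and `status` columns, the choice to keep the first duplicate, and the left-outer behaviour of the merge are assumptions, not a description of how the Merge stage is implemented.

```python
def modulus_partition(rows, key, n_parts):
    """Partition on an integer key: partition number = key value mod n_parts."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row[key] % n_parts].append(row)
    return parts


def sort_and_dedupe(rows, key):
    """Sort on the key and keep only the first row seen for each key value."""
    out = []
    for row in sorted(rows, key=lambda r: r[key]):  # sorted() is stable
        if not out or out[-1][key] != row[key]:
            out.append(row)
    return out


def merge_partition(master_rows, update_rows, key):
    """Single-pass merge of two key-sorted lists; unmatched master rows are kept."""
    merged = []
    j = 0
    for m in master_rows:
        # Advance the update cursor until it reaches or passes the master key.
        while j < len(update_rows) and update_rows[j][key] < m[key]:
            j += 1
        if j < len(update_rows) and update_rows[j][key] == m[key]:
            merged.append({**m, **update_rows[j]})
        else:
            merged.append(dict(m))
    return merged


# Hypothetical inputs: unsorted master, update link with a duplicate key (id 3).
master = [{"id": i, "amount": i * 10} for i in (5, 3, 8, 1, 4)]
update = [
    {"id": 3, "status": "a"},
    {"id": 8, "status": "b"},
    {"id": 3, "status": "dup"},
]

n_parts = 2
master_parts = [
    sorted(p, key=lambda r: r["id"]) for p in modulus_partition(master, "id", n_parts)
]
update_parts = [
    sort_and_dedupe(p, "id") for p in modulus_partition(update, "id", n_parts)
]

result = [
    row
    for m_part, u_part in zip(master_parts, update_parts)
    for row in merge_partition(m_part, u_part, "id")
]
by_id = {row["id"]: row for row in result}
print(by_id)
```

The single-pass merge only works because both sides of each partition are sorted on the same key, and it only sees every match because both sides were partitioned the same way. Skipping the de-duplication step would leave the second `id` 3 update row with nothing to attach to in this sketch.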
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sid19
Participant
Posts: 64
Joined: Mon Jun 18, 2007 12:17 am
Location: kolkata

Post by sid19 »

Thanks for the responses.
Sid