Partitioning Method

sid19 · Post by **sid19** » Thu Aug 09, 2007 1:45 am

I am using a Merge stage with one master link and four update links. The execution mode is parallel. The master link carries around 25 million records, and each update link carries around 5 million.

My question is: what is the correct partitioning method for the input links so that we get the correct result without inconsistency (i.e., no data loss)?

In general, when the Lookup, Join, or Merge stages run in parallel mode, what partitioning method should be used on the input links (all of these stages have more than one input link) so that the result data is correct and consistent?

Thank You,
Sid

Raghavendra · Post by **Raghavendra** » Thu Aug 09, 2007 1:55 am

I would go with Hash partitioning.

Maveric · Post by **Maveric** » Thu Aug 09, 2007 2:27 am

Hash partitioning on join/merge/lookup keys in each input link.

ray.wurlod · Post by **ray.wurlod** » Thu Aug 09, 2007 5:16 am

All inputs identically key partitioned (hash probably, or modulus if integer), identically sorted, and update inputs de-duplicated.
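
To illustrate why the partitioning has to be identical on every input, here is a rough Python sketch rather than DataStage code. It simulates four partitions, routes the master and update rows with the same key-based function (hash or modulus), sorts and de-duplicates the updates, and then merges within each partition. All names (`NUM_PARTITIONS`, `merge_partitioned`, the sample fields) are hypothetical, and the merge logic is a simplification of what the Merge stage does.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of processing nodes


def hash_partition(key, n=NUM_PARTITIONS):
    """Deterministic hash of the key's string form, reduced to a partition number."""
    return zlib.crc32(str(key).encode("utf-8")) % n


def modulus_partition(key, n=NUM_PARTITIONS):
    """Modulus partitioning: only meaningful for integer keys."""
    return key % n


def partition(rows, key_field, part_fn):
    """Route each row to a partition based solely on its key value."""
    parts = defaultdict(list)
    for row in rows:
        parts[part_fn(row[key_field])].append(row)
    return parts


def prepare_update(rows, key_field):
    """Sort by key and keep only the first row seen for each key."""
    seen, out = set(), []
    for row in sorted(rows, key=lambda r: r[key_field]):
        if row[key_field] not in seen:
            seen.add(row[key_field])
            out.append(row)
    return out


def merge_partitioned(master, updates, key_field, part_fn):
    """Merge each partition independently, as a parallel job would."""
    master_parts = partition(master, key_field, part_fn)
    update_parts = [
        partition(prepare_update(u, key_field), key_field, part_fn)
        for u in updates
    ]
    result = []
    for p, m_rows in master_parts.items():
        # A master row can only see update rows that landed in the SAME partition.
        lookups = [{r[key_field]: r for r in up.get(p, [])} for up in update_parts]
        for m in sorted(m_rows, key=lambda r: r[key_field]):
            merged = dict(m)
            for lk in lookups:
                merged.update(lk.get(m[key_field], {}))
            result.append(merged)
    return sorted(result, key=lambda r: r[key_field])


# Made-up sample data: one master input and two update inputs.
master = [{"id": i, "name": f"cust{i}"} for i in range(1, 9)]
upd_a = [{"id": 2, "city": "Pune"}, {"id": 5, "city": "Oslo"}, {"id": 5, "city": "Lima"}]
upd_b = [{"id": 2, "tier": "gold"}, {"id": 7, "tier": "silver"}]

merged = merge_partitioned(master, [upd_a, upd_b], "id", modulus_partition)
for row in merged:
    print(row)
```

Because every input is routed by the same function of the same key, a master row and its matching update rows always meet in the same partition. If one input were partitioned differently (round robin, say), some matches would sit in different partitions and be silently missed.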

sid19 · Post by **sid19** » Thu Aug 09, 2007 10:09 pm

Thanks for the responses.