Partitioning

samba · Post by **samba** » Thu Jul 27, 2006 7:25 am

For each job, is there any need to specify partitioning?

Suppose I have a job like this:

source ------(partitioning)------> transformation ---(partitioning)----> target

Do we need to partition as shown above, or is it better to keep Auto partitioning?
Which is the best practice?

Thanks

ray.wurlod · Post by **ray.wurlod** » Thu Jul 27, 2006 7:29 am

Use (Auto) here. You only need explicit partitioning if rows need to be grouped (for example by key for join/lookup).
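To illustrate what "grouped by key" means, here is a minimal Python sketch (not DataStage code) that simulates two partitioning methods over four partitions. The helper names are hypothetical, and `zlib.crc32` merely stands in for whatever hash function DataStage uses internally. Hash partitioning on the key sends every row with the same key to one partition; a key-blind method such as round robin deals rows out regardless of their key, so rows for one key can end up on several nodes.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Send each row to the partition selected by a hash of its key value.
    zlib.crc32 is a stand-in; DataStage's own hash function is not modelled."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % num_partitions].append(row)
    return parts


def round_robin_partition(rows, num_partitions):
    """Deal rows out in turn, ignoring their contents."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def partitions_holding(parts, key):
    """Map each key value to the set of partition numbers that contain it."""
    where = defaultdict(set)
    for p, part in enumerate(parts):
        for row in part:
            where[row[key]].add(p)
    return dict(where)


rows = [{"cust": c} for c in ["A", "B", "A", "C", "B", "A"]]

hashed = partitions_holding(hash_partition(rows, "cust", 4), "cust")
dealt = partitions_holding(round_robin_partition(rows, 4), "cust")
print(hashed)  # each customer appears in exactly one partition
print(dealt)   # {'A': {0, 1, 2}, 'B': {0, 1}, 'C': {3}}
```

If nothing downstream needs all rows of a key together, the spread in the round-robin case does no harm, which is why Auto is usually fine.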

kumar_s · Post by **kumar_s** » Thu Jul 27, 2006 7:18 pm

Auto will do, unless you perform a key-based transformation in the Transformer (e.g. comparing current and previous records...).
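As a rough illustration of why such a transformation needs key partitioning, the Python sketch below (hypothetical helpers, not DataStage code; `zlib.crc32` stands in for DataStage's hash) flags the first row of each key group by comparing each row's key with the previous row's key, the way a stage variable holding the previous key would. Each partition is processed independently, as it would be on its own node. With hash partitioning on the key, each customer is flagged once; with round robin, one customer's rows land in several partitions and are flagged several times.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Send each row to the partition selected by a hash of its key value."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions].append(row)
    return parts


def round_robin_partition(rows, num_partitions):
    """Deal rows out in turn, ignoring their contents."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def count_group_starts(parts, key):
    """Within each partition, sort by key and compare every row's key with the
    previous row's key. Return how many rows were flagged as the first row of
    their key group across all partitions."""
    starts = 0
    for part in parts:
        prev = None  # each partition starts with no previous row
        for row in sorted(part, key=lambda r: r[key]):
            if row[key] != prev:
                starts += 1
            prev = row[key]
    return starts


rows = [{"cust": c} for c in ["A", "B", "A", "C", "B", "A"]]
print(count_group_starts(hash_partition(rows, "cust", 4), "cust"))  # 3, one per customer
print(count_group_starts(round_robin_partition(rows, 4), "cust"))   # 6, groups split across partitions
```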

opdas · Post by **opdas** » Thu Jul 27, 2006 7:56 pm

Hi,
Do we always have to key-partition data before a lookup/join, as Ray suggested, or is there a special case where we need to? My join/lookup works fine without it. Am I missing something?

kumar_s · Post by **kumar_s** » Thu Jul 27, 2006 10:12 pm

Auto partitioning might have worked in your case, but don't expect it to work in every case if you need precise results.

thompsonp · Post by **thompsonp** » Fri Jul 28, 2006 1:28 am

You don't have to partition data before every join or lookup.

In the case of the join you have to ensure that the data with the same keys will be in the same partitions on each side of the join. This usually means partitioning by one or more of the join keys and sorting by all of them. However if the data is already partitioned in this way by a previous stage you can leave it be.
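A minimal Python sketch of the join requirement, assuming each partition runs its own independent instance of the join (helper names are hypothetical; `zlib.crc32` stands in for DataStage's hash). For brevity, the matching inside each partition uses a dictionary rather than the sort-merge a Join stage performs; only the placement of rows across partitions matters here. When both inputs are hash partitioned on the join key, matching rows meet in the same partition; when both are partitioned round robin, customer C's order lands in a partition that does not hold customer C, so the match is silently lost.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Send each row to the partition selected by a hash of its key value."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions].append(row)
    return parts


def round_robin_partition(rows, num_partitions):
    """Deal rows out in turn, ignoring their contents."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def partitioned_inner_join(left_parts, right_parts, key):
    """Inner join that only matches left and right rows sharing a partition number."""
    result = []
    for left, right in zip(left_parts, right_parts):
        by_key = {}
        for r in right:
            by_key.setdefault(r[key], []).append(r)
        for l in left:
            for r in by_key.get(l[key], []):
                result.append({**l, **r})
    return result


orders = [{"cust": "A", "order": 1}, {"cust": "B", "order": 2},
          {"cust": "A", "order": 3}, {"cust": "C", "order": 4}]
customers = [{"cust": "A", "name": "Acme"}, {"cust": "B", "name": "Bolt"},
             {"cust": "C", "name": "Core"}]

n = 2
good = partitioned_inner_join(hash_partition(orders, "cust", n),
                              hash_partition(customers, "cust", n), "cust")
bad = partitioned_inner_join(round_robin_partition(orders, n),
                             round_robin_partition(customers, n), "cust")
print(len(good), len(bad))  # 4 3: order 4 (customer C) is lost in the round-robin run
```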

In the case of a lookup, you only need to partition on the keys if the reference data is not partitioned using Entire. If the reference data is partitioned Entire, then all of it is available to every node processing the source input, so there is no need to change the partitioning on that input.

The usual guidelines apply - aim to repartition as little as possible and ensure that data is evenly distributed across the nodes.

kumar_s · Post by **kumar_s** » Fri Jul 28, 2006 2:24 am

And Entire partitioning on the reference link is not a must for a lookup. Hash partitioning on the key for both the input and reference links will also do.
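The two lookup options described above can be compared with a small Python simulation. This is a sketch under stated assumptions, not DataStage code: the helpers are hypothetical, `zlib.crc32` stands in for DataStage's hash, and each partition is assumed to consult only the reference rows it holds. Entire gives every partition a full copy of the reference, so the stream can keep any partitioning but every node holds the whole reference table; hashing both links repartitions the stream but each node holds only a slice. Key-blind partitioning on both links misses a match.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Send each row to the partition selected by a hash of its key value."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions].append(row)
    return parts


def round_robin_partition(rows, num_partitions):
    """Deal rows out in turn, ignoring their contents."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def entire_partition(rows, num_partitions):
    """Give every partition its own full copy of the rows."""
    return [list(rows) for _ in range(num_partitions)]


def partitioned_lookup(stream_parts, ref_parts, key):
    """Look each stream row up only in the reference rows of the same partition;
    rows without a match get name=None."""
    out = []
    for stream, ref in zip(stream_parts, ref_parts):
        table = {r[key]: r for r in ref}
        for s in stream:
            out.append({**s, "name": table.get(s[key], {}).get("name")})
    return out


def misses(result):
    """Count stream rows whose lookup found nothing."""
    return sum(1 for row in result if row["name"] is None)


orders = [{"cust": "A", "order": 1}, {"cust": "B", "order": 2},
          {"cust": "A", "order": 3}, {"cust": "C", "order": 4}]
customers = [{"cust": "A", "name": "Acme"}, {"cust": "B", "name": "Bolt"},
             {"cust": "C", "name": "Core"}]
n = 2

# Entire on the reference link; the stream keeps a key-blind partitioning.
entire = partitioned_lookup(round_robin_partition(orders, n),
                            entire_partition(customers, n), "cust")
# Hash partitioning on the lookup key for both links.
hashed = partitioned_lookup(hash_partition(orders, "cust", n),
                            hash_partition(customers, "cust", n), "cust")
# Key-blind partitioning on both links.
neither = partitioned_lookup(round_robin_partition(orders, n),
                             round_robin_partition(customers, n), "cust")

print(misses(entire), misses(hashed), misses(neither))  # 0 0 1
```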

dsusr · Post by **dsusr** » Sun Jul 30, 2006 6:00 am

Partitioning helps in certain respects, but unnecessary partitioning degrades job performance. In addition to what Ray, thompsonp and others have said, you also need proper (key-based) partitioning for Remove Duplicates and Sort stages.
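A minimal Python sketch of the Remove Duplicates case, under the assumption that deduplication runs independently in each partition and compares each row with its neighbour in key-sorted input (helper names are hypothetical; `zlib.crc32` stands in for DataStage's hash). With hash partitioning on the key, one row per customer survives; with round robin, duplicates in different partitions never meet, so some survive.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Send each row to the partition selected by a hash of its key value."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions].append(row)
    return parts


def round_robin_partition(rows, num_partitions):
    """Deal rows out in turn, ignoring their contents."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def remove_duplicates_per_partition(parts, key):
    """Sort each partition by key, then keep a row only when its key differs
    from the previous row's key within that same partition."""
    kept = []
    for part in parts:
        prev = object()  # sentinel that equals no key value
        for row in sorted(part, key=lambda r: r[key]):
            if row[key] != prev:
                kept.append(row)
            prev = row[key]
    return kept


rows = [{"cust": c, "seq": i} for i, c in enumerate(["A", "B", "A", "C", "B", "A"])]
n = 2
by_key = remove_duplicates_per_partition(hash_partition(rows, "cust", n), "cust")
dealt = remove_duplicates_per_partition(round_robin_partition(rows, n), "cust")
print(sorted(r["cust"] for r in by_key))  # ['A', 'B', 'C']
print(sorted(r["cust"] for r in dealt))   # ['A', 'A', 'B', 'B', 'C']
```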

For a lookup, leave the partitioning as Auto on both links, as DataStage will automatically insert Entire partitioning on the reference link.

Also, try to partition as early as possible, and if the join keys do not change, don't repartition; keep the existing partitioning by using the Same method.

---dsusr