
Partitioning

Posted: Thu Jul 27, 2006 7:25 am
by samba
For each job, is there any need to set partitioning explicitly?

Suppose I have a job like this:


source ------(partitioning)------> transformation ---(partitioning)----> target

Do we need to partition as shown above, or is it better to keep Auto partitioning?
Which is the best practice?



Thanks

Posted: Thu Jul 27, 2006 7:29 am
by ray.wurlod
Use (Auto) here. You only need explicit partitioning if rows need to be grouped (for example by key for join/lookup).

Posted: Thu Jul 27, 2006 7:18 pm
by kumar_s
Auto will do, unless you do a key-based transformation (e.g. comparing current and previous records) in a Transformer.

Posted: Thu Jul 27, 2006 7:56 pm
by opdas
Hi,
Do we always have to key-partition data before a lookup/join as Ray suggested, or is it only needed in special cases? My join/lookup is working fine without it. Or am I missing something?

Posted: Thu Jul 27, 2006 10:12 pm
by kumar_s
Auto partitioning may have worked in your case, but don't expect that in every case if you need precise results.

Posted: Fri Jul 28, 2006 1:28 am
by thompsonp
You don't have to partition data before every join or lookup.

In the case of a join, you have to ensure that data with the same keys will be in the same partitions on each side of the join. This usually means partitioning by one or more of the join keys and sorting by all of them. However, if the data is already partitioned in this way by a previous stage, you can leave it be.
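
To illustrate the point about keeping equal keys together, here is a minimal Python sketch (not DataStage code). The function names and sample rows are made up for illustration, and a dictionary-based join stands in for the sorted merge a real Join stage performs. It joins each pair of partitions independently, the way parallel nodes would, first with both inputs hash partitioned on the join key and then with both inputs round-robin partitioned.

```python
import zlib
from collections import defaultdict

def hash_partition(rows, key, n):
    """Place each row in partition crc32(key) % n, so equal keys always share a partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts

def round_robin_partition(rows, n):
    """Deal rows out in turn, ignoring their key values."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def join_per_partition(left_parts, right_parts, key):
    """Inner-join partition i of the left input with partition i of the right input only."""
    joined = []
    for left_rows, right_rows in zip(left_parts, right_parts):
        index = defaultdict(list)
        for r in right_rows:
            index[r[key]].append(r)
        for l in left_rows:
            for r in index.get(l[key], []):
                joined.append({**l, **r})
    return joined

# Hypothetical sample data: the right input arrives in a different order.
left = [{"id": i, "name": f"n{i}"} for i in range(8)]
right = [{"id": i, "amount": i * 10} for i in range(7, -1, -1)]

hashed = join_per_partition(
    hash_partition(left, "id", 4), hash_partition(right, "id", 4), "id")
round_robin = join_per_partition(
    round_robin_partition(left, 4), round_robin_partition(right, 4), "id")

print(len(hashed), len(round_robin))
```

With hash partitioning every id finds its partner; with round robin, matching rows can land on different nodes and are silently dropped from the result.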

In the case of a lookup, you only need to partition on the keys if the reference data is not partitioned using Entire. If the reference data is partitioned Entire, then all of it is available to every node on the source input, so there is no need to change the partitioning on that input.
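
The Entire case can be sketched the same way. This is an assumed toy model in Python, not DataStage code, and the names and data are hypothetical: the source stays round-robin partitioned while the reference data is either copied to every partition (Entire) or split across them.

```python
def round_robin_partition(rows, n):
    """Deal rows out in turn, ignoring their key values."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def entire_partition(rows, n):
    """Give every partition a full copy of the rows."""
    return [list(rows) for _ in range(n)]

def lookup_misses(source_parts, ref_parts, key):
    """Count source rows whose key is absent from the reference rows on the same node."""
    misses = 0
    for source_rows, ref_rows in zip(source_parts, ref_parts):
        ref_keys = {r[key] for r in ref_rows}
        misses += sum(1 for s in source_rows if s[key] not in ref_keys)
    return misses

# Hypothetical sample data: every source code exists in the reference set.
source = [{"code": c} for c in "BADCAB"]
reference = [{"code": c, "desc": c.lower()} for c in "ABCD"]

source_parts = round_robin_partition(source, 2)
misses_entire = lookup_misses(source_parts, entire_partition(reference, 2), "code")
misses_split = lookup_misses(source_parts, round_robin_partition(reference, 2), "code")

print(misses_entire, misses_split)
```

With Entire there are no misses however the source is partitioned; once the reference data is split without regard to the key, lookups start failing for keys that do exist.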

The usual guidelines apply - aim to repartition as little as possible and ensure that data is evenly distributed across the nodes.
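
As a rough illustration of the even-distribution guideline, the Python sketch below counts rows per partition for two assumed keys: a low-cardinality country column dominated by one value, and a unique integer order id. The column names, the data, and the crc32/modulus bucket functions are illustrative choices, not what DataStage uses internally.

```python
import zlib

def partition_counts(keys, n, bucket):
    """Return how many keys each of the n partitions would receive."""
    counts = [0] * n
    for k in keys:
        counts[bucket(k, n)] += 1
    return counts

def hash_bucket(key, n):
    """Hash-style bucket for arbitrary keys."""
    return zlib.crc32(str(key).encode()) % n

def modulus_bucket(key, n):
    """Modulus-style bucket for integer keys."""
    return key % n

# Hypothetical data: 90 of 100 rows share one country value.
countries = ["US"] * 90 + ["DE", "FR", "JP", "BR", "IN"] * 2
order_ids = list(range(100))

skewed = partition_counts(countries, 4, hash_bucket)
even = partition_counts(order_ids, 4, modulus_bucket)

print(skewed)
print(even)
```

All rows with the same key must go to the same partition, so a dominant key value leaves one node doing most of the work while the others sit idle.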

Posted: Fri Jul 28, 2006 2:24 am
by kumar_s
And it is not a must to have Entire partitioning on the reference link of a lookup. Hash partitioning on the key for both the input and reference links will do.

Posted: Sun Jul 30, 2006 6:00 am
by dsusr
Partitioning helps in certain respects, but unnecessary partitioning degrades job performance. In addition to what Ray, thompsonp and others have said, you also need proper partitioning in the Remove Duplicates and Sort stages.

In the case of a lookup, leave the partitioning as Auto on both links, as DataStage will automatically insert Entire partitioning on the reference link.

Also, try to partition as early as possible, and if the join keys do not change, don't repartition; keep the same partitioning method.
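
To show why repartitioning on an unchanged key is wasted work, here is a small Python sketch under simplifying assumptions: integer keys, modulus partitioning, and made-up column names. It counts how many rows would have to move to another node if the data were partitioned again.

```python
def bucket(value, n):
    """Modulus bucket for integer keys (a stand-in for any key-based partitioner)."""
    return value % n

def key_partition(rows, key, n):
    """Partition rows by the given key column."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[bucket(row[key], n)].append(row)
    return parts

def rows_moved_by_repartition(parts, key):
    """Count rows that would change partition if repartitioned on `key`."""
    n = len(parts)
    return sum(
        1
        for idx, rows in enumerate(parts)
        for row in rows
        if bucket(row[key], n) != idx
    )

# Hypothetical data: 50 orders spread over 10 customers.
rows = [{"cust": i % 10, "order": i} for i in range(50)]
parts = key_partition(rows, "cust", 4)

moved_same_key = rows_moved_by_repartition(parts, "cust")
moved_new_key = rows_moved_by_repartition(parts, "order")

print(moved_same_key, moved_new_key)
```

Repartitioning on the same key moves nothing, so the step only adds overhead; repartitioning on a different key shuffles rows between nodes and is only worth it when a later stage really needs that grouping.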

---dsusr